For the next two weeks we’re going to discuss compute (computing power). Why should you care how much computing power AI needs? Because to understand AI, you need to understand how compute has shaped the current state of the art, and how it may limit progress in different settings and in the future.
Compute is arguably the part of the AI triad (algorithms, data, and compute) that has enabled the most progress in AI over the past 10 years.
Today we’ll cover:
Why compute is the driver of LLM innovation
The types of computer chips used in AI
Wall Street and NVIDIA
Compute’s worrisome trajectory
OpenAI notes that from 2012 to 2018, the amount of compute used for the most sophisticated models increased by 300,000 times, and that increase in computing power tracks very closely with the recent exponential growth in AI. A graphic from The Economist shows the computing power used to train AI systems: until about a decade ago, growth followed the steady pace of Moore’s law (the speed and capability of computers roughly doubles every two years), but since then the compute used in the largest training runs has doubled roughly every 3.4 months, a drastically steeper slope.
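If you like to see the arithmetic, here’s a quick back-of-the-envelope sketch in Python. The six-year window is my own round number for the 2012-2018 period, so the derived doubling time lands near (not exactly at) the 3.4-month figure cited above.

```python
# Back-of-the-envelope: what does a 300,000x increase in training compute imply?
# The six-year window is an approximation of the 2012-2018 period described above.
import math

increase = 300_000                        # OpenAI's reported growth in training compute
doublings = math.log2(increase)           # ~18 doublings
years = 6
months_per_doubling = years * 12 / doublings

moore_doublings = years * 12 / 24         # Moore's law: one doubling every 2 years

print(f"A {increase:,}x increase is ~{doublings:.0f} doublings,")
print(f"i.e. one doubling every ~{months_per_doubling:.1f} months,")
print(f"versus only ~{moore_doublings:.0f} doublings over the same period under Moore's law.")
```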
Compute as the driver of AI innovation
A report from the Center for Security and Emerging Technology notes that Richard Sutton, one of the founders of modern AI, believes that compute has been the most important contributor to AI progress:
“Sutton argues that a great deal of AI progress has been enabled not by making systems more similar to humans, or by imparting more human knowledge to computers, but rather by giving machine learning systems greater processing power to learn on their own. In this view, the architecture of an algorithm and the data of human knowledge are simply less significant than the computer hardware enabling machine learning. If Sutton is right, then compute may well be the most important part of the triad.”
What has made this increase in processing power possible? Largely specialized AI chips, designed for the kind of math that machine learning requires.
These new chips differ from traditional CPUs in a few ways:
Parallelization, meaning hundreds of processors train pieces of the machine learning algorithm at the same time.
AI-specific chips optimize for speed and efficiency rather than numerical precision, and they often use AI-friendly programming languages.
To understand how they differ from the CPUs we’ve been using all along, I like the analogy of a sports car versus a semi truck. If you need to move a bunch of stuff (information), the CPU is the sports car: it gets a few things there faster but has to make a lot of trips. The GPU is the semi truck: it gets everything there all at once.
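To make that sports-car-versus-semi-truck picture concrete, here’s a minimal sketch that times the same big matrix multiplication on a CPU and, if one is available, an NVIDIA GPU. It assumes you have PyTorch installed; the matrix size and timing approach are illustrative, not a rigorous benchmark.

```python
# Minimal sketch: compare one large matrix multiplication on CPU vs. GPU.
# Assumes PyTorch is installed; the GPU branch only runs if CUDA is available.
import time
import torch

n = 4096  # matrix size; purely illustrative
a = torch.randn(n, n)
b = torch.randn(n, n)

# CPU: a handful of fast cores, many "trips" through the data.
start = time.time()
c_cpu = a @ b
print(f"CPU matmul: {time.time() - start:.3f} s")

# GPU: thousands of smaller cores each handle a piece of the same matrix in parallel.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()          # make sure timing starts after the copy to the GPU
    start = time.time()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()          # wait for the GPU to actually finish
    print(f"GPU matmul: {time.time() - start:.3f} s")
else:
    print("No CUDA GPU detected; skipping the GPU comparison.")
```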
There are three main kinds of AI chips:
Graphics processing units (GPUs)
Used to train and develop AI algorithms
Originally developed for the gaming industry
They are good at the matrix math that underpins many AI algorithms
Field Programmable Gate Arrays (FPGAs)
Used to apply trained algorithms to new information
Can be reprogrammed, so there is less risk of obsolescence, but reprogramming them requires more engineering resources
Application-specific integrated circuits (ASICs)
Used for one specific purpose (MRIs for example)
The hardware is customized for the algorithm so it’s more efficient
Expensive to develop and manufacture since it’s custom-designed; more risk of obsolescence
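A distinction that keeps coming up in the list above is training versus inference (applying a trained algorithm to new information). Here’s a toy sketch in PyTorch of both workloads; the tiny model and fake data are made up for illustration, but they show why training is the repeated, compute-hungry part and a single inference pass is comparatively cheap.

```python
# Illustrative sketch of training vs. inference with a tiny model.
# Model, data, and sizes are made up for demonstration purposes.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training: many thousands of these forward/backward/update cycles in practice,
# which is why it is done on GPUs (or other accelerators) in large data centers.
x = torch.randn(256, 20)            # fake batch of 256 examples
y = torch.randint(0, 2, (256,))     # fake labels
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                 # compute gradients
    optimizer.step()                # update the weights

# Inference: a single forward pass on new data, no gradients needed.
# This is the workload that inference-oriented chips (or even a laptop CPU) can serve.
model.eval()
with torch.no_grad():
    new_example = torch.randn(1, 20)
    prediction = model(new_example).argmax(dim=1)
    print(f"Predicted class: {prediction.item()}")
```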
This table from the Center for Security and Emerging Technology is a nice summary:
Wall Street jumps into compute
Now we know that:
Compute is important to AI now and to future AI progress
There are nuances to the kinds of chips you use for different purposes
Of course investors have noticed the importance of compute to the current and future state of AI, and one company in particular has gotten a lot of attention from Wall Street: NVIDIA, the maker of the GPUs that OpenAI and others use for their AI models.
NVIDIA GPUs reportedly make up 80 percent of the market for AI chips.
Raymond James estimates that NVIDIA makes a 1,000% profit on every H100 GPU it sells.
NVIDIA’s share price has gone from around $5 in 2015 to $462.
NVIDIA beat expectations on its earnings call in August with “revenue of $13.51 billion in the second quarter, more than doubling its performance in the same period a year ago and jumping 88% from Q1”.
The headlines below sum up the sentiment around it:
Programmers who use NVIDIA GPUs generally say that the software support is simply better and more mature than the competition’s.
Note that Google and Amazon make their own proprietary AI chips (Google’s TPUs, for example) but do not sell them, OpenAI is exploring making its own chips, and Microsoft recently announced that it will make its own AI chips for LLMs.
Also, NVIDIA’s French headquarters were recently raided due to concern about anticompetitive practices.
How long can we rely on compute to continue AI progress?
Other investors, however, are more cautious, likely because they’ve seen graphics like the ones developed by CSET below, which show that “without any changes in the price of compute, the cost of a cutting edge model is expected to cross the U.S. GDP threshold in June of 2026”.
Obviously that’s not a sustainable pace, and as CSET states, “future progress in AI cannot rely on just continuing to scale up model sizes, and will instead have to come from doing more with more modest increases in compute.”
Therefore, there will either be:
Changes to models, data, computational methods, or use cases that aren’t as reliant on compute
A slowdown in AI innovation and breakthroughs
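To see why an exponential cost curve hits a wall so quickly, here’s a back-of-the-envelope extrapolation. The starting cost, doubling time, and GDP figure are my own illustrative assumptions, not CSET’s actual inputs; the point is simply that anything doubling every few months overtakes any fixed budget within a handful of years.

```python
# Back-of-the-envelope extrapolation: how long until training costs exceed a fixed budget?
# All numbers are illustrative assumptions, not CSET's methodology.

cost = 100e6                 # assume a frontier training run costs ~$100 million today
doubling_time_months = 3.4   # the doubling time cited earlier for training compute
us_gdp = 25e12               # roughly $25 trillion, for scale

months = 0.0
while cost < us_gdp:
    cost *= 2
    months += doubling_time_months

print(f"Under these assumptions, training cost passes US GDP in ~{months / 12:.1f} years.")
```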
A huge amount of compute isn’t needed for all AI
The computing power needs we’ve been discussing are for training and running large, complex models, usually LLMs. Foundation LLMs have the highest needs for computational power and data, and they are usually trained and run in large server farms using the expensive processors discussed above (indeed, the area of AI with the most growth has been LLMs, which are also the most compute-dependent). Many other AI applications don’t need such pricey or specialized equipment: fine-tuning an existing model uses far less compute than training a foundation model from scratch, and applications that run on small devices like cellphones don’t need these fancy GPUs at all.
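As a concrete illustration of that last point, the little sketch below trains a classical model on a plain CPU in a fraction of a second using scikit-learn and one of its bundled toy datasets. The dataset and model are my own arbitrary picks, and no GPU, server farm, or specialized chip is involved.

```python
# Small-scale AI that needs no specialized hardware: trains in well under a second on a laptop CPU.
# Uses scikit-learn's bundled breast cancer dataset purely as an illustration.
import time
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

start = time.time()
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)                 # ordinary CPU training, no accelerator involved
elapsed = time.time() - start

print(f"Trained in {elapsed:.3f} s; test accuracy = {clf.score(X_test, y_test):.2f}")
```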
Now you’re wondering what this has to do with healthcare. Next week we’ll look at the effect of compute in the clinical setting. In the meantime, you can impress people with your knowledge of GPUs. Enjoy!
Summary
Last week we discussed the AI triad, focusing on algorithms and people.
Upcoming weeks:
Compute in the clinical setting
Healthcare data as part of the AI triad
The future of the AI triad
Machine Learning for MDs Member Spotlights:
Patrick van Nieuwenhuizen, MD is an internal medicine physician building the EMR of the future at Canvas Medical. Before getting into Product he worked on AI algorithms for triage and diagnosis in healthcare. He used to train his own neural nets before you needed a trillion parameters to be cool. Now he just uses GPT-4 to write code for the Canvas SDK.
Jung Hoon Son, M.D. is currently Knowledge Architect at Cerevel Therapeutics. He completed his residency in pathology at New York-Presbyterian (Columbia) and then completed a National Library of Medicine Biomedical Informatics Fellowship. His informatics research focused on leveraging NLP techniques to facilitate computational genomic diagnostics, using phenotypic information stored in EHRs and ontologies like the Human Phenotype Ontology (HPO). After spending the last 7 years at the intersection of healthcare, medicine, biotech, and machine learning, he would like to share his insights more openly via open-source contributions going forward. He is a firm believer that AI is a tool rather than a solution, and he has self-admitted "data synesthesia," which is why he enjoys data visualization (he is an Altair/Vega enthusiast) in both R and Python. He is always looking to simplify and make AI/ML more approachable and affordable. Random facts about him: 1) he has been winning fantasy football for the past 7 years across 2 leagues, and 2) he was once one of the youngest people to hold a Microsoft Certified Systems Engineer (MCSE) certification (at 16 years old).