422: The Nvidia Edition, RAM Bottleneck, Marvel Box Office, OPEC, Manhattan Project, Unreal Engine 5.2, and Treatment for Depression
"This capability also allows for doing the opposite!"
An expert is someone who impresses you *more*, the more you know about their area of expertise.
There are almost no experts.
-David Boxenhorn
🔬👨‍🔬🧪🤖📄 In the past, I shared my hope that multi-modal LLMs could be used to detect fraud and good-faith errors in scientific papers.
Both as they are submitted for review and by looking through the archives of every old published paper to detect red flags (falsified data, tampered-with images, gaps in logic that were missed by reviewers, bad math, underpowered methodology, etc.). 🚩🚩🚩
It won’t be perfect, and it won’t catch everything, but it will likely reveal *a lot* that would otherwise be missed! 🕵️‍♂️
BUT (you saw this coming, didn’t you, Mr Jones?)
This capability also allows for doing the opposite!
As Professor Ethan Mollick points out:
A thing that makes AI so revolutionary for data analytics is its ability to "creatively" solve problems when facing a barrier.
I ask it to create a dataset with an r-squared of .02. First, it tries brute force, but then comes up with many better approaches, solving issues as it goes.
This means, by the way, that AI can conduct scientific fraud on command.
Just tell GPT to “fix” a dataset by making & justifying changes to make a regression work better. It will give a “better” CSV. Not showing prompts, but it is trivial & another threat to academic integrity.
This is a good reminder that tools can be used for both good and evil.
Thankfully, there are more good than evil people in the world, but a lot of damage can still be done by the unscrupulous, so let’s be vigilant.
See, for example, the recent case of faked data in an Alzheimer’s study that likely contributed to a lot of wasted time and resources; in this context, any delay means more human suffering.
📸✏️🔎 Every chance I get, I encourage people to start writing.
It makes you look at life differently.
It’s a bit like walking around with a camera. You look at the world more actively because you have purpose — could this be a good shot? How would the composition look if I walked over there? Maybe if I got close to the ground?
Knowing that you might write about something will transform passive situations into active ones, and even when you aren’t working on a specific idea, you’re still always looking, so your senses are sharper, like a hunter walking through a quiet forest. 🌳🌳🌳🌳🦌🌳🌳🌳
📺🤳😲 Here’s a twofer: An epic and innovative stop-motion video *and* a look at how they did it behind-the-scenes.
It’s great to see that it’s possible to make such a quality project with relatively simple tools and some good ol’ ingenuity!
See for yourself:
🏦 💰 Liberty Capital 💳 💴
🤖📲 The Memory Wall: The Limits of Running Large Language AI Models on Consumer Devices
Friend-of-the-show and supporter (💚 🥃) Dylan Patel has a good piece with Sophia Wisdom explaining the bottlenecks that we’re faced with when trying to run LLMs on consumer devices:
While the promise of on-device AI is undoubtedly alluring, there are some fundamental limitations that make local inference more challenging than most anticipated. The vast majority of client devices do not and will never have a dedicated GPU, so all these challenges must be solved on the SoC. One of the primary concerns is the significant memory footprint and computational power required for GPT-style models. The computational requirements, while high, are a problem that will be solved rapidly over the next 5 years with more specialized architectures, Moore’s Law scaling to 3nm/2nm, and the 3D stacking of chips.
This is a big advantage for Apple over a lot of the Android competition: They have very fast CPUs, GPUs, and ML accelerators (which they call the “Neural Engine”), all in a package that is very power-efficient.
But RAM will be the first thing to get tight, and I wouldn’t be surprised if we saw a step change in the amount of RAM that consumer devices have in a year or two (the lead time on these things means that they can’t adapt quickly):
The problem with this approach for on-device LLM is that the parameters take up too much memory space to cache. A parameter stored in a 16-bit number format such as FP16 or BF16 is 2 bytes. Even the smallest “decent” generalized large language model is LLaMA, at a minimum of 7 billion parameters. The larger versions are significantly higher quality. To simply run this model requires at minimum 14GB of memory at 16-bit precision. While there are a variety of techniques to reduce memory capacity, such as transfer learning, sparsification, and quantization, these don’t come for free and do impact model accuracy.
Furthermore, this 14GB ignores other applications, the operating system, and other overhead related to activations/KV cache. This puts an immediate limit on the size of the model that a developer can use to deploy on-device AI, even if they can assume the client endpoint has the computational oomph required.
Dylan and Sophia go through all kinds of possibilities to reduce memory requirements, look at which current devices may be able to hold these smaller models, and consider what we can expect in the next few years.
🔥 🔥 Nvidia Q1 Results 🔥🔥
Well, Mr. Market really liked this one!
The blockbuster revenue guidance of $11 billion, which, if met, would be a 64% increase over the same period last year, sent the stock up by about 25% the following day.
The company’s market cap went from $755bn to over $950bn during the next trading day! 🤯 For scale, Intel’s whole market cap is just $113bn at the time I’m writing this.
But that’s the stock. Let’s move on to more interesting things and see how the business has been doing and what is driving this spectacular growth:
Before we get to the word of the day, which is “AI”, the CFO did mention this:
We believe the channel inventory correction is behind us.
It looks like the mouse has finally made its way through the snake… 🐍
Below is Jensen on the “iPhone moment” for generative AI and how data centers will change because of it:
Remember, we were in full production of both Ampere and Hopper when the ChatGPT moment came [...] all of that came together in a really wonderful way. And it's the reason why I call it the iPhone moment. [...]
we've been at it for 15 years. And what happened is when generative AI came along, it triggered a killer app for this computing platform that's been in preparation for some time. And so now we see ourselves in 2 simultaneous transitions.
The world's $1 trillion data center is nearly populated entirely by CPUs today [...]
over the last 4 years, call it $1 trillion worth of infrastructure installed, and it's all completely based on CPUs and dumb NICs. It's basically unaccelerated. In the future, it's fairly clear now with this -- with generative AI becoming the primary workload of most of the world's data centers generating information, it is very clear now that -- and the fact that accelerated computing is so energy efficient, that the budget of a data center will shift very dramatically towards accelerated computing, and you're seeing that now. [...]
We're going through that moment right now as we speak, while the world's data center CapEx budget is limited.
But at the same time, we're seeing incredible orders to retool the world's data centers. And so I think you're starting -- you're seeing the beginning of, call it, a 10-year transition to basically recycle or reclaim the world's data centers and build it out as accelerated computing. You have a pretty dramatic shift in the spend of a data center from traditional computing and to accelerated computing with SmartNICs, smart switches, of course, GPUs and the workload is going to be predominantly generative AI.
Here’s the CFO on whether they may be supply-constrained (as they were in 2021):
We believe that the supply that we will have for the second half of the year will be substantially larger than H1. So we are expecting not only the demand that we just saw in this last quarter [...] we do plan a substantial increase in the second half compared to the first half.
Jensen framing competition and why what they’re offering is different: