536: Jensen Almost Took Nvidia Private for $15bn, AI Scaling Plateau?, Inside Elon's xAI, Semiconductors, Gwern, and Leonard Cohen
"a super-knowledgeable polymathic tutor"
Institutions will try to preserve the problem to which they are the solution.
—Kevin Kelly, expanding on Clay Shirky’s ideas
🥵🌡️🧖‍♀️🧖🏻 After four months with our backyard sauna, the time feels right to reflect on the experience.
The short version: I love it. 💚
We’ve established a routine of going between two and four times a week, typically in the evening, about an hour or two before bed.
While quantifying specific improvements to health or mood is challenging — no control group and so many variables — I've found no downsides to it.
Just the meditative aspect of sitting in a quiet box to relax, either alone or with my wife, has to be beneficial. But I’m sure the well-documented health benefits listed here will also apply over time.
Looking back, and as the cold season begins, I’m glad we chose a model with an oversized electric heater rather than an infrared model. Winters get pretty cold here, so it’ll be useful to have that extra oomph to get things up to the right temps more quickly.
I also feel like my heat tolerance is leveling up, and I’ll want to increase temps over time to levels that an infrared sauna couldn’t reach.
❓🤖🤔 Friend-of-the-show Rohit Krishnan asked this question on Twitter:
In what concrete ways has your life changed in the past 4 years because of AI? I'm talking actual big change, like pre-mobile phone and post-mobile phone change, not minor, discounting coding, and "play" like image gen, for the purposes of this discussion.
After thinking about it a bit, I’d say it’s the ease with which I can find answers to questions that would normally require extensive, multi-step research using Google. In many cases, I wouldn’t have bothered at all.
Friction matters.
Reducing it enough goes from quantitative to qualitative.
Also, when I do research with more traditional tools, I don’t get the back-and-forth that I get with AI. It’s more natural to ask follow-up questions, to ask for things to be rephrased so they’re easier to understand, or to drill down into a specific aspect of a question.
All this compounds and adds up to a lot of extra knowledge gained over time, which can be life-changing.
What is it worth to you to have a super-knowledgeable polymathic tutor who never gets tired of answering your questions about anything and everything?
Sure, they can make mistakes and sometimes make things up, but on net, it’s pretty magical, right?
🛠️👨🏻‍🔧💭 Speaking of AI and knowledge, I was thinking about the massive importance of implicit and tacit knowledge to our civilization, and how very little of it is getting included in AI training data sets.
Is there a way to change that?
One potential approach would be for companies like TSMC to implement structured knowledge-capture programs, where employees dedicate an hour a week to documenting on-the-job learnings that aren’t in the training manuals, from the trivial to the profound. Just try to get it all down somewhere..? It wouldn’t be everything — a lot of knowledge is unconscious and/or non-verbal — but it would be a start.
Should industrial companies hire people — or use AIs? — to interview their employees about their jobs and try to capture some of their knowledge that isn’t captured anywhere else? Maybe follow them around and film what everyone does, almost like making a documentary about the company’s activities..?
These private data sets are likely to remain siloed inside individual companies. They would be most powerful aggregated together, but companies want to protect their competitive advantages and trade secrets... Is there a middle path?
These are just early thoughts about this. I wanted to share them in case they trigger better and smarter thoughts in you.
And if you are curious about expertise and tacit skills and all that, check out my conversation with Cedric Chin (🇸🇬🥋) and then check out CommonCog.
🏦 💰 Business & Investing 💳 💴
😎 Jensen reminding Masa he used to be the largest shareholder of Nvidia 😅 💸 + Take-Private Offer at $15bn
A memorable moment at Nvidia’s AI Summit in Tokyo: this playful jab from Jensen.
SoftBank sold its entire Nvidia stake (4.9% of shares outstanding) in 2019 for approximately $4 billion.
That same stake would be worth more than $175 billion today 😬 no doubt one of the costliest exit decisions in corporate history.
After the roast, Jensen revealed that Masa came to him years ago with an offer to finance Jensen taking Nvidia private. Based on Nvidia’s valuation at the time, this was probably in late 2015 or early 2016, shortly before SoftBank acquired ARM (that deal closed in September 2016).
Back then, Nvidia’s entire market cap hovered around $10-15bn. Even with a premium, it would’ve been the bargain of the millennium!
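If you want to sanity-check those figures, here’s the napkin math. (A minimal sketch: the implied current market cap is inferred from the $175bn figure above, not an official number, and everything is rounded.)

```python
# Napkin math on the SoftBank/Nvidia figures quoted above.
# All inputs come from this post's own numbers; the implied market cap
# is a rough inference, not an official figure.

stake_pct = 0.049            # SoftBank's stake sold in 2019: 4.9% of shares
sale_proceeds_bn = 4         # ~$4bn in proceeds from the 2019 sale
stake_value_now_bn = 175     # ">$175 billion today" per the paragraph above
take_private_bn = 15         # rough market cap at the time of Masa's offer

# Implied Nvidia market cap today: $175bn / 4.9% ~= $3.6tn
implied_mcap_bn = stake_value_now_bn / stake_pct
print(f"Implied market cap: ~${implied_mcap_bn / 1000:.1f}tn")

# Upside foregone by the 2019 sale: roughly 44x
print(f"2019 sale, foregone upside: ~{stake_value_now_bn / sale_proceeds_bn:.0f}x")

# The take-private scenario: ~$15bn then vs ~$3.6tn now, roughly 240x
print(f"Take-private multiple: ~{implied_mcap_bn / take_private_bn:.0f}x")
```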
🤖🧠🔍 Is AI Scaling Running Out of Steam? What Does it Mean for Big Tech, Capex, Nvidia, etc? 🤔
Is AI scaling hitting a wall? A speed bump? A plateau? This is a question I’ve been thinking about lately. I don’t have an answer, but I think it’s the right question to ask because some new data points are increasing the odds that something is going on there.
If forced to make a prediction, I’d bet on scaling continuing to work for a while longer. In machine learning, the Bitter Lesson usually wins. When we look back, tech progress looks smooth, but in real-time, there are always lots of temporary setbacks.
The recent headlines about scaling slowdowns could be just a moment in time — I remember reading a computer magazine when I was young that confidently predicted that CPUs probably wouldn’t get much faster than the Pentium 90MHz because it was already running so hot that you could fry an egg on it, and the transistors were already so small that they probably couldn’t shrink much more.
It’s always hard to see more than a few steps ahead — if we could see where to go clearly, we’d just race to it until we reached a point where we again have trouble seeing very far ahead. That’s the pattern for iterative progress.
But here are some of the data points I’ve been spending neuron cycles on lately.
First, on OpenAI’s next big model:
Though OpenAI had only completed 20% of the training process for Orion, it was already on par with GPT-4 in terms of intelligence and abilities to fulfill tasks and answer questions, Altman said, according to a person who heard the comment.
While Orion’s performance ended up exceeding that of prior models, the increase in quality was far smaller compared with the jump between GPT-3 and GPT-4, the last two flagship models the company released, according to some OpenAI employees who have used or tested Orion.
Some researchers at the company believe Orion isn’t reliably better than its predecessor in handling certain tasks [...] Orion performs better at language tasks but may not outperform previous models at tasks such as coding.
From Ilya Sutskever, one of the pioneers of the field:
Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures - have plateaued. [...]
“The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everyone is looking for the next thing,” Sutskever said. “Scaling the right thing matters more now than ever.”
Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training.
There is also news from Google on this:
Google hasn’t achieved the performance gains some of its leaders were hoping for after dedicating larger amounts of computing power and training data—such as text and images from the web, this person said. Past versions of Google’s flagship Gemini large language model improved at a faster rate when researchers used more data and computing power to train them.
Google’s experience is another indication that a core assumption about how to improve models, known as scaling laws, is being tested. Many researchers believed that models would improve at the same rate as long as they processed more data while using more specialized AI chips, but those two factors don’t seem to be enough. [...]
there may be limits to synthetic data generation. Google hasn’t seen significant improvements from using synthetic data for Gemini.
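Quick aside for those who want the formal version of what “scaling laws” means here: the canonical framing (e.g., DeepMind’s Chinchilla paper) fits a model’s loss as a power law in parameter count and training data. A minimal sketch of that functional form (the symbols are the standard ones from that literature, not from the articles quoted above):

```latex
% Chinchilla-style scaling law: expected loss L as a function of
% parameter count N and training tokens D. E is the irreducible loss;
% A, B, \alpha, \beta are empirically fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The worry in the excerpts above is that the real-world curve may be bending away from this kind of fit: adding more N (chips) and D (data) is no longer buying the predicted drop in loss.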
Challenges at Anthropic for its largest model:
Similar to its competitors, Anthropic has been facing challenges behind the scenes to develop 3.5 Opus, according to two people familiar with the matter. After training it, Anthropic found 3.5 Opus performed better on evaluations than the older version but not by as much as it should, given the size of the model and how costly it was to build and run, one of the people said.
Meanwhile, Sam Altman at OpenAI, Dario Amodei at Anthropic, Kevin Scott at Microsoft, and others keep making bullish statements about the next models (which they’ve been previewing internally) and the next few years of progress. Both Musk and Zuck have built 100k+ GPU clusters and keep expanding, so they clearly believe that scale will keep paying off.
It’s certainly a case of mixed signals.
However, if the slowdown/plateau thesis is correct, it’s uncertain what it means. It could play out in several ways: