420: The Battle over AI Training Data, Google’s PaLM 2, France + Nuclear, Microsoft's Fusion Deal, Global Defense Spending, and US Fertility
"Free-thinking isn’t free"
Free-thinking isn’t free. But it’s worth the price.
—Me in Edition #117 (is it pretentious to quote myself up here? I dunno, the whole rest of the thing is mostly me, so why not up here too once in a while if it fits the tone I’m aiming for..?)
🧌🧙♂️🧟♂️⚔️🛡️🏰 Last week, I allowed my 9-year-old son to stay up late and observe part of my weekly Dungeons & Dragons game with a group of 5 friends. He had been curious about it for a while, and the more I described how the game worked, the more questions he had — which is a good sign!
If you answer 1 question and it generates 3 new questions, you’ve really tapped into a deep well of curiosity!
Timing was perfect: the gameplay he witnessed was particularly action-packed and funny. Luckily, he didn’t get to witness us be indecisive about what to do for 15 minutes and instead saw us get ambushed by Kobolds in a strange Half-Life, Black Mesa-style high-tech bunker buried in the middle of a high-fantasy, medieval-style world. It’s fun to occasionally mix technology levels and embrace anachronism!
The following day he had so many questions! It was clear that he loved the freedom and creativity inherent to the game.
We settled on this way to describe it: It’s like virtual reality (VR), but without a headset, as the players just generate it all in their minds.
😩 ⏯️ I don’t understand how some consumer products are deemed “good enough” to be shipped to customers by their design and engineering teams.
Allow me a detour to illustrate:
In a first-person shooter game like ‘Call of Duty’, there are trillions of calculations happening every second to render complex 3D scenes, simulate physics, compute enemy AI, etc — both the GPU and CPU are performing an astounding amount of work. Despite this, interactions through the mouse or keyboard feel *instantaneous*.
However, when interacting with various consumer electronics, pressing buttons like 'play,' 'pause,' 'menu,' or 'change channel' results in a noticeable delay. WHY CAN’T THESE SIMPLE TASKS BE MADE INSTANT?
Don’t get me wrong, it’s not that I don’t understand that there are technical challenges. I just want them to be worked on and progress to be made. I want someone to give a damn about UI responsiveness!
🧠 The other simulation hypothesis (via Rick Rubin’s book ‘The Creative Act’):
The outside universe we perceive doesn't exist as such. Through a series of electrical and chemical reactions, we generate a reality internally. We create forests and oceans, warmth and cold. We read words, hear voices, and form interpretations. Then, in an instant, produce a response. All of this in a world of our own creation.
At our core, we are about 3 pounds of wet meat trapped in a dark skull. Everything we perceive is generated in there. Ain’t sentient life something special?
🏦 💰 Liberty Capital 💳 💴
⚖️👩🏻⚖️ What is “fair use” when it comes to AI training data? Which way will regulation go? 🤔
This is the central question that will bifurcate AI development in one direction or the other, and I’m really curious to see what the courts and legislators will decide.
It could go in any direction. That stuff is unpredictable and can largely rest on which special interests have the ear of politicians, where it is perceived that votes can be won, how powerful and influential the industry is, how well understood or misunderstood the technical aspects are, etc. (see also: Regulators who didn’t know Facebook made money by selling ads)
I think you know where I (mostly) stand on this; I’ve written about it many times. I can still change my mind and will update as more facts come in, but generally, I think that it’s dangerous to try to put too many restrictions on “learning” from something, whether by a human or an AI.
If you want to regulate things, the output is a better place to do it. For example, a generative model can create Mickey Mouse pictures, but if you decide to make some and try to sell them, then you can be stopped by the IP owner.
The big open question is whether publicly available online data should be accessible to everyone, including machines, or if only humans should be permitted to 'read' and 'learn' from it. Should artificial intelligence be limited in accessing otherwise public data, particularly when the learning process generates a distinct set of data, akin to how one's brain doesn't store every book, song, or film experienced, yet still derives patterns and ideas from them?
Google has certainly been able to scrape the entire public web under fair use to train its search engine. I think it would have been a bad thing if they had to ask permission and pay each website to do so because they were creating a commercial product with "their data".
What if in 1998 the courts or legislators had said that search engines needed permission from each website in the world to index them? How different would the world be today?