639: Nvidia's Q1, TSMC as the Accidental…

Liberty

May 27

"if we don't get a bubble, we need to throw a party for them"

8 Comments

anon

May 28

re-asking from 2025.

why has the market for (inexpensive!) inference chips on PCs been a bust?

i expected something remarkable via better privacy and offloading workloads.

instead we get cerebras.

Liberty

May 28

Hey!

I think the closest thing we've seen are Apple's Macs and Nvidia's DGX (which isn't exactly cheap).

The main issue is probably memory. Apple's architecture has the CPU, GPU, and XPU share a single pool of RAM, so you can get a huge pool for a relatively cheap price, which lets you run decent-sized models. Most other PCs have separate RAM for the GPU and the CPU, and that mostly means too little GPU memory for a good model to run locally. And a whole architecture is hard to change; it's not something you do easily or quickly.
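To make the memory point concrete, here is a rough back-of-the-envelope sketch. It assumes the weights dominate memory use (it ignores the KV cache, activations, and runtime overhead), and the model sizes and bit widths are illustrative picks, not figures from this thread.

```python
def weights_gib(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, activations) is counted.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes (billions of parameters) and precisions.
    for params in (8, 70):
        for bits in (16, 4):
            size = weights_gib(params, bits)
            print(f"{params}B params @ {bits}-bit: ~{size:.1f} GiB of weights")
```

Comparing those numbers with the memory actually reachable by the GPU or NPU on a given machine is a quick way to see which models are even candidates for running locally.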

That's my read, anyway.

That, plus until OpenClaw, wanting to run something locally was fairly niche because the cloud models were so good and not that expensive (especially the open-source ones that you'd be running locally anyway), so the incentive wasn't very strong. When you can get millions and millions of tokens for a few bucks, it's hard to justify a multi-thousand-dollar local computer for inference.
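A minimal sketch of that cost argument, with made-up numbers: the hardware price and per-token price below are hypothetical placeholders, and the calculation ignores electricity, depreciation, and any quality difference between local and cloud models.

```python
def breakeven_tokens(hardware_usd: float, usd_per_million_tokens: float) -> float:
    """Tokens you would need to generate before a local machine pays for itself.

    Assumption: local inference is otherwise free (no power or maintenance cost).
    """
    if usd_per_million_tokens <= 0:
        raise ValueError("cloud price must be positive")
    return hardware_usd / usd_per_million_tokens * 1e6


if __name__ == "__main__":
    # Hypothetical inputs: a $3,000 machine vs. $0.50 per million cloud tokens.
    tokens = breakeven_tokens(hardware_usd=3000, usd_per_million_tokens=0.50)
    print(f"Break-even at roughly {tokens / 1e9:.1f} billion tokens")
```

Plug in your own prices; the point is only that cheap cloud tokens push the break-even volume very high.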

But I could be wrong or missing something ¯\_(ツ)_/¯

anon

May 28 (edited)

this was not in regard to running agents.

am referring to the inference chips that only add ~$100 to a new HP/Dell PC.

my understanding is that most open-source models produce rather compact parameters for inference.

but i am certainly not understanding how this can be combined in a hybrid fashion with cloud AI providers.

maybe the bottleneck is that this setup is still too complex. claude desktop doesn't even offer a linux version... not impressive (for an AI-coding winner).

Liberty

May 28

Do you have a source link for the type of chips you're referring to? Maybe I'm not thinking of the same thing.

anon

May 28

the extra NPU hardware from AMD and Intel is in the cheapest price range.

https://intel.github.io/intel-npu-acceleration-library/npu.html

Liberty

May 28

These chips are useful for some tasks, but they're underpowered and memory-bottlenecked for anything seriously demanding. Even on Macs the GPUs end up being used more than the XPUs for most serious local AI inference, and on PCs the lack of shared memory makes it much worse. That's my understanding, anyway.
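One common rule of thumb for why memory is the bottleneck, sketched under loud assumptions: for single-user text generation with a dense model, each new token requires reading roughly all the weights once, so memory bandwidth caps tokens per second. The bandwidth and model-size figures below are hypothetical, not measurements of any specific chip.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth bound.

    Assumptions: batch size 1, dense model, all weights read once per token,
    compute is never the limit. Real throughput will be lower.
    """
    if model_size_gb <= 0:
        raise ValueError("model size must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    model_gb = 4.0  # hypothetical: a small quantized model
    # Hypothetical bandwidth figures for two classes of memory system.
    for label, bw in (("narrow memory bus", 100.0), ("wide unified memory", 400.0)):
        rate = max_tokens_per_sec(bw, model_gb)
        print(f"{label}: at most ~{rate:.0f} tokens/s")
```

Under these assumptions, a faster accelerator attached to the same memory does not raise the ceiling; only more bandwidth or a smaller model does.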

derek

May 27

Great post. Loved your section on TSMC being the accidental central bank of compute capacity. Does that make ASML an accidental central bank too?

For some of the newly built data centers, is power availability the bottleneck? Is that another "central bank"? Yikes.

Liberty

May 27

Thanks Derek!

Yes, ASML plays a similar role in many ways.

It seems like the bottleneck is moving from power to permitting and popular opposition these days, at least in the West. Power can be found, but the memes on DCs are getting worse 🤔