8 Comments
anon's avatar

re-asking from 2025.

why has the market for (inexpensive!) inference chips on PCs been a bust?

i expected something remarkable via better privacy and offloading workloads.

instead we get cerebras.

Liberty's avatar

Hey!

I think the closest thing we've seen are Apple's Macs and Nvidia's DGX (which isn't exactly cheap).

The main issue is probably memory. Apple's architecture has the CPU and GPU and XPU share RAM, so you can have a huge pool of RAM for a relatively cheap price which allows you to run decent-sized models. Most other PCs have separate RAM for GPUs and CPUs, and that mostly means too little RAM for a good model to run locally. And it's hard to change a whole architecture, not something you do easily or quickly.

That's my read, anyway.
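To make the memory point concrete, here is a rough back-of-the-envelope sketch. It assumes the weights dominate memory use and adds a flat 20% allowance for the KV cache and runtime; that overhead factor and the two memory pool sizes are illustrative assumptions, not measurements of any particular machine.

```python
def weights_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billions: float, bits_per_weight: int,
         memory_gib: float, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed 20% overhead fit in memory_gib."""
    return weights_gib(params_billions, bits_per_weight) * overhead <= memory_gib


# Hypothetical memory pools, for illustration only.
POOLS_GIB = {"discrete GPU (24 GiB)": 24, "unified memory (64 GiB)": 64}

for params in (8, 70):
    for bits in (16, 4):
        need = weights_gib(params, bits)
        verdict = ", ".join(
            f"{name}: {'fits' if fits(params, bits, size) else 'too big'}"
            for name, size in POOLS_GIB.items()
        )
        print(f"{params}B @ {bits}-bit ~ {need:.1f} GiB -> {verdict}")
```

Under these assumptions, a 70B model quantized to 4 bits needs roughly 33 GiB for weights alone, which is why the size of the memory pool the accelerator can reach matters so much.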

That, plus until OpenClaw, it was a fairly niche thing to want to run something locally because the cloud models were so good and not that expensive (especially the open source ones that you'd be running locally anyway), so the incentive wasn't very strong. When you can get millions and millions of tokens for a few bucks, it's hard to justify a multi-thousand-dollar local computer for inference.

But I could be wrong or missing something ¯\_(ツ)_/¯
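The cost argument above can be sketched as simple break-even arithmetic. The hardware price, the cloud price per million tokens, and the daily usage below are hypothetical placeholders, and the sketch ignores electricity, resale value, and any quality gap between models.

```python
def breakeven_tokens(hardware_cost_usd: float,
                     cloud_price_per_million_usd: float) -> float:
    """Tokens you would need to generate before local hardware pays for itself."""
    return hardware_cost_usd / cloud_price_per_million_usd * 1_000_000


def years_to_breakeven(hardware_cost_usd: float,
                       cloud_price_per_million_usd: float,
                       tokens_per_day: float) -> float:
    """Years of steady daily usage needed to reach the break-even point."""
    tokens = breakeven_tokens(hardware_cost_usd, cloud_price_per_million_usd)
    return tokens / tokens_per_day / 365


# Hypothetical inputs: a $3,000 machine, $1 per million cloud tokens,
# and one million tokens of usage per day.
tokens = breakeven_tokens(3000, 1.0)
years = years_to_breakeven(3000, 1.0, 1_000_000)
print(f"break-even after {tokens:,.0f} tokens, about {years:.1f} years")
```

With those placeholder numbers the machine only pays for itself after billions of tokens, which is the sense in which the incentive to buy local hardware is weak.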

anon's avatar
2d · Edited

this was not in regard to running agents.

am referring to the inference chips that only add ~$100 to a new HP/Dell PC.

my understanding is that most open-source models produce rather compact parameters for inference.

but i certainly don't understand how this can be combined in a hybrid fashion with cloud AI providers.

maybe the bottleneck is that this setup is still too complex. Claude Desktop doesn't even offer a Linux version... not impressive (for an AI-coding winner).

Liberty's avatar

Do you have a source link for the type of chips you're referring to? Maybe I'm not thinking of the same thing.

anon's avatar

the extra NPU hardware from AMD and Intel sits at the cheapest end of the range.

https://intel.github.io/intel-npu-acceleration-library/npu.html

Liberty's avatar

These things are useful for some things, but underpowered and memory bottlenecked for anything seriously useful. Even on Macs the GPUs end up being used more than the XPUs for most serious local AI inference, and on PCs the lack of shared memory makes it much worse. That's my understanding, anyway.
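One way to read "memory bottlenecked" is in terms of bandwidth rather than capacity. A common rule of thumb for single-stream decoding of a dense model is that every generated token requires reading roughly all of the weights once, so memory bandwidth caps throughput. The sketch below assumes that rule of thumb; the bandwidth figures are hypothetical and not specs of any real NPU or GPU.

```python
def decode_tokens_per_sec_upper_bound(bandwidth_gb_s: float,
                                      model_gb: float) -> float:
    """Rough ceiling on tokens/s when each token reads all weights once."""
    return bandwidth_gb_s / model_gb


# Hypothetical memory bandwidths (GB/s), for illustration only.
for label, bandwidth in (("narrow memory bus", 100), ("wide memory bus", 400)):
    # Assume a model whose quantized weights occupy 4 GB.
    ceiling = decode_tokens_per_sec_upper_bound(bandwidth, 4)
    print(f"{label}: at most ~{ceiling:.0f} tokens/s")
```

Real throughput lands below this ceiling, and compute limits can bite first, but it shows why a small accelerator hanging off ordinary system RAM struggles with larger models.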

derek's avatar

Great post. Loved your section on TSMC being the accidental central bank of compute capacity. And so ASML is an accidental central bank too?

For some of the new-build data centers, is power availability the bottleneck? Is that another “central bank”? Yikes.

Liberty's avatar

Thanks Derek!

Yes, ASML plays a similar role in many ways.

It seems like the bottleneck is moving from power to permitting and popular opposition these days, at least in the West. Power can be found, but the memes on DCs are getting worse 🤔