Liberty’s Highlights

576: Grok 4 vs ChatGPT, Amazon + Anthropic, OpenAI Open Model vs Microsoft, Palmer Luckey on VR & Killer Robots, Google Veo 3, Sauna, and Black Bird

"A lot of life is just a coordination problem"

Liberty
Jul 11, 2025

We know ourselves only as far as we’ve been tested.

—Wisława Szymborska

🖥️💻🖨️📺🔌🛡️⚡💥 You know that cliché scene where a bodyguard takes a bullet for someone? That’s basically what surge protectors do for your electronics.

You do have surge protectors, right? If not, let’s start there.

These devices shield your electronics from sudden voltage spikes caused by things like power outages, lightning, and faulty wiring.

They’re pretty inexpensive, so they act as cheap insurance for your TV, computers, home theater equipment, whatever.

Here’s something few people know:

These things wear out.

  • ➡️ You need to replace surge protectors every 3 to 5 years, on average ⬅️

How fast they degrade depends on the quality of your local power grid.

If you often have outages or power spikes, your surge protectors take the hit. That’s by design: There’s a component called the Metal Oxide Varistor (MOV) that acts as a sacrificial lamb and takes the hit so your devices don’t.

Each surge weakens it a little more.

The issue is that a worn-out surge protector will still deliver power. It’s not like a blown fuse, so you may not know that your stuff isn’t protected anymore.

Some of them have an LED light that claims to indicate if the protection is still active, but these aren’t 100% reliable, and you probably can’t see it anyway if the strip is behind furniture.

When buying a surge protector, the key metric to look for is the Joule rating. That’s the measure of how much energy it can absorb before it fails. Cheap ones may be rated for 400J. The good stuff starts at 2000J and goes up to 4000+.

Because these things are relatively inexpensive and you can get a 2000-4000J model for $30-50, I don’t think it’s even worth looking at the low-rated ones. Why risk it?

This is just one of those $30 decisions that can save you $3,000.

The one I recently bought is rated at 2700J.

I saw that Anker has some models on sale for Prime Day (July 11 is the last day). Other brands you may want to consider are Tripp Lite, APC, and CyberPower.

I don’t get a cut. This isn’t an ad.

I just want to protect your gear!

🔌🚫⚡🛒🔋 Speaking of electricity, a couple of Fridays ago, my house lost power for 10 hours.

Then, on that Sunday, we had another 16.5-hour outage… during a heat wave 🥵

It was the last straw for me.

The grid where I live has become increasingly unreliable over the past few years. Every month, we get multiple short power flashes and one or two failures that last multiple hours.

I went into investigation & solution mode 🕵️‍♂️ First, understand the problem, then figure out what can be done:

  • I talked to some of my friends who work for the power utility and did some online research, and figured out that my street is on a tiny “island” grid cell 🏝️ We often don’t have power when the rest of the neighborhood does. Because it’s just one street, it probably doesn’t register much in the utility’s reliability stats and KPIs. My friends on the inside told me that the only way to make them notice is to file complaints, so they’ll flag it as something to look into and fix (hopefully).

  • So I printed 100 letters 🖨️ explaining why and how to file a complaint and distributed them to almost everyone on the street (I ran out). I included QR codes to reduce friction. Everyone I talked to was *really* angry about the situation; they just didn’t know they could do anything about it. I hope this will help channel this outrage in a productive direction.

  • A lot of life is just a coordination problem 🚦🔀

  • I also ordered two giant UPS batteries that can run both my desktop computer and my fiber modem & wifi router for over 15 hours. At least when the power fails again, I’ll be able to work.

After doing research, I got this model. It’s 1152Wh lithium-iron phosphate (LiFePO4), and it was even on sale:

Pro-tip: If you’re considering getting battery backup, this guy’s channel ☝️ is a great resource.
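
If you want to size a battery for your own setup, the back-of-the-envelope math is simple: usable watt-hours divided by the average watts you draw. Here’s a minimal sketch in Python. The 1152Wh capacity is the figure mentioned above; the 60W load and the 85% conversion efficiency are placeholder assumptions, not measurements, so plug in your own numbers:

```python
def runtime_hours(capacity_wh: float, load_watts: float, efficiency: float = 0.85) -> float:
    """Rough runtime estimate: usable energy (Wh) divided by average draw (W).

    `efficiency` is an assumed inverter/conversion factor, not a spec value.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return capacity_wh * efficiency / load_watts


def max_load_watts(capacity_wh: float, target_hours: float, efficiency: float = 0.85) -> float:
    """Largest average draw (W) that still reaches `target_hours` of runtime."""
    if target_hours <= 0:
        raise ValueError("target_hours must be positive")
    return capacity_wh * efficiency / target_hours


if __name__ == "__main__":
    capacity_wh = 1152      # from the text
    assumed_load_w = 60     # hypothetical average draw for the connected gear

    hours = runtime_hours(capacity_wh, assumed_load_w)
    print(f"Estimated runtime at {assumed_load_w}W: {hours:.1f} hours")

    budget = max_load_watts(capacity_wh, target_hours=15)
    print(f"Average draw that still lasts 15 hours: {budget:.0f}W")
```

Real-world runtime will vary with the actual load, battery age, and temperature, so treat the result as a ballpark rather than a guarantee.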


🏦 💰 Business & Investing 💳 💴

🐦🤖 Grok 4: Pushing the Frontier Forward, But For How Long? 🏁 🏃 🏃‍♂️🏃‍♀️

I’ll skip talking about what a terrible week X/xAI/Twitter has been having, because I’m sure you’ve seen it, and will instead focus on the latest big model to drop:

Grok 4, in both regular and Heavy flavors:

On Wednesday, xAI launched two models: Grok 4 and Grok 4 Heavy — the latter being the company’s “multi-agent version” that offers increased performance.

Musk claimed that Grok 4 Heavy spawns multiple agents to work on a problem simultaneously, and then they all compare their work “like a study group” to find the best answer.

Musk also said that Grok 4 is coming to Tesla vehicles soon.

The new improved voice mode could be particularly useful there, but I’ll be curious to see how many people start using it in their car *and then* keep using it elsewhere.

It might be similar to how people use Grok on Twitter because it’s convenient, but don’t seem to stick with it outside that context.

The performance of Grok 4 on the hardest benchmarks is impressive:

According to xAI, Grok 4 scored 25.4% on Humanity’s Last Exam without “tools,” outperforming Google’s Gemini 2.5 Pro, which scored 21.6%, and OpenAI’s o3 (high), which scored 21%.

xAI claims that Grok 4 Heavy, with “tools,” was able to achieve a score of 44.4%, outperforming Gemini 2.5 Pro with tools, which scored 26.9%.

The nonprofit Arc Prize says that Grok achieves a new state-of-the-art score on its ARC-AGI-2 test — another difficult benchmark that consists of puzzle-like problems where an AI has to identify visual patterns — scoring 16.2%. That’s nearly twice the score of the next best commercial AI model, Claude Opus 4.

xAI also launched a new paid sub plan: $300/month! 💵

That’s more expensive than OpenAI and Anthropic’s top-tier plans, so they’re really hoping that having the top-performing model will be a draw for the most demanding power-users for whom this may be a small price to pay compared to the value they’re getting.

I remain skeptical that they’ll see wide adoption, at least based on history and how sticky ChatGPT seems to be, and how hard it is for even Claude to break through to the mass market.

When it was released, Grok 3 was similarly impressive and ahead of the competition. That made a lot of people give it a try, but ultimately, most didn’t stick with it, and OpenAI and Anthropic released competitive models a few weeks later.

In fact, Grok 3 is widely reported to have gotten worse over time.

There’s a theory going around that, as it was optimized to reduce inference costs, some of its intelligence was degraded. Or maybe it’s all the system-prompt tuning they do to it for political reasons ¯\_(ツ)_/¯

To be clear, I think this is probably incorrect, because Grok 3’s rankings on the LMArena leaderboard haven’t tanked…

Here’s what I think happened: In the early days of Grok 3, people were comparing it to worse models, so it seemed smarter. But a few months later, people compared it to OpenAI’s o3 and Anthropic’s Claude 4, and in relative terms, it didn’t seem as sharp as it once was.

In any case, all this massive investment and work — spending billions 💸 — for a couple of weeks in the spotlight, and then back to being mostly a Twitter feature that explains Tweets 😅

Will this time be different?

There’s a pretty good chance that Grok 4’s release will prompt OpenAI to release GPT-5 soon. I can’t know for sure, but their pattern seems to be to hold models in reserve as long as nobody has too much of a lead on them, because they’re inference-constrained anyway, and they can keep refining the model in the meantime.

They were going to release it this summer, but the exact timing may take into account competitive pressures. That’s my guess, anyway.

It’ll be interesting to see if they have something similar to Grok 4 Heavy, where a swarm of agents gets spawned to work together, almost like a group of coworkers around a whiteboard.

Maybe this will be rolled into the next version of Deep Research that will use GPT-5 instead of o3 🤔

🛒 🤖 Is Amazon Adding More Anthropic to Its Shopping Cart? 💰💰💰

Where’s the “Buy Now” button when you need it?

© 2025 Liberty RPF