The good: A legit AGI contender
Now, before we get to the drama over the “new and improved” Grok (of which there is MUCH), let’s cover the basics of Grok 4, the new model from Elon Musk’s AI company xAI.
The launch was peak Elon chaos: Grok 4 was released late last night in a live premiere that started an hour behind schedule, with Musk in a leather jacket broadcasting from xAI headquarters alongside half his engineering team.
During the livestream, Musk boldly claimed Grok 4 is “smarter than almost all graduate students, in all disciplines.”
And the numbers actually back that up: It’s legitimately impressive. On Humanity’s Last Exam (the hardest reasoning test for AI), it scored 26.9% without any tools. That’s huge when you consider most humans only get around 5%. With tools? It jumps to 41%. But here’s where it gets wild: When they unleashed 32 Grok agents working together, it hit 50%.

The meme in AI land is “we’re so cooked,” and we’re hesitant to use that phrase ourselves, buuuuut… we’re starting to feel a bit like that famous frog in that famous pot who JUST noticed he’s begun to sweat.
For context, these benchmarks show Grok 4 absolutely obliterating the competition:
- Doubled Claude Opus 4’s score on ARC-AGI-2 (another benchmark for assessing actual “reasoning”).
- Got a perfect 100% on AIME (math olympiad problems).
- Crushed GPT-4o on graduate-level science questions (no surprises there; 4o is terribly dated at this point).
- Most importantly, Grok 4 was able to outperform Claude (and humans, apparently) at running a vending machine business.

All of that isn’t a small improvement. It’s a generational leap that has people saying we’re looking at AGI.
Of all the charts shared, this one from ARC-AGI was probably the most impactful. Look at where Grok 4 is compared to the competition:

At least as of this writing, it’s really in a league of its own.
Now, usually we reserve judgement until Artificial Analysis reviews these new models, and lucky for us, AA must have had access early, because Grok 4 is already benchmarked on the official leaderboards. Sure enough, it took the top spot:

Now, something to keep in mind: OpenAI doesn’t EVER want to be anywhere but the top spot on this list. So, we’re guessing GPT-5 (whatever it may be) is right around the corner, if not coming out right on Grok’s heels to steal Elon’s thunder. It could be that OpenAI has enough going on that it doesn’t need to crush the competition just yet (if it even can), but we’re expecting a cage match. They won’t go down without a fight.
What exactly is Grok 4?
Key highlights from the launch:
- Two models released: Grok 4 and Grok 4 Heavy (multi-agent system).
- Reasoning-only models: xAI completely removed non-reasoning versions.
- 256K context window, which is bigger than o3’s 200K but smaller than Gemini’s 1M.
- “Eve” voice assistant with way better latency than ChatGPT’s Advanced Voice Mode.
- New $300/month premium tier alongside the existing $16/month option.
- API launched immediately at $3 per million input tokens and $15 per million output tokens.
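To get a feel for what that API pricing means per request, here is a rough back-of-the-envelope sketch. The two prices come from the launch details above; the function name and the example token counts are our own illustration, and the math ignores anything beyond list price (discounts, caching, or how reasoning tokens are billed).

```python
# List prices from the launch (USD per million tokens).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at list prices (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 20,000 tokens in, 5,000 tokens out.
cost = estimate_cost(20_000, 5_000)
print(f"Estimated cost: ${cost:.3f}")
```

Because output tokens cost five times as much as input tokens, long answers drive the bill far more than long prompts do.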
Sounds good, right? Well, there’s just one problem.
The bad: The week before launch
Before we get to the really ugly part, there was already controversy brewing. In late June, Musk complained that Grok was too “woke” and relied too heavily on…







