If insanity is doing the same thing repeatedly while expecting different results, then watching xAI announce another "breakthrough" Grok model qualifies as collective madness.
Here we go again. xAI is gearing up to launch Grok 4 through its developer console, complete with the usual fanfare about "flagship performance" and "code-aware support." The company's source code reveals two models: Grok 4 for natural language and Grok 4 Code for developers. Meanwhile, they're testing AI-generated Community Notes on X, because apparently the platform needed more ways to spread confident misinformation.
But before we get swept up in another cycle of breathless coverage, let's examine what this company has actually delivered versus what it's promised.
The Art of Strategic Benchmark Theater
Grok 3's launch in February was a masterclass in misleading marketing. xAI published a graph of Grok 3's performance on AIME 2025 that conveniently omitted OpenAI's o3-mini-high score at "cons@64," a method that gives the model 64 attempts at each problem and grades the most frequent answer. Normalize the comparison to single attempts and Grok 3's "@1" scores fall below o3-mini-high's performance.
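To make the distinction concrete, here's a minimal sketch of the two scoring methods in Python. The answer values, the 40% per-attempt accuracy, and the helper names are illustrative assumptions, not xAI's or OpenAI's actual evaluation code.

```python
from collections import Counter
import random

def score_at_1(answers: list[str], correct: str) -> bool:
    """@1-style scoring: the model gets a single attempt per problem."""
    return answers[0] == correct

def score_cons_64(answers: list[str], correct: str) -> bool:
    """cons@64 scoring: take 64 samples and grade the most frequent answer."""
    majority, _ = Counter(answers[:64]).most_common(1)[0]
    return majority == correct

# Illustrative run: a model that is right only ~40% of the time per attempt
# often still passes under cons@64, because its wrong answers are scattered
# while the right answer repeats.
random.seed(0)
correct = "113"
wrong_pool = ["97", "128", "45", "200", "76", "12"]
answers = [correct if random.random() < 0.4 else random.choice(wrong_pool)
           for _ in range(64)]

print("scored @1:", score_at_1(answers, correct))
print("scored cons@64:", score_cons_64(answers, correct))
```

Comparing a cons@64 number against an @1 number pits a best-of-64 vote against a single guess, which is exactly the apples-to-oranges framing the chart relied on.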
Why leave it out? As Esteve Almirall, a professor at Esade and AI expert, put it: "because that way Grok 3 looks better in the picture." Under standardized criteria, the ranking tells a very different story than xAI's carefully curated charts suggest.
This isn't accidental. It's systematic misrepresentation designed to generate headlines while obscuring real performance gaps. The fact that many experts doubt Grok 3's reported results and suspect it was trained specifically on the benchmarks should raise red flags about any Grok 4 claims.
Let's trace this company's actual track record. Grok 1, released in November 2023, arrived with weaker reasoning than ChatGPT-4, struggled with complex logic and long-form responses, and offered only limited image processing. The model that was supposed to revolutionize AI conversation barely matched existing standards.
Grok 2.0's August 2024 launch brought its own controversies. The lack of content moderation guardrails presented significant moral and legal issues, with legal experts arguing that Grok represented one of the most reckless implementations of AI. Users could generate sophisticated scam emails and patently false images of politicians and public figures engaged in offensive conduct.
Even Grok 3, despite its impressive marketing campaign, shows fundamental flaws. Andrej Karpathy found it hallucinating citations and inventing fake URLs, while it failed his Unicode emoji mystery challenge that DeepSeek's R1 solved. For a model claiming to be the "world's smartest AI," these are embarrassing failures.
The economic reality behind Grok's development reveals troubling priorities. Grok 3 is pricier than average, with a blended rate of about $6.00 per 1M tokens: $3.00 per 1M input tokens and $15.00 per 1M output tokens. You're paying premium prices for a model that is also slower than average, with an output speed of 92.8 tokens per second.
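For anyone sanity-checking that $6.00 figure, here's the back-of-the-envelope math. The 3:1 input-to-output token mix is a common assumption for blended pricing rather than an xAI-published number, and the workload at the end is hypothetical.

```python
# Grok 3 API list prices, USD per 1M tokens
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

def blended_price(input_share: float = 3.0, output_share: float = 1.0) -> float:
    """Weighted-average price per 1M tokens for an assumed input:output mix."""
    total = input_share + output_share
    return (INPUT_PRICE_PER_M * input_share +
            OUTPUT_PRICE_PER_M * output_share) / total

print(f"Blended: ${blended_price():.2f} per 1M tokens")  # $6.00 at a 3:1 mix

# Hypothetical workload: 2M input tokens + 0.5M output tokens
job_cost = 2.0 * INPUT_PRICE_PER_M + 0.5 * OUTPUT_PRICE_PER_M
print(f"Example job: ${job_cost:.2f}")  # $13.50
```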
Meanwhile, xAI burned through massive computational resources training these models. Grok 3 was trained on 100K+ Nvidia H100 GPUs on xAI's Memphis supercomputer, one of the largest AI clusters in the world—built in 122 days. That's an enormous environmental and financial cost for models that consistently underperform their marketing claims.
The company's approach to data sourcing raises additional concerns. Critics fear that by wiring Grok directly into X's digital chatter, xAI risks amplifying biases or misrepresentations of reality unless careful curation and alignment strategies are in place. Training AI on social media discourse is like teaching logic using YouTube comments—you'll get fast responses, but not necessarily intelligent ones.
Despite this track record, the Grok 4 announcement follows the same playbook. Promise revolutionary capabilities, hint at superior performance, and count on the tech press to amplify the message before independent verification becomes possible. The addition of fact-checking capabilities through Community Notes is particularly rich—a company whose previous models hallucinate citations wants to verify other people's facts.
Previous versions faced criticism for misinformation and offensive outputs, yet xAI continues positioning itself as a truth-seeking organization. The cognitive dissonance is staggering.
For marketing professionals, this pattern should be instructive. xAI has mastered the art of generating buzz without delivering substantive improvements. They've turned AI development into performance art, where the announcement is more important than the actual product.
As we await Grok 4's inevitable launch, remember that extraordinary claims require extraordinary evidence. And so far, xAI's evidence remains extraordinarily lacking.
Ready to cut through AI hype and build marketing strategies based on proven performance? Let Winsome Marketing's growth experts help you navigate vendor claims and implement AI solutions that actually deliver results.