
CalMatters Tests AI Video Models on Dance—None Could Actually Dance

CalMatters and The Markup just published rigorous testing of four leading AI video generation models (OpenAI's Sora 2, Google's Veo 3.1, Kuaishou's Kling 2.5, and MiniMax's Hailuo 2.3) across nine dance styles ranging from the Macarena to the traditional Cahuilla bird dance to TikTok's Renegade. They generated 36 videos in total, using prompts standardized across platforms.
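
The test matrix itself is simple arithmetic: four models crossed with nine dances yields 36 generations. Here is a minimal sketch of that cross product in Python, with hypothetical identifiers standing in for the testers' actual prompt wording:

```python
from itertools import product

# Hypothetical identifiers; CalMatters and The Markup's exact
# prompt wording is not reproduced here.
MODELS = ["sora-2", "veo-3.1", "kling-2.5", "hailuo-2.3"]
DANCES = [
    "macarena", "cahuilla-bird-dance", "renegade",
    # ...plus the six other styles in the study
]

def build_test_matrix(models: list[str], dances: list[str]) -> list[dict]:
    """One standardized generation request per (model, dance) pair."""
    return [{"model": m, "dance": d} for m, d in product(models, dances)]

matrix = build_test_matrix(MODELS, DANCES)
# With all nine dances listed, len(matrix) == 4 * 9 == 36.
```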

The results are unambiguous: Not a single video produced the actual dance requested. Zero out of 36.

This matters because AI video generation has been marketed as approaching human-level capability in creating realistic motion. These test results suggest the gap between vendor claims and actual performance remains substantial when complex human movement is required.

What the Testing Actually Measured

The methodology was careful and comprehensive. CalMatters and The Markup evaluated each generated video against six criteria, encoded as a simple rubric in the sketch after this list:

  1. Did the subject dance at all?
  2. Did they perform the specific prompted dance?
  3. Did they maintain consistent physical appearance?
  4. Did they produce realistic motion based on human physiology?
  5. Did the scene match the prompt?
  6. Did the camera position match the prompt?
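
To make the rubric concrete, here is one way it could be encoded; the field names and the example scores are illustrative, not the testers' own schema:

```python
from dataclasses import dataclass

@dataclass
class VideoScore:
    """Yes/no rubric for a single generated video (illustrative field names)."""
    danced_at_all: bool                # criterion 1
    performed_prompted_dance: bool     # criterion 2
    consistent_appearance: bool        # criterion 3
    physiologically_realistic: bool    # criterion 4
    scene_matches_prompt: bool         # criterion 5
    camera_matches_prompt: bool        # criterion 6

# The pattern the article reports for nearly every video: a figure that
# dances, just not the dance that was asked for.
typical = VideoScore(
    danced_at_all=True,              # 35 of 36 videos passed
    performed_prompted_dance=False,  # 0 of 36 passed
    consistent_appearance=True,      # 25 of 36 passed; 11 showed glitches
    physiologically_realistic=True,
    scene_matches_prompt=True,
    camera_matches_prompt=True,
)
```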

All but one video showed a figure dancing—that single failure from Kling 2.5 produced someone doing side lunges instead. So the models can generate human figures in motion that reads as "dancing" to viewers unfamiliar with the specific dances requested.

But none generated the actual choreography prompted. For cultural dances, this failure was particularly stark. Emily Clarke, a Cahuilla Band of Indians tribal member reviewing the bird dance videos, said: "None of these depictions are anywhere close to bird dancing, in my opinion."

For the Horton dance, a modern dance technique with defined movements, choreographer Emma Andre noted that while Veo 3.1's output was "staggeringly lifelike," it still didn't show the prompted Fortification No. 3 movement.

Approximately one-third of videos (11 of 36) exhibited consistency issues: sudden clothing changes, hair transformations, limb structure problems, heads rotating independently of bodies, and limbs "liquefying and reconstituting." The researchers noted this represents significant improvement from initial testing in late 2024, but the problems remain substantial.

What This Reveals About Current Capabilities

These results expose a critical gap in AI video generation: the difference between producing convincing motion and producing specific motion.

The models demonstrated they can generate human figures that move fluidly, match scene descriptions, and create visually compelling footage. What they cannot do—at least not yet—is translate choreographic specificity into visual output. They can make someone look like they're dancing. They cannot make someone perform the Macarena.

This distinction matters enormously for applications requiring precision rather than plausibility. If you need a generic "person dancing in a studio" for B-roll footage, these tools might suffice. If you need actual choreography—whether for cultural authenticity, technical accuracy, or legal compliance with licensed dances—they fail completely.

The researchers acknowledged several limitations in their testing methodology. They didn't use image-to-video generation (uploading static images alongside text prompts), which some platforms advertise specifically for dance generation. They didn't test multiple dancers simultaneously. They didn't optimize prompts individually for each model's specific guidelines, instead standardizing across platforms.

Even accounting for these limitations, the complete absence of accurate choreographic output across 36 attempts using four different leading platforms indicates systematic capability gaps, not just prompt engineering failures.

The Economic and Creative Implications

When CalMatters and The Markup asked dancers and choreographers whether AI could disrupt their industry, most concluded human dancers couldn't be replaced. These test results validate that assessment—at least for now.

The economic question isn't whether AI can replace dancers entirely. It's which dance applications get displaced by "good enough" AI-generated motion that's cheaper than hiring humans, even if technically inaccurate. Stock footage, background dancers in wide shots, conceptual movement in advertisements—these use cases don't necessarily require choreographic precision.

But cultural dances, performance documentation, instructional content, and any application where specific movement matters remain firmly in human territory. The Cahuilla bird dance videos weren't just inaccurate; according to the tribal member who evaluated them, they weren't "anywhere close." That's not a minor quality gap. It's categorical failure.

For choreographers and movement directors, these results suggest AI video tools function more as concept visualization than production-ready content. You might use them to quickly mock up general movement ideas, but you'll still need humans to execute actual choreography.

What This Means for AI Video Generally

Dance represents an edge case—complex, culturally specific human movement with clear right and wrong answers. But the failure mode revealed here likely applies to other domains requiring precision.

If leading AI video models cannot reliably generate a Macarena—one of the most widely performed, documented, and culturally saturated dances of the past 30 years—what else can't they generate accurately when specificity matters?

The answer probably includes: specific sports techniques, culturally authentic rituals, technical procedures, martial arts forms, sign language, and any other domain where movement carries semantic meaning rather than just visual plausibility.

These models have become remarkably good at creating footage that looks real. They remain poor at creating footage that depicts specific real things accurately. That's the gap between impressive demos and reliable production tools.

For those of us evaluating AI capabilities for business applications, this distinction is essential. AI video generation has advanced substantially—the researchers noted significant improvement even from late 2024 testing. But advancement toward visual plausibility doesn't automatically translate to accuracy for specific use cases.

The dancers were right. AI can't replace them yet. Not because it can't make people move on screen, but because it can't make them move correctly.
