3 min read · Writing Team · Jan 26, 2026
CalMatters and The Markup just published rigorous testing of four leading AI video generation models (OpenAI's Sora 2, Google's Veo 3.1, Kuaishou's Kling 2.5, and MiniMax's Hailuo 2.3) across nine dance styles ranging from the Macarena to the traditional Cahuilla bird dance to TikTok's Renegade. They generated 36 videos in total, using standardized prompts shared across all four platforms.
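For concreteness, here is a minimal sketch of that test matrix. This is not the researchers' actual harness: only four of the nine dance styles are named in this piece, so placeholders stand in for the rest, and the prompt template is an illustrative assumption.

```python
# A minimal sketch of the study's 4 x 9 test matrix, not the researchers'
# actual harness. Placeholder names and the prompt wording are assumptions.
from itertools import product

MODELS = ["Sora 2", "Veo 3.1", "Kling 2.5", "Hailuo 2.3"]
DANCES = [
    "Macarena",
    "Renegade",
    "Cahuilla bird dance",
    "Horton (Fortification No. 3)",
] + [f"unnamed style {i}" for i in range(5, 10)]  # five styles not named here

def make_prompt(dance: str) -> str:
    # One standardized prompt per dance, reused verbatim across all models.
    return f"A dancer performing the {dance} in a studio, full body in frame."

jobs = [(model, make_prompt(dance)) for model, dance in product(MODELS, DANCES)]
assert len(jobs) == 36  # 4 models x 9 dance styles = 36 videos
```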
[Embedded TikTok from @aeyespybywinsome: "No dance no dice" — original sound by AEyeSpy]
The results are unambiguous: Not a single video produced the actual dance requested. Zero out of 36.
This matters because AI video generation has been marketed as approaching human-level capability in creating realistic motion. These test results suggest the gap between vendor claims and actual performance remains substantial when complex human movement is required.
The methodology was careful and comprehensive. CalMatters and The Markup evaluated each generated video against six criteria, including whether the figure was dancing at all, whether the prompted choreography actually appeared, and whether the figure stayed visually consistent from frame to frame.
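Three of those six criteria can be inferred from the findings below. As a hedged illustration, here is how such a rubric might be recorded and tallied; the field names are assumptions, not the study's published rubric.

```python
# A sketch of per-video scoring against the rubric. Field names are
# assumptions inferred from the findings discussed in this article; the
# study's remaining criteria are not recoverable from this piece.
from dataclasses import dataclass

@dataclass
class VideoScore:
    model: str
    dance: str
    figure_is_dancing: bool     # does the output read as dancing at all?
    correct_choreography: bool  # does it perform the prompted dance?
    visually_consistent: bool   # stable clothing, hair, and limbs?

def tally(scores: list[VideoScore]) -> dict[str, int]:
    # Count how many of the 36 videos pass each criterion.
    return {
        "dancing_at_all": sum(s.figure_is_dancing for s in scores),
        "correct_choreography": sum(s.correct_choreography for s in scores),
        "visually_consistent": sum(s.visually_consistent for s in scores),
    }
```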
All but one video showed a figure dancing; the single failure, from Kling 2.5, produced someone doing side lunges instead. So the models can generate human figures whose motion reads as "dancing" to viewers unfamiliar with the specific dances requested.
But none generated the actual choreography prompted. For cultural dances, this failure was particularly stark. Emily Clarke, a Cahuilla Band of Indians tribal member reviewing the bird dance videos, said: "None of these depictions are anywhere close to bird dancing, in my opinion."
For the Horton dance, a modern dance technique with specific, defined movements, choreographer Emma Andre noted that while Veo 3.1's output was "staggeringly lifelike," it still didn't show the prompted Fortification No. 3 movement.
Approximately one-third of videos (11 of 36) exhibited consistency issues: sudden clothing changes, hair transformations, limb structure problems, heads rotating independently of bodies, and limbs "liquefying and reconstituting." The researchers noted this represents significant improvement from initial testing in late 2024, but the problems remain substantial.
These results expose a critical gap in AI video generation: the difference between producing convincing motion and producing specific motion.
The models demonstrated they can generate human figures that move fluidly, match scene descriptions, and create visually compelling footage. What they cannot do—at least not yet—is translate choreographic specificity into visual output. They can make someone look like they're dancing. They cannot make someone perform the Macarena.
This distinction matters enormously for applications requiring precision rather than plausibility. If you need a generic "person dancing in a studio" for B-roll footage, these tools might suffice. If you need actual choreography—whether for cultural authenticity, technical accuracy, or legal compliance with licensed dances—they fail completely.
The researchers acknowledged several limitations in their testing methodology. They didn't use image-to-video generation (uploading static images alongside text prompts), which some platforms advertise specifically for dance generation. They didn't test multiple dancers simultaneously. They didn't optimize prompts individually for each model's specific guidelines, instead standardizing across platforms.
Even accounting for these limitations, the complete absence of accurate choreographic output across 36 attempts using four different leading platforms indicates systematic capability gaps, not just prompt engineering failures.
When CalMatters and The Markup asked dancers and choreographers whether AI could disrupt their industry, most concluded human dancers couldn't be replaced. These test results validate that assessment—at least for now.
The economic question isn't whether AI can replace dancers entirely. It's which dance applications get displaced by "good enough" AI-generated motion that's cheaper than hiring humans, even if technically inaccurate. Stock footage, background dancers in wide shots, conceptual movement in advertisements—these use cases don't necessarily require choreographic precision.
But cultural dances, performance documentation, instructional content, and any application where specific movement matters remain firmly in human territory. The Cahuilla bird dance videos weren't just inaccurate; in the words of the tribal member who evaluated them, they weren't "anywhere close." That's not a minor quality gap. It's categorical failure.
For choreographers and movement directors, these results suggest AI video tools function more as concept visualization than production-ready content. You might use them to quickly mock up general movement ideas, but you'll still need humans to execute actual choreography.
Dance represents an edge case—complex, culturally specific human movement with clear right and wrong answers. But the failure mode revealed here likely applies to other domains requiring precision.
If leading AI video models cannot reliably generate a Macarena—one of the most widely performed, documented, and culturally saturated dances of the past 30 years—what else can't they generate accurately when specificity matters?
The answer probably includes: specific sports techniques, culturally authentic rituals, technical procedures, martial arts forms, sign language, and any other domain where movement carries semantic meaning rather than just visual plausibility.
These models have become remarkably good at creating footage that looks real. They remain poor at creating footage that depicts specific real things accurately. That's the gap between impressive demos and reliable production tools.
For those of us evaluating AI capabilities for business applications, this distinction is essential. AI video generation has advanced substantially—the researchers noted significant improvement even from late 2024 testing. But advancement toward visual plausibility doesn't automatically translate to accuracy for specific use cases.
The dancers were right. AI can't replace them yet. Not because it can't make people move on screen, but because it can't make them move correctly.