A developer recently published a video documenting what happens when you take the latest large language models and give them direct control of a physical robot — not a research prototype, but a $100 machine built from consumer hardware. The experiment started as a childhood dream. It became something more instructive: a clear-eyed look at where machine intelligence actually is, where it's headed, and what's still missing.
The video is embedded below. The findings are worth understanding even if you never build a robot.
What He Built — and What It Cost
Twenty years ago, the components required for a basic intelligent robot would have cost several thousand dollars and demanded specialized hardware. The same build today: a $15 compute chip, two servo motors at a few dollars each, a $5 camera, an IMU sensor for motion detection at under $10, a microphone, a speaker, and a small drone battery. Total: under $100.
Training a neural network to control movement used to require renting a massive CPU cluster for around $25,000 and waiting months for results. The same training now runs on a rented H100 GPU for a few hours at roughly $10. That's a 2,500-fold cost collapse in roughly two decades.
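The 2,500-fold figure is simple division. A quick check, using the round numbers quoted above (both are approximations, so the result is too):

```python
# Round figures quoted in the article; both are approximations.
cpu_cluster_cost_usd = 25_000  # rented CPU cluster, months of training
h100_rental_cost_usd = 10      # rented H100 GPU, a few hours

cost_ratio = cpu_cluster_cost_usd / h100_rental_cost_usd
print(f"Training cost fell by a factor of about {cost_ratio:,.0f}")
```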
The hardware barrier to building a generally intelligent robot is, for practical purposes, gone.
What the AI Could Do
Rather than programming the robot's behavior directly, the developer connected it to frontier AI models and let them drive. The results were immediately surprising. When fed raw IMU sensor data (just acceleration and rotation numbers), the model described the robot's physical state with striking accuracy and richness. It correctly classified motion, interpreted touch, and generated responses grounded in real sensor data, even when those responses went beyond what the numbers alone showed.
When given access to its own motors, the robot began writing small programs to execute actions — and blending those programs with its own trained movement policies. Asked to walk like an old man, it combined a manual leg sweep, a low-energy policy call it knew would produce a shaky effect, and a finishing motion. It arrived at that composition without being told how. The developer described it as feeling creative.
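To make the "old man walk" concrete, here is a minimal sketch of what mixing a hand-written motion with a trained policy call might look like. Everything in it is an assumption for illustration: FakeRobot, set_servo, run_policy, the servo name, the angles, and the energy value are hypothetical and not taken from the developer's code. The fake robot only records commands, so the example runs without hardware.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeRobot:
    """Stand-in for real hardware: records commands instead of moving."""
    log: List[str] = field(default_factory=list)

    def set_servo(self, servo: str, angle_deg: float) -> None:
        # Scripted, hand-written motion (hypothetical interface).
        self.log.append(f"servo {servo} -> {angle_deg:.0f} deg")

    def run_policy(self, name: str, energy: float, seconds: float) -> None:
        # Call into a trained movement policy (hypothetical interface).
        self.log.append(f"policy {name} energy={energy:.1f} for {seconds:.1f}s")


def walk_like_old_man(robot: FakeRobot) -> None:
    robot.set_servo("left_leg", 30)                    # manual leg sweep
    robot.run_policy("walk", energy=0.2, seconds=3.0)  # low energy for a shaky gait
    robot.set_servo("left_leg", 0)                     # finishing motion


robot = FakeRobot()
walk_like_old_man(robot)
print("\n".join(robot.log))
```

The point of the sketch is the shape of the composition: scripted steps and policy calls sit side by side in one short program, which is the kind of thing a language model can write on the fly.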
With memory added, the robot adapted its behavior over time, built profiles of people it interacted with, and used a "dreaming" process — sending accumulated memory to a high-capability model to clean, consolidate, and extract lessons — to refine its own strategy and personality at the language level.
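The "dreaming" step can be sketched as a function that bundles raw notes into one prompt and hands it to a stronger model. This is a sketch under stated assumptions, not the developer's implementation: the function name dream, the prompt wording, and the stub model are all hypothetical, and the stub stands in for a real model call so the example runs offline.

```python
from typing import Callable, Dict, List


def dream(raw_memories: List[str], consolidate: Callable[[str], str]) -> Dict[str, object]:
    """Turn a pile of raw notes into one consolidated summary."""
    # Drop exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(raw_memories))
    prompt = (
        "Clean up these memories, merge overlaps, and list lessons learned:\n"
        + "\n".join(f"- {m}" for m in unique)
    )
    return {"summary": consolidate(prompt), "source_count": len(unique)}


def stub_model(prompt: str) -> str:
    # Placeholder for a call to a high-capability model; not a real API.
    note_count = sum(line.startswith("- ") for line in prompt.splitlines())
    return f"(summary of {note_count} notes)"


memories = [
    "Alex likes slow movements",
    "Fell when turning fast on carpet",
    "Alex likes slow movements",
]
result = dream(memories, stub_model)
print(result)
```

Passing the model in as a plain callable keeps the consolidation logic testable and makes it easy to swap models later.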
Different tasks used different models. Gemini Flash handled fast image interpretation at low cost. Claude Sonnet handled the more demanding memory consolidation tasks — fast enough and capable enough to catch subtleties that smaller models missed.
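Routing tasks to models can be as simple as a lookup table. The sketch below assumes two task names and uses descriptive labels for the models; the labels are illustrative strings, not the actual model identifiers any provider's API expects.

```python
# Task-to-model routing table. Task names and model labels are
# illustrative assumptions, not real API model identifiers.
ROUTES = {
    "image_interpretation": "gemini-flash",   # fast and cheap
    "memory_consolidation": "claude-sonnet",  # more capable, catches subtleties
}


def pick_model(task: str) -> str:
    """Return the model label configured for a task, or fail loudly."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"No model configured for task: {task!r}") from None


print(pick_model("image_interpretation"))
print(pick_model("memory_consolidation"))
```

Failing on unknown tasks, rather than silently falling back to a default, is a design choice in this sketch; the source does not say how the robot handles that case.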
What It Still Can't Do
Here's where the experiment becomes genuinely interesting. Despite everything the robot could do, there was a persistent physical dumbness the developer couldn't initially name. The robot could understand what just happened to its body. It couldn't accurately predict what was about to happen.
In neuroscience, the missing piece has a name: the cerebellum. It holds more than three-quarters of all neurons in the human brain and handles fast, unconscious physical imagination: predicting the next 0.1 seconds of physical reality in about 0.02 seconds and generating action chunks that stay ahead of the brain's sensory delay. When you catch a ball, you're catching a prediction. The cerebellum generates that prediction continuously, silently, without conscious thought.
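To show what "catching a prediction" means in the simplest possible terms, the sketch below extrapolates a falling ball's height 0.1 seconds ahead. It is a hand-written constant-gravity formula, not a learned predictor and not a model of the cerebellum; the function name and the example numbers are assumptions chosen for illustration.

```python
G = 9.81  # gravitational acceleration in m/s^2


def predict_height(height_m: float, vertical_velocity_mps: float,
                   horizon_s: float = 0.1) -> float:
    """Constant-gravity extrapolation of a ball's height horizon_s ahead."""
    return (height_m
            + vertical_velocity_mps * horizon_s
            - 0.5 * G * horizon_s ** 2)


# A ball at 2.0 m, already falling at 3.0 m/s (negative means downward).
now_m = 2.0
future_m = predict_height(now_m, -3.0)
print(f"Seen now: {now_m:.3f} m, predicted in 0.1 s: {future_m:.3f} m")
```

A hand that reaches for where the ball is seen now arrives too late; it has to aim at the predicted position instead. A learned motor predictor would have to do the same job for far messier physics than a ball in free fall.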
Current AI models — including the most capable ones — don't have a reliable equivalent. They can describe physical states with sophistication. They can't rapidly simulate the near-term physical consequences of their own actions with the precision required for fine motor control. That gap is what separates a robot that can walk from one that can catch.
The field knows this. Researchers are converging on an architecture that combines fast, unconscious motor prediction networks with slower, language-level reasoning, all built on a shared sensory foundation. In effect, they are building a computational analog of the cerebellum-cortex relationship. It's not there yet. The trajectory is clear.
Why This Matters for Marketing and Growth Leaders
Physical robotics might feel distant from marketing strategy and AI-assisted workflows. It isn't, structurally. The same architectural questions — how AI systems handle memory, how they generalize from experience, how they balance fast pattern-matching with slower reasoning — are present in every AI tool your team uses today. The robot makes those questions visible in a way that a chatbot doesn't.
The developer's observation about memory is particularly transferable: what we call an AI agent is just a model in a loop with memory it can read and write. That's true of the robot. It's also true of the AI systems being built into enterprise marketing stacks right now. The quality of what those systems learn over time depends entirely on how their memory is structured and maintained.
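The "model in a loop with memory" idea fits in a few lines. In this minimal sketch the model is a stub function so the example runs offline; run_agent, echo_model, and the observation strings are hypothetical names chosen for illustration.

```python
from typing import Callable, List


def run_agent(model: Callable[[str, List[str]], str],
              observations: List[str]) -> List[str]:
    """Run a model in a loop with memory it can read and write."""
    memory: List[str] = []
    for observation in observations:
        note = model(observation, memory)  # read: the model sees its memory
        memory.append(note)                # write: its output becomes memory
    return memory


def echo_model(observation: str, memory: List[str]) -> str:
    # Placeholder for a real model call; not a real API.
    return f"step {len(memory) + 1}: saw {observation}"


final_memory = run_agent(echo_model, ["customer opened email", "customer clicked link"])
print(final_memory)
```

Everything the agent "learns" lives in that memory list, which is why how it is structured and cleaned matters so much.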
The $100 price point is the other thing worth sitting with. The cost collapse that made this robot possible is the same cost collapse driving AI accessibility across every industry. The question is no longer whether the tools exist. It's whether the people deploying them understand what they're working with.
Want to build AI-assisted marketing systems that actually learn and improve over time? Winsome Marketing helps growth teams design the right architecture from the start. Let's talk.


Writing Team