The current state of generative AI in video often feels like a “one-shot” magic trick. A user enters a text prompt, a video is produced, and if the result is flawed—such as distorted limbs or unnatural movements—the user is left with little choice but to start over with a new prompt. This “black box” approach creates a barrier for professional creators who require precision rather than random luck.
Cartwheel, a new 3D animation startup, is attempting to break this cycle. Founded by industry veterans Andrew Carr (formerly of OpenAI) and Jonathan Jarvis (formerly of Google), the company is building tools designed to automate the technical heavy lifting of animation while leaving the creative decision-making in the hands of the artist.
## The Data Problem: Why 3D Is Harder Than Text
While large language models (LLMs) and image generators have flourished due to the near-infinite availability of text and images on the internet, 3D motion data is much harder to find.
Unlike written language, which is abundant, precise data describing how bodies move through three-dimensional space is rare, and that scarcity is a significant hurdle for AI development. According to co-founder Jonathan Jarvis, sourcing this data proved “10 to 100 times” harder than initially anticipated.
To overcome this, Cartwheel is not just generating “pixels” (flat images); it is mapping human biomechanics. Its models aim to translate simple 2D inputs, such as a video of someone dancing, into precise, realistic 3D skeletal structures, a level of technical accuracy that flat video generators cannot match.
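To make the idea concrete, here is a toy sketch (not Cartwheel’s actual model, and all names and numbers are invented) of what “lifting” 2D keypoints into a 3D skeleton means: a real system learns the missing depth dimension from data, while this sketch fakes it with a fixed bone-length constraint.

```python
import math

# Hypothetical 2D keypoints from one video frame: name -> (x, y) in image space.
KEYPOINTS_2D = {
    "shoulder": (0.50, 0.40),
    "elbow":    (0.62, 0.52),
    "wrist":    (0.70, 0.66),
}

BONES = [("shoulder", "elbow"), ("elbow", "wrist")]
BONE_LENGTH = 0.22  # assumed true 3D length of each bone, arbitrary units

def lift_to_3d(kps2d, bones, bone_length):
    """Assign a depth (z) to each joint so every bone has its true 3D length.

    If a bone looks shorter in 2D than it really is, the "missing" length
    must lie along the depth axis: dz = sqrt(L^2 - d2d^2).
    """
    root = next(iter(kps2d))
    joints3d = {root: (*kps2d[root], 0.0)}  # anchor the root joint at z = 0
    for parent, child in bones:
        px, py, pz = joints3d[parent]
        cx, cy = kps2d[child]
        d2d = math.hypot(cx - px, cy - py)               # projected 2D length
        dz = math.sqrt(max(bone_length**2 - d2d**2, 0))  # recovered depth
        joints3d[child] = (cx, cy, pz + dz)
    return joints3d

skeleton = lift_to_3d(KEYPOINTS_2D, BONES, BONE_LENGTH)
for name, (x, y, z) in skeleton.items():
    print(f"{name}: ({x:.2f}, {y:.2f}, {z:.2f})")
```

The depth here is ambiguous (the joint could bend toward or away from the camera), which hints at why learned motion models, rather than simple geometry, are needed for realistic results.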
## Fighting “AI Sameness” Through Creative Control
A common criticism of generative AI is its tendency toward “sameness”—the phenomenon where content produced by the same model begins to look repetitive and lacks distinct character.
Cartwheel’s founders argue that this lack of variety is a direct result of a lack of control. Their solution is to provide a “control layer” rather than a finished product.
- The AI as a Power Tool: Instead of generating a final, unchangeable video, Cartwheel generates 3D assets that are meant to be manipulated.
- Post-Generation Editing: Because the output is 3D data, creators can adjust lighting, move camera angles, or tweak a character’s pose after the initial generation is complete.
- Personalized Performance: By allowing artists to “push and pull” the performance, the technology moves away from being a replacement for the artist and becomes a sophisticated tool for expression.
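A minimal sketch of why 3D output stays editable in this way (the class and field names are invented for illustration, not Cartwheel’s API): the “asset” is structured data, such as joint angles, a camera, and a light, each of which can be changed after generation, whereas a finished video has all of these baked into its pixels.

```python
from dataclasses import dataclass

@dataclass
class Scene3D:
    joint_angles: dict           # e.g. {"right_elbow": 45.0}, in degrees
    camera_azimuth: float = 0.0  # camera position, degrees around the character
    light_intensity: float = 1.0

    def tweak_pose(self, joint, degrees):
        self.joint_angles[joint] = degrees  # re-posing is just a data edit

    def orbit_camera(self, degrees):
        self.camera_azimuth = (self.camera_azimuth + degrees) % 360

# "Generated" once...
scene = Scene3D(joint_angles={"right_elbow": 45.0, "left_knee": 10.0})

# ...then adjusted after the fact, which a flat video cannot support:
scene.tweak_pose("right_elbow", 90.0)
scene.orbit_camera(120)
scene.light_intensity = 0.6
```

Every one of these edits happens after the initial generation is complete, which is the essence of the “control layer” described above.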
## The Vision: Open-Ended Storytelling
The ultimate goal for Cartwheel extends beyond mere efficiency; it is about enabling “open-ended storytelling.”
In the rapidly evolving landscapes of gaming and social media, the demand for content is outstripping the capacity of traditional, manual animation. Cartwheel envisions a future where characters are not just playing back pre-recorded loops, but are powered by motion models that allow them to react and perform in real-time.
The founders predict a fundamental shift in the industry workflow:
“Everyone will work in 3D even if it’s authored in 2D, even if the final output is just 2D video.”
By focusing on the “layer below the pixels”—the underlying movement and structure—Cartwheel hopes to bridge the gap between a creator’s 2D vision and a high-fidelity 3D reality.
## Conclusion
Cartwheel seeks to transform generative AI from a generator of static videos into a dynamic engine for 3D movement. By prioritizing control and biomechanical accuracy, they aim to ensure that while machines handle the technical mechanics, humans retain the “taste” and emotional heart of the story.