Odyssey has released a research preview that points to a different path for AI video: not a finished clip, but a responsive scene that changes as a person interacts with it.
The company calls the format interactive videos. Instead of asking AI to create a fixed sequence from beginning to end, Odyssey’s system generates image sequences that react immediately to input from a keyboard, controller, or smartphone.
What Odyssey Is Demonstrating
The core idea is a video system that behaves more like an environment than a passive piece of media. Odyssey describes the technology as a world model, meaning an AI system built to generate dynamic surroundings that can be acted on, rather than simply watched.
The demo uses an autoregressive world model. In plain terms, the model decides what the next frame should look like by considering the current scene, the user’s action, and the sequence of events that came before it.
That makes the experience different from traditional video generation. A conventional video model can produce an entire clip in one pass. Odyssey’s approach keeps updating the visual output frame by frame, so the scene can change in response to what the user does.
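The frame-by-frame loop described above can be sketched in a few lines of Python. This is a toy illustration only, not Odyssey's model: the one-row "frame", the `ACTIONS` table, the `CONTEXT` window size, and the functions `predict_next_frame` and `run_session` are all hypothetical names invented for this example, and a hand-written rule stands in for the learned model.

```python
from collections import deque
from typing import Deque, List, Tuple

# Toy "frame": a single row of pixels, where 1 marks the viewer's position.
Frame = Tuple[int, ...]

ACTIONS = {"left": -1, "stay": 0, "right": 1}  # hypothetical action set
WIDTH = 8      # pixels per toy frame
CONTEXT = 4    # number of past frames kept as conditioning context


def predict_next_frame(history: Deque[Frame], action: str) -> Frame:
    """Stand-in for a learned model: next frame from past frames plus an action.

    This toy rule only reads the latest frame; a real world model would
    condition on the whole history window.
    """
    position = history[-1].index(1)
    new_position = max(0, min(WIDTH - 1, position + ACTIONS[action]))
    return tuple(1 if i == new_position else 0 for i in range(WIDTH))


def run_session(actions: List[str]) -> List[Frame]:
    """Generate one frame per user action, feeding each output back as input."""
    first = tuple(1 if i == WIDTH // 2 else 0 for i in range(WIDTH))
    history: Deque[Frame] = deque([first], maxlen=CONTEXT)
    frames = [first]
    for action in actions:
        frame = predict_next_frame(history, action)
        history.append(frame)  # autoregression: the output becomes new context
        frames.append(frame)
    return frames


if __name__ == "__main__":
    for frame in run_session(["right", "right", "left"]):
        print("".join("#" if pixel else "." for pixel in frame))
```

The point of the sketch is the structure of the loop: no frame exists until the user's action arrives, and every generated frame is appended to the context that shapes the next one. A one-pass clip generator has no equivalent step where new input can enter.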
Odyssey says its long-term ambition is to simulate visuals and actions with enough realism that they cannot be distinguished from real life. The research preview is not presented as that final destination. It is described as raw and sometimes unstable, but meaningful because it shows where the company wants AI-generated content to go.
Why World Models Matter
Odyssey’s demo matters for more than the fact that the video is AI-generated. The larger shift is from static generation to continuous interaction.
In a passive video, the viewer has no effect on what happens next. In Odyssey’s interactive video, the user’s choices become part of the generation process. The model must maintain a coherent scene while also adapting to new input.
That distinction matters for agentic AI as well. Odyssey frames world models as possible training grounds for AI systems that need to learn and act inside simulated environments. If an AI system can operate in a generated world, it may be able to learn from its own experience inside that simulation.
The demo is not presented as a finished commercial product. It is an early research preview with clear technical limits. Still, the direction is notable: AI-generated media could move from producing assets to producing experiences that unfold on demand.
Stability Comes With Limits
For the latest preview, Odyssey intentionally narrowed what the model had to handle. The system was first trained on general video footage, then fine-tuned with video from a handful of well-documented scenes.
Co-founder Oliver Cameron says this focused training helps keep the model stable and stops it from drifting into visuals that no longer make sense. According to Cameron, a more generalist model would fall apart after 20 to 30 seconds. The current version can keep video consistent for about two and a half minutes.
That stability has trade-offs. Users cannot yet freely look up or down. The limitation comes from the choice to prioritize a more stable experience over total freedom inside the generated scene.
Cameron also emphasized that the video is not assembled from pre-rendered material. According to him, "every frame is absolutely generated by a diffusion model we've trained."
The system processes input immediately, creates a new frame every 40 milliseconds (25 frames per second), and streams the result back. Odyssey runs the system on H100 GPU clusters in the US and EU. Under ideal conditions, latency is 40 milliseconds. Current costs range from one to two dollars per user-hour, with prices expected to fall.
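A quick back-of-the-envelope calculation puts the reported figures side by side. The sketch below uses only numbers quoted in this article (40 milliseconds per frame, a stable window of about two and a half minutes, and one to two dollars per user-hour). The function names are made up for illustration, and the per-session cost assumes simple linear pricing, which the article does not confirm.

```python
FRAME_INTERVAL_MS = 40                    # one new frame every 40 ms
STABLE_SECONDS = 150                      # about two and a half minutes
COST_PER_USER_HOUR_USD = (1.0, 2.0)       # reported low and high estimates


def frames_per_second(interval_ms: int) -> float:
    """Frame rate implied by a fixed interval between frames."""
    return 1000 / interval_ms


def frames_in(seconds: int, interval_ms: int) -> int:
    """Number of frames generated over a span of time."""
    return int(seconds * 1000 // interval_ms)


def session_cost(seconds: int, hourly_rate_usd: float) -> float:
    """Cost of a session, assuming cost scales linearly with time (an assumption)."""
    return hourly_rate_usd * seconds / 3600


if __name__ == "__main__":
    print(frames_per_second(FRAME_INTERVAL_MS))          # 25.0
    print(frames_in(STABLE_SECONDS, FRAME_INTERVAL_MS))  # 3750
    low, high = (session_cost(STABLE_SECONDS, r) for r in COST_PER_USER_HOUR_USD)
    print(f"{low:.3f} to {high:.3f} USD per stable session")
```

In other words, keeping a scene coherent for two and a half minutes means generating 3,750 consecutive frames, each conditioned on the ones before it. By the same arithmetic, the 20 to 30 seconds attributed to a more generalist model corresponds to 500 to 750 frames.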
What Comes Next
Odyssey is already working on a model designed for broader generalization and more realistic dynamics. Early versions are said to show more varied visual patterns, movement, and interactions, while improving consistency over time.
The direction suggests several challenges that still have to be balanced:
- Generalization: the model needs to handle more environments without losing coherence.
- Control: users will expect more freedom than the current demo allows.
- Latency: real-time interaction depends on fast frame generation and streaming.
- Cost: the system currently has a measurable user-hour cost, even if prices are expected to drop.
Odyssey is not the only company exploring this kind of system. Decart AI is developing a similar idea through Oasis, described as a Minecraft-style game created in real time by AI. Oasis is trained on video data and generates graphics, physics, and gameplay while users interact with a mouse and keyboard. It combines vision transformers with a diffusion model for stable visuals.
Together, these efforts show a broader movement in generative AI: from creating isolated media files toward generating interactive worlds. Odyssey’s preview remains early, limited, and sometimes unstable, but it demonstrates the central promise clearly. AI video may become something people can enter and steer, not just something they press play on.