OpenAI’s Sora can turn prompts and images into short videos, but its launch has reopened a difficult question for generative AI: what material did the model learn from, and who had rights in that material?
TechCrunch reported that Sora appears able to generate clips resembling video game footage, Twitch streams, and streamer-like personas. OpenAI has not revealed the exact training data behind Sora, which leaves the legal debate focused on what the outputs seem to show and what those outputs may imply.
What Sora appeared to produce
Sora launched on Monday and can create videos up to 20 seconds long in multiple aspect ratios and resolutions. When OpenAI first revealed Sora in February, it alluded to training the model on Minecraft videos.
TechCrunch tested whether other game-like material might be reflected in Sora’s behavior. The results included footage resembling a Super Mario Bros. clone, a first-person shooter inspired by Call of Duty and Counter-Strike, and an arcade fighter in the style of a ’90s Teenage Mutant Ninja Turtles game.
The article also described Sora as appearing to understand the visual language of a Twitch stream. One generated screenshot captured the broad layout of that format and showed the likeness of popular Twitch streamer Raúl Álvarez Genes, known as Auronplay, down to the tattoo on Genes’ left forearm.
Another output showed a character similar in appearance to Imane Anys, better known as Pokimane. TechCrunch noted that some prompts had to be indirect because OpenAI has filtering intended to prevent Sora from generating trademarked characters. For example, according to TechCrunch, the prompt “Mortal Kombat 1 gameplay” would not produce anything resembling that title.
Why training data is central
OpenAI has been cautious about describing where Sora’s training material came from. In March, OpenAI’s then-CTO, Mira Murati, would not outright deny in an interview with The Wall Street Journal that Sora was trained on YouTube, Instagram, and Facebook content.
OpenAI’s Sora tech specs said the company used “publicly available” data as well as licensed data from stock media libraries such as Shutterstock. TechCrunch reported that OpenAI did not initially respond to a request for comment, though a PR representative later said they would “check with the team.”
The training question matters because generative AI models learn patterns from large datasets. That can make them useful for producing new scenes, but it can also create risk when prompts lead models toward recognizable material.
TechCrunch points to broader legal fights already underway. Microsoft and OpenAI are being sued over allegations that their AI tools reproduce licensed code. Midjourney, Runway, and Stability AI are facing a case accusing them of infringing artists’ rights. Major music labels have filed suit against Udio and Suno over AI-powered song generators.
Game footage can contain several rights layers
Legal experts cited by TechCrunch said video game playthroughs can be especially complicated. Joshua Weigensberg, an IP attorney at Pryor Cashman, said companies training on unlicensed game footage face many risks because training a generative AI model generally involves copying training data.
Evan Everist, an attorney at Dorsey & Whitney specializing in copyright law, described game playthrough videos as potentially involving more than one rights holder. The game developer may own the contents of the game, while the player or videographer may have rights in the unique video of the play session.
For some games, the situation can become even more complex when user-generated content appears in the software. Everist gave Epic’s Fortnite as an example because players can create their own maps and share them. A playthrough video of one such map could involve Epic, the person using the map, and the map’s creator.
Weigensberg also noted that games include protectable elements such as proprietary textures. If those works were not properly licensed, training on them may raise infringement concerns.
TechCrunch contacted several game studios and publishers, including Epic, Microsoft, Ubisoft, Nintendo, Roblox, and Cyberpunk developer CD Projekt Red. Few responded, and none gave an on-the-record statement. CD Projekt Red said it would not be able to get involved in an interview at the moment, while EA said it did not have any comment at this time.
The output risk may not end with OpenAI
Even if AI companies ultimately prevail in some legal disputes, users may still face risks from the material a model produces. Jesse Saivar, chair of Greenberg Glusker’s IP and digital media and technology groups, told TechCrunch that key questions remain unsettled, including whether training involves copying copyrighted works, whether that copying is infringement, whether it affects the market for the original work, and whether copyright owners can show actual damage or injury.
If a model produces a recognizable protected asset and a user publishes it or uses it in another project, that user could face accusations of infringement. Weigensberg said generative AI systems often output recognizable, protectable IP assets, and that more complex systems may have the same problem as simpler text or image generators.
Some AI companies include indemnity clauses for certain situations, but TechCrunch notes that these clauses can have carve-outs. OpenAI’s clause, for example, applies only to corporate customers, not individual users.
Copyright is not the only issue. Weigensberg also pointed to trademark rights and name, image, and likeness rights. If an output includes recognizable game characters, branding-related assets, or a streamer-like identity, the legal analysis may extend beyond whether a training set contained copyrighted footage.
World models could raise the stakes
The debate may become more important as interest grows in world models. OpenAI considers Sora to be a world model, and one possible application is generating video games in real time.
If synthetic games created by such systems resemble the material used to train them, that could intensify the legal questions. Avery Williams, an IP trial lawyer at McKool Smith, said that training an AI platform on the voices, movements, characters, songs, dialogue, and artwork in a video game raises the same fair use questions now appearing across generative AI litigation.
For now, the core issue is unresolved. Sora’s game-like and Twitch-like outputs do not by themselves disclose the contents of its training data. But they do show why creators, game companies, AI developers, and users are all watching the boundary between learned patterns and recognizable protected work.