Ars Technica AI October 4, 2024

One Photo Can Now Drive Meta’s Movie Gen AI Video

Meta has previewed Movie Gen, a suite of AI models for creating and editing video, images and audio. The system can turn a single image of a person into a realistic personalized video, a capability that raises major deepfake concerns even as Meta frames it as a creative tool.

Single-photo personalized video generation heightens deepfake risk and makes visual media harder to trust.

Meta’s Movie Gen preview points to a near future in which AI video is easier to make, easier to edit and harder to trust at a glance. The system can generate video from text or an image, add synchronized sound and turn one photo of a person into a realistic personalized video.

What Meta Is Previewing

On Friday, Meta announced a preview of Movie Gen, a new suite of AI models built to create and manipulate video, audio and images. The company says the models can create realistic video from a single photo of a person, and it claims the system outperforms other video-synthesis models in human evaluations.

Meta has not said when or how these capabilities might become available to the public. For now, Movie Gen is being presented as a creative technology rather than a finished consumer product.

The company says the tool may help people "enhance their inherent creativity" rather than replace human artists and animators. Meta describes possible future uses such as making and editing "day in the life" videos for social platforms or creating personalized animated birthday greetings.

Movie Gen follows Meta's earlier work in video and image synthesis, including 2022's Make-A-Video generator and the Emu image-synthesis model. The new system expands that work by combining video generation, video editing, personalized video creation and sound generation in one broader suite.

What Movie Gen Can Generate

Movie Gen’s video-generation model can create 1080p high-definition videos up to 16 seconds long at 16 frames per second. It can work from text descriptions or from an image input.

Meta says the model can handle object motion, subject-object interactions and camera movements. Those claims matter because video generation is not only about producing sharp frames. A useful video model also has to maintain coherence as subjects move, objects interact and the camera changes perspective.

The preview also places Meta in a crowded AI video field. Google showed a model called “Veo” in May, and Meta says Movie Gen outputs beat OpenAI’s Sora, Runway Gen-3 and Chinese video model Kling in human preference tests.

That does not mean every output will be equally strong. As with earlier AI video generators, the best examples shown publicly may differ from typical results. Coherence can depend on how well the requested concepts are represented in the training videos, and getting a polished clip may require repeated attempts.

The Deepfake Issue Is Central

Meta calls one Movie Gen capability "personalized video creation." The more familiar term for this kind of capability, and for the risks that come with it, is deepfakes.

In this case, the process appears simple: provide one input image of a person and a text prompt describing what that person should do or where they should appear. Movie Gen then generates a video that aims to preserve the person’s identity and motion while incorporating the prompt details.

That ability creates obvious creative possibilities, but it also makes the risks harder to dismiss. The source article notes that deepfake technology has alarmed some experts because it can simulate authentic camera footage and make people appear to do things they did not actually do.

Potential abuse could include humiliating videos, fake compromising situations, fabricated historical context or deepfake video pornography. The larger concern is that media could become harder to interpret without deeper context, especially as AI-generated video becomes more fluid and eventually real-time.

Movie Gen is not the only model to show how little input can be needed. In April, Microsoft demonstrated VASA-1, which can create a photorealistic video of a person talking from a single photo and a single audio track. Movie Gen goes further by placing a deepfaked person inside a video scene, whether AI-generated or otherwise, though it does not appear to generate or synchronize speech yet.

Editing And Sound Move The Technology Forward

Movie Gen is also designed for editing existing video. Meta showed a component that can make precise changes based on text instructions.

Those edits can be local, such as adding or removing elements. They can also be broader changes, such as altering the background or overall style of a video.

Sound is another major part of the preview. Earlier video-synthesis models have commonly produced silent clips, but Meta uses a separate audio-generation model that creates ambient sound, sound effects and instrumental background music from text prompts, matched to the content of the video.

Meta says the audio model can generate sound for videos of any length while keeping the audio coherent throughout. If that works reliably, AI video would move closer to complete media generation rather than silent visual drafts.

Limits, Training Data And Creator Feedback

Meta acknowledges that the current models still have limitations. The company plans to reduce video-generation time and improve overall quality by scaling up the models further.

Training data is another important question. Meta says the models were trained on a combination of "licensed and publicly available datasets." The source article notes that this very likely includes videos uploaded by Facebook and Instagram users over the years, while acknowledging that this is speculation based on Meta's current policies and past behavior.

Meta also released a research paper with more detail on how the Movie Gen models work. In addition, the company plans to collaborate with filmmakers and creators so their feedback can shape future versions.

That feedback may not be universally positive. The source article points to warnings from the SAG-AFTRA actors' union last year and to divisive reactions to video synthesis from some industry professionals. Movie Gen may be pitched as a creative tool, but its most consequential impact could be the way it changes trust, consent and authorship around video.