New ByteDance system raises the stakes for deepfake videos

ByteDance researchers have demoed OmniHuman-1, an AI system that can generate unusually realistic deepfake videos from a single reference image and speech or vocals. The system has not been released, but the examples point to growing risks for politics, fraud, platforms and regulation.

Deepfake videos have been easy to make for some time. What has been harder is making them look natural enough that viewers do not immediately spot the artificial parts. A new system demonstrated by researchers from TikTok owner ByteDance suggests that gap may be narrowing quickly.

The system, called OmniHuman-1, can create some of the most convincing AI-generated human videos shown so far, based on the samples released by the ByteDance team. Those examples include a fictional Taylor Swift performance, a TED Talk that never took place and a deepfaked Einstein lecture.

What OmniHuman-1 Can Do

According to the ByteDance researchers, OmniHuman-1 needs only a single reference image and speech or vocals to generate a video clip of an arbitrary length. The generated video can also be adjusted by aspect ratio, and the subject's "body proportion" can be changed, meaning the system can vary how much of the person appears in the frame.

That matters because video deepfakes have often been limited by visible flaws. Many systems can place a person into an image or make a face appear to say something, but the result often shows signs of being synthetic. Movements may look rigid, expressions may feel off, or the body may not match the voice and timing.

OmniHuman-1 appears more advanced in the samples ByteDance shared. The system was trained on 19,000 hours of video content from undisclosed sources. It can also edit existing videos, including changing the movements of a person's limbs.

The released examples are cherry-picked. They show what the system can do under favorable conditions, not necessarily how it performs in every case. Still, the quality on display is notable because the results are convincing in ways earlier deepfake techniques often were not.

The Limits Still Matter

OmniHuman-1 is not flawless. The ByteDance team says "low-quality" reference images will not produce the best videos. The system also appears to have trouble with some poses.

One sample shows odd gestures involving a wine glass. That kind of failure matters because it shows that even strong deepfake systems can still break down when body position, hand movement or object interaction becomes difficult.

But those limits do not erase the broader point. If a system can create a realistic clip from one image and a voice track, the barrier to making believable synthetic video becomes much lower. Even without a public release, the demonstration signals where deepfake tools may be headed.

ByteDance has not released OmniHuman-1. The concern is that the AI community often does not take long to reverse-engineer models like these. If similar tools become widely available, the challenge will not just be whether individual clips can be made, but how quickly they can spread and how difficult they are to verify.

Why Political Deepfakes Are A Serious Risk

The political implications are already visible. Last year, political deepfakes appeared around the globe in several different contexts.

  • On election day in Taiwan, a Chinese Communist Party-affiliated group posted AI-generated, misleading audio of a politician throwing his support behind a pro-China candidate.
  • In Moldova, deepfake videos showed the country's president, Maia Sandu, resigning.
  • In South Africa, a deepfake of rapper Eminem supporting a South African opposition party circulated ahead of the country's election.

These examples show why realism matters. A fake video or audio clip does not need to fool everyone forever to have an effect. It only needs to create confusion, travel quickly or reach people before reliable context catches up.

That problem becomes harder when the synthetic content looks less obviously synthetic. If viewers can no longer rely on visual oddities, stiff movement or poor facial animation as warning signs, the burden shifts to detection tools, platform policies and public skepticism.

Fraud Is Another Front

Deepfakes are also being used in financial crimes. Consumers are being targeted with fake celebrity promotions for fraudulent investment opportunities. Companies are also being deceived by deepfake impersonators, with corporations being swindled out of millions.

According to Deloitte, AI-generated content contributed to more than $12 billion in fraud losses in 2023, and those losses could reach $40 billion in the U.S. by 2027. Those figures underline why the issue is not limited to misinformation or public figures. It is also a business and consumer protection problem.

The more convincing deepfake videos become, the more pressure there is on verification. A familiar face or voice can no longer be treated as proof on its own. That is especially true when synthetic media can be produced from limited inputs, such as a single image and speech or vocals.

Detection And Regulation Are Lagging

There have been calls for stronger rules. Last February, hundreds in the AI community signed an open letter calling for strict deepfake regulation. In the U.S., there is no law criminalizing deepfakes at the federal level, while more than 10 states have enacted statutes against AI-aided impersonation.

California's law is currently stalled. It would be the first to allow judges to order posters of deepfakes to take them down or potentially face monetary penalties.

Platforms and search engines have taken some steps to limit the spread of deepfakes. Even so, the volume of deepfake content online continues to grow at an alarmingly fast rate. Detection remains difficult, especially as generation systems improve.

A May 2024 survey from ID verification firm Jumio found that 60% of people said they encountered a deepfake in the past year. Seventy-two percent of respondents said they were worried about being fooled by deepfakes on a daily basis, and a majority supported legislation to address the spread of AI-generated fakes.

OmniHuman-1 is not available to the public, and its best examples may not represent every output. But the demonstration still points to a clear direction: deepfake videos are becoming more realistic, easier to produce at scale and harder to dismiss as obvious fakes. That makes the next phase of AI-generated media a problem for technology companies, lawmakers, businesses and everyday users at the same time.