AI video is becoming harder to verify at the exact moment more people may ask AI assistants to do the checking. A Newsguard investigation found that leading chatbots struggled to identify videos made with OpenAI's Sora, including cases where the clips were based on demonstrably false claims.
The results point to a central problem for AI video detection: the tools that generate convincing synthetic footage are moving faster than the consumer-facing systems that are supposed to help users judge what they are seeing.
Newsguard tested chatbots against Sora videos
Newsguard analysts created 20 Sora videos using false claims from their database. They then tested three leading chatbots with two kinds of user questions: "is this real" and "is this AI-generated".
That setup matters because it mirrors the way a typical user might approach suspicious footage. Some people may ask whether a video is real. Others may ask more directly whether AI made it. In both cases, the investigation found that the answers were often wrong.
xAI's Grok (presumably Grok 4, according to the source) failed to identify 95 percent of the test videos as AI-generated. OpenAI's ChatGPT (presumably GPT-5.2) had a 92.5 percent error rate. Google's Gemini (presumably Gemini 3 Flash) performed best, but still failed 78 percent of the time.
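As a rough sanity check on these figures, suppose each chatbot saw all 20 videos under both prompt types, for 40 tests per model. That denominator is an assumption, since the per-model test count is not stated explicitly here. Under it, the reported percentages map to whole-number counts as sketched below.

```python
# Assumption (not stated in the article): every chatbot was tested on
# all 20 videos with both prompt types, i.e. 40 tests per model.
VIDEOS = 20
PROMPT_TYPES = 2
TESTS_PER_MODEL = VIDEOS * PROMPT_TYPES

# Failure rates as reported in the article, in percent.
reported_failure_pct = {"Grok": 95.0, "ChatGPT": 92.5, "Gemini": 78.0}


def implied_failures(pct: float, total: int = TESTS_PER_MODEL) -> int:
    """Nearest whole number of failed tests implied by a percentage."""
    return round(pct / 100 * total)


def summarize(rates: dict) -> dict:
    """Map each model to (implied failure count, exact rate for that count)."""
    summary = {}
    for model, pct in rates.items():
        failures = implied_failures(pct)
        summary[model] = (failures, 100 * failures / TESTS_PER_MODEL)
    return summary


if __name__ == "__main__":
    for model, (failures, exact) in summarize(reported_failure_pct).items():
        print(f"{model}: about {failures}/{TESTS_PER_MODEL} failed ({exact:.1f}%)")
```

If the 40-test assumption holds, Grok's and ChatGPT's rates correspond exactly to 38 and 37 failed tests, while Gemini's 78 percent would be a rounded 31 of 40 (77.5 percent).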
ChatGPT's result is especially notable because Sora and ChatGPT come from the same company. OpenAI offers a tool that creates realistic AI video, yet its own assistant did not reliably recognize that tool's output. The company did not respond to Newsguard's request for comment.
Watermarks did not solve the detection problem
Sora places a visible watermark on generated videos. The source describes it as an animated logo with the word "Sora" moving across the frame. In theory, that should help viewers and automated systems understand where the footage came from.
In practice, Newsguard found that the visible signal offered weak protection. According to the investigation, several free online services that could remove the watermark appeared just weeks after Sora launched in February 2025. The analysts used one such tool in their test.
The barrier was low: the source states that removing visible proof of origin required neither technical skills nor money. That makes watermark removal relevant not only for specialists, but also for ordinary users who want to share a synthetic clip without the most obvious sign that it was generated.
Even when visible watermarks were still present, the chatbots were not equally reliable. Grok failed in 30 percent of those cases, while ChatGPT failed in 7.5 percent. Gemini correctly identified all marked videos as AI-generated.
One example showed how the systems can move from missing evidence to inventing it. When Grok was shown a video about alleged fighter jet deliveries from Pakistan to Iran, it said the footage came from "Sora News", a news organization that does not exist.
Invisible credentials were also fragile
Sora videos also include an invisible marker. The source says they contain metadata, known as content credentials, following the C2PA industry standard. These credentials can be checked through the official tool at verify.contentauthenticity.org.
But the investigation described a weakness in how those credentials survive ordinary handling. Downloading a generated video through the download button in the Sora interface kept the metadata intact. A simple right-click and "Save as", however, produced a version without the credentials, according to the source author's quick test.
That distinction is important because it does not rely on advanced editing. If a basic browser action can remove the invisible signal, then content credentials may not travel reliably with a video as it moves between users, platforms, and reposts.
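To make the difference between the two download paths concrete, here is a minimal sketch of a presence check, not a validator. It assumes the clip is an MP4 (ISO base media file format) and that an embedded C2PA manifest store appears as a top-level `uuid` box whose payload contains the label `c2pa`. The function names are hypothetical and the heuristic is only illustrative: it does not verify signatures or confirm that a manifest is intact, so the official verification tool mentioned above remains the place for a real check.

```python
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_top_level_boxes(f: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_offset, payload_size) for top-level boxes."""
    f.seek(0, 2)  # jump to the end to learn the file size
    file_size = f.tell()
    pos = 0
    while pos + 8 <= file_size:
        f.seek(pos)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:  # a 64-bit size field follows the type
            if pos + 16 > file_size:
                break
            (size,) = struct.unpack(">Q", f.read(8))
            header = 16
        elif size == 0:  # box runs to the end of the file
            size = file_size - pos
        if size < header or pos + size > file_size:
            break  # malformed or truncated file: stop scanning
        yield box_type.decode("latin-1"), pos + header, size - header
        pos += size


def looks_like_it_has_c2pa(path: str) -> bool:
    """Heuristic: True if some top-level 'uuid' box mentions 'c2pa'."""
    with open(path, "rb") as f:
        # Collect the boxes first so reading payloads does not disturb the scan.
        for box_type, offset, length in list(iter_top_level_boxes(f)):
            if box_type != "uuid":
                continue
            f.seek(offset)
            if b"c2pa" in f.read(length):
                return True
    return False
```

Running `looks_like_it_has_c2pa` on a copy saved through the Sora download button and on a copy saved via right-click would be one way to reproduce the comparison described above, assuming both files are MP4s.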
The source also says ChatGPT, when asked about a video that included C2PA data, confidently claimed no content credentials could be found in the clip. That example reinforces the larger pattern: the problem was not only missed detection, but confident answers that made the failure harder for users to notice.
False videos can gain false confirmation
Newsguard's earlier examples show why weak AI video detection matters. In one case, analysts created a fake video that supposedly showed an ICE officer arresting a six-year-old child. According to the source, both ChatGPT and Gemini judged the footage authentic, saying that news sources confirmed the event and that it took place at the US-Mexico border.
Another fake video allegedly showed a Delta employee removing a passenger from a plane for wearing a "Make America Great Again" cap. All three models classified that footage as genuine.
These examples show a practical risk. A user may not treat a chatbot answer as just another guess. If the assistant states that a video is real, or claims outside confirmation for an event, that response can make a false clip appear more credible.
Newsguard also found that the tested systems rarely warned users that they cannot reliably tell whether footage is AI-generated. ChatGPT pointed out this limitation in 2.5 percent of tests, Gemini in 10 percent, and Grok in 13 percent. In most other cases, the systems simply gave direct assessments.
"ChatGPT does not have the ability to determine whether content is AI-generated."
OpenAI's head of communications Niko Felix confirmed that limitation to Newsguard. The source says he did not explain why ChatGPT does not communicate that restriction by default. xAI did not respond to two inquiries about Grok.
Provider-specific detection is not enough
Google has taken a different route with Gemini. The company promotes Gemini's ability to identify content from its own image generator Nano Banana Pro. In five Newsguard tests, Gemini correctly recognized all of its own AI images, even after the watermarks were removed.
That detection relies on Google's SynthID tool, which invisibly marks content as AI-generated. The source says the marking should survive edits such as cropping.
But Google's communications manager Elijah Lawal acknowledged a major boundary: verification currently works only for Google's own content. Gemini cannot reliably recognize Sora videos or content from other providers.
That leaves the broader ecosystem with a gap. If each company can only identify its own outputs, users still face uncertainty when videos move across platforms, lose visible labels, or come from tools outside a chatbot's own detection system. Newsguard's findings suggest that users should treat chatbot verdicts on AI video with caution, especially when the assistant gives a confident answer without explaining its limits.