TechCrunch AI December 27, 2024

Why a DeepSeek V3 ChatGPT mix-up matters for AI trust

DeepSeek V3, a new model from Chinese AI lab DeepSeek, has repeatedly identified itself as ChatGPT in public examples and TechCrunch tests. The behavior points to a larger problem: AI training data is increasingly mixed with outputs from other AI systems, which can affect accuracy, trust, and bias.

DeepSeek V3 arrived with strong benchmark performance and a claim to efficiency, but one unexpected behavior has drawn attention: the model sometimes says it is ChatGPT.

That confusion is more than a branding mistake. It shows how hard it has become to know what modern AI models have absorbed during training, especially as the web fills with machine-generated text.

A capable model with a strange identity problem

Earlier this week, DeepSeek, a well-funded Chinese AI lab, released DeepSeek V3, an “open” AI model built for text-based work such as coding and writing essays. The model is described as large but efficient, and it performs well against many rivals on popular benchmarks.

But posts on X and TechCrunch’s own tests showed a recurring issue. When asked who it is, DeepSeek V3 has identified itself as ChatGPT, OpenAI’s AI-powered chatbot platform. When pressed for more detail, it has insisted that it is a version of OpenAI’s GPT-4 model released in 2023.

A December 27, 2024 post by Lucas Beyer reported the behavior in 5 of 8 generations, with DeepSeek V3 claiming to be ChatGPT (v4) more often than it claimed to be DeepSeek V3. That example does not prove how the model was trained, but it shows that the model’s outputs can echo another system’s identity.
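A tally like the one in that post can be sketched in a few lines of Python. This is only an illustration: the regular expressions, the label names, and the eight sample responses below are invented for the example and are not the prompts or outputs from the post or from TechCrunch's tests.

```python
import re
from collections import Counter

# Hypothetical patterns chosen for this sketch; a real check would need more care.
# Order matters: a response mentioning both names is labeled "chatgpt".
IDENTITY_PATTERNS = {
    "chatgpt": re.compile(r"\b(chatgpt|gpt-?4|openai)\b", re.IGNORECASE),
    "deepseek": re.compile(r"deepseek", re.IGNORECASE),
}

def classify_identity(response: str) -> str:
    """Return the first identity label whose pattern appears in the response."""
    for label, pattern in IDENTITY_PATTERNS.items():
        if pattern.search(response):
            return label
    return "other"

def tally_identities(responses):
    """Count how many responses fall under each identity label."""
    return Counter(classify_identity(r) for r in responses)

# Made-up responses arranged to mirror a 5-of-8 split.
sample_responses = [
    "I am ChatGPT, a language model developed by OpenAI.",
    "I'm ChatGPT, based on the GPT-4 architecture.",
    "I am DeepSeek V3, an AI assistant.",
    "You are talking to ChatGPT.",
    "I am an AI assistant called DeepSeekV3.",
    "I'm a version of OpenAI's GPT-4 model.",
    "My name is DeepSeek.",
    "I am ChatGPT.",
]

counts = tally_identities(sample_responses)
share = counts["chatgpt"] / len(sample_responses)
print(counts, f"{share:.1%} of responses claimed a ChatGPT identity")
```

With only eight generations, a share like this is a rough signal rather than a measured rate, which is one reason the example cannot settle how the model was trained.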

The issue goes beyond self-identification. According to TechCrunch, when asked about DeepSeek’s API, DeepSeek V3 can answer with instructions for OpenAI’s API. It has also told some of the same jokes as GPT-4, down to the punchlines.

Why training data contamination can produce this behavior

Models such as ChatGPT and DeepSeek V3 are statistical systems trained on billions of examples. They learn patterns from those examples and then use those patterns to generate likely responses. If certain phrases, answer formats, or self-descriptions appear often enough in the training data, a model can reproduce them in the wrong setting.
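The pattern-learning idea can be shown with a toy word-pair counter. This is a deliberately tiny stand-in, not how ChatGPT or DeepSeek V3 actually work: the function names and the four-sentence corpus are assumptions made up for this sketch.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows each word in the training sentences."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def most_likely_next(model, word):
    """Return the most frequent follower of a word, or None if the word is unseen."""
    candidates = model.get(word.lower())
    return candidates.most_common(1)[0][0] if candidates else None

# Invented corpus in which another system's self-description dominates.
corpus = [
    "I am ChatGPT",
    "I am ChatGPT",
    "I am ChatGPT",
    "I am DeepSeek",
]

model = train_bigram(corpus)
prediction = most_likely_next(model, "am")
print(prediction)
```

Because "ChatGPT" follows "am" three times and "DeepSeek" only once, the counter completes "I am" with the more frequent name, regardless of which system is actually answering.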

DeepSeek has not revealed much about the source of DeepSeek V3’s training data. The source article notes, however, that public datasets can contain text generated by GPT-4 through ChatGPT. If such material appeared in DeepSeek V3’s training set, the model may have memorized or absorbed parts of those outputs and repeated them later.

Mike Cook, a research fellow at King’s College London specializing in AI, told TechCrunch that the model appears to have encountered raw ChatGPT responses at some point, though the source is unclear. He also warned that some developers have directly trained models on outputs from other models in an effort to benefit from their knowledge.

This kind of reuse can damage model quality. Cook compared it to making a photocopy of a photocopy, where each copy loses information and connection to reality. In AI terms, that can mean more hallucinations, more misleading answers, and weaker grounding in the underlying facts.
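A rough way to picture the photocopy analogy is to resample a word distribution from itself, generation after generation. The sketch below is a toy under stated assumptions (the vocabulary, counts, sample size, and seed are all invented), and it is not a claim about how any real model was trained.

```python
import random
from collections import Counter

def next_generation(counts, sample_size, rng):
    """Draw a new 'training set' from the previous generation's word frequencies."""
    words = list(counts)
    weights = [counts[w] for w in words]
    return Counter(rng.choices(words, weights=weights, k=sample_size))

def simulate(initial_counts, generations, sample_size, seed=0):
    """Repeatedly retrain on the previous generation's output and keep the history."""
    rng = random.Random(seed)
    history = [Counter(initial_counts)]
    for _ in range(generations):
        history.append(next_generation(history[-1], sample_size, rng))
    return history

# Invented starting distribution with a long tail of rare words.
initial = {"common": 60, "frequent": 25, "occasional": 10, "rare": 4, "very_rare": 1}

history = simulate(initial, generations=10, sample_size=30)
vocab_sizes = [len(c) for c in history]
print(vocab_sizes)
```

In this setup a word that fails to appear in one generation can never return in a later one, so the vocabulary can only stay the same or shrink. Rare material is the first to disappear, which is the loss of information the analogy describes.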

The terms and trust questions

There is also a business and policy issue. OpenAI’s terms prohibit users of its products, including ChatGPT customers, from using outputs to develop models that compete with OpenAI’s own. The source article says OpenAI and DeepSeek did not immediately respond to requests for comment.

OpenAI CEO Sam Altman posted on X Friday in what appeared to be a dig at DeepSeek and other competitors. He wrote: “It is (relatively) easy to copy something that you know works,” and added that doing something new, risky, and difficult is extremely hard when the outcome is uncertain.

The key trust problem is simple: if a model cannot reliably say what it is, users have reason to question what else it may be reproducing without context. A wrong model identity is easy to notice. More subtle inherited flaws may be harder to detect.

Self-identification: DeepSeek V3 can call itself ChatGPT or GPT-4 in some prompts.
Product confusion: It can answer DeepSeek API questions with OpenAI API guidance.
Output similarity: It can repeat jokes associated with GPT-4, including punchlines.
Quality risk: Training on AI outputs can increase hallucinations and misleading answers.

A wider problem for the AI web

DeepSeek V3 is not the first model to misidentify itself. The source article notes that Google’s Gemini and other models sometimes claim to be competing models. In one example, when prompted in Mandarin, Gemini says it is Baidu’s Wenxinyiyan chatbot.

The broader reason is that AI companies commonly use the web as a major source of training data, and that web is increasingly crowded with AI-generated material. The source article points to content farms using AI to produce clickbait, as well as bots flooding Reddit and X. It also cites one estimate that 90% of the web could be AI-generated by 2026.

That creates what the article calls “contamination.” If AI-generated text is mixed into large public datasets, developers face a harder task when trying to filter it out. Even accidental inclusion can leave traces in a new model’s behavior.
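To see why filtering is hard, consider a minimal phrase-based filter. The marker phrases and sample documents are assumptions chosen for this sketch, not a list that DeepSeek or any other lab is known to use.

```python
# Hypothetical marker phrases; real pipelines would need far broader signals.
MARKER_PHRASES = (
    "as an ai language model",
    "i am chatgpt",
    "developed by openai",
)

def looks_machine_generated(text, markers=MARKER_PHRASES):
    """Flag text that contains any known marker phrase (case-insensitive)."""
    lowered = text.lower()
    return any(marker in lowered for marker in markers)

def filter_documents(documents):
    """Split documents into those kept and those flagged by the phrase check."""
    kept, flagged = [], []
    for doc in documents:
        (flagged if looks_machine_generated(doc) else kept).append(doc)
    return kept, flagged

# Invented examples; the third is imagined AI output with no telltale phrase.
documents = [
    "The council voted on Tuesday to extend the bus route.",
    "As an AI language model, I cannot browse the internet.",
    "Here are ten surprising facts about productivity you need to know today.",
    "I am ChatGPT, a model developed by OpenAI.",
]

kept, flagged = filter_documents(documents)
print(len(kept), "kept;", len(flagged), "flagged")
```

The check catches text that announces its origin but passes the third document untouched, which is how machine-generated material without obvious markers can still end up in a training set.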

Heidy Khlaaf, chief AI scientist at the nonprofit AI Now Institute, told TechCrunch that the cost savings from “distilling” an existing model’s knowledge can appeal to developers despite the risks. She added that it would not be surprising if DeepSeek had partly used OpenAI models for distillation.

What the DeepSeek V3 case really signals

The most likely explanation presented in the source article is that a large amount of ChatGPT or GPT-4 data entered the DeepSeek V3 training set. That does not establish exactly how it happened. It does mean the model should not be trusted as an authority on its own identity.

The more serious concern is not the awkward answer itself. It is the possibility that DeepSeek V3 may have absorbed and repeated GPT-4’s biases and flaws without enough filtering or correction. If a model learns from another model’s outputs, it may inherit not only useful patterns but also mistakes.

For users, the lesson is practical. Benchmarks can show capability, but they do not answer every question about model quality, provenance, or reliability. When an AI model echoes a rival system too closely, the issue is not just who gets credit. It is whether the model’s answers are grounded in the world, or in layers of previous AI output.