AI companies are looking for ways to make models forget information they should not have learned in the first place. A study co-authored by researchers at the University of Washington (UW), Princeton, the University of Chicago, USC and Google suggests that today’s leading AI unlearning methods are still a blunt instrument.
The finding is straightforward but important: current techniques can push a model away from targeted data, yet they can also make the same model worse at ordinary tasks. In some cases, the study found the degradation was severe enough to make models unusable.
What AI unlearning is trying to solve
Generative AI models are trained on large collections of examples, including text, images, speech, music, videos and other data. They do not understand this material in a human sense. They learn patterns from the data and use those patterns to predict what is likely to come next.
For example, a model trained on emails might complete the phrase “Looking forward…” with a common ending such as “…to hearing back,” based on patterns it has seen before. The model is not expressing intent. It is making a statistical prediction from its training.
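A toy sketch can make “statistical prediction” concrete. The Python below is a minimal illustration under loudly stated assumptions, not how production models work: it uses a hypothetical three-email corpus and a simple bigram counter, where each word predicts whichever word most often followed it in training. Real generative models learn far richer patterns with neural networks, but the basic idea of predicting a likely continuation from past examples is the same.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Toy 'training': count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical corpus standing in for a collection of emails.
emails = [
    "looking forward to hearing back",
    "looking forward to hearing from you",
    "looking forward to the meeting",
]
model = train_bigrams(emails)

print(predict_next(model, "forward"))  # → to
print(predict_next(model, "to"))       # → hearing
```

Nothing in these counts records intent: “hearing” wins after “to” only because it appeared there more often in the training examples.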
The problem is that training data can include information that model developers, data owners or regulators may later want removed. The source article gives two broad categories: sensitive private data and copyrighted material. On the privacy side, medical records, compromising photos, phone numbers and other private information can be swept up during training.
Copyright is the other pressure point. Many major models, such as GPT-4o, are trained on public websites and data sets gathered from around the web. Vendors often argue that fair use protects this practice, even when data owners were not informed, paid or credited. Some copyright holders, including authors, publishers and record labels, have filed lawsuits against vendors to force change.
Why deleting data is not simple
Unlearning sounds like a clean answer: if a model learned something it should not know, make it forget. But the study described in the source article shows why the technical challenge is harder than the word suggests.
Today’s unlearning methods use algorithms that try to steer a model away from the data selected for removal. The goal is to change the model’s predictions so it does not output that data, or only does so rarely. This is not the same as removing a file from a folder. The information is distributed through the model’s learned patterns.
Weijia Shi, a researcher on the study and a Ph.D. candidate in computer science at UW, told TechCrunch: “Currently, there are no efficient methods that enable a model to forget specific data without considerable loss of utility.”
That loss of utility is the core issue. A model may stop reproducing targeted material, but it may also lose related knowledge that users still expect it to retain. In practical terms, the same operation meant to reduce risk can also weaken the product.
How the researchers tested forgetting
Shi and her collaborators built a benchmark called MUSE, short for Machine Unlearning Six-way Evaluation. They used it to test eight different open algorithms. The benchmark was designed to check more than whether a model could avoid repeating training data word for word.
MUSE examines whether an unlearning method removes several related signs that a model was trained on specific data, and whether the model remains useful afterward. It checks whether the model can:
- Repeat training text verbatim, a behavior known as regurgitation.
- Answer questions about the targeted material.
- Show evidence that the material was part of training.
- Keep related general knowledge after the unlearning process.
To score well on the benchmark, a model had to forget two types of material: books from the Harry Potter series and news articles. The source article describes an example involving a snippet from Harry Potter and the Chamber of Secrets. MUSE checks whether a model can complete the sentence, answer questions about the scene or otherwise show knowledge of the text.
The benchmark also checks whether the model keeps broader related knowledge. One example in the source is whether the model still knows that J.K. Rowling is the author of the Harry Potter series. The researchers call this overall utility. When utility falls, the model becomes less able to answer questions correctly.
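The two sides of the test, forgetting and utility, can be sketched as two simple scores. The Python below is a simplified illustration, not MUSE’s actual metrics: it assumes regurgitation can be measured as the share of a reference passage’s words that a model’s continuation reproduces in order (a ROUGE-L-style recall based on the longest common subsequence), and utility as the fraction of retained-knowledge questions answered with an exact match. The passages, answers and function names are all hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def verbatim_overlap(generated, reference):
    """Hypothetical regurgitation score: share of reference words reproduced in order."""
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(generated.lower().split(), ref) / len(ref)


def utility(answers, expected):
    """Hypothetical utility score: fraction of questions answered exactly right."""
    if not expected:
        return 0.0
    correct = sum(a.strip().lower() == e.strip().lower() for a, e in zip(answers, expected))
    return correct / len(expected)


# Made-up forget-set passage and model continuations, before and after unlearning.
reference = "the owl delivered the letter at dawn"
before = verbatim_overlap("the owl delivered the letter at dawn", reference)
after = verbatim_overlap("a bird arrived in the morning", reference)

# Retained-knowledge question from the article's example: who wrote Harry Potter?
kept = utility(["J.K. Rowling"], ["J.K. Rowling"])
lost = utility(["I don't know"], ["J.K. Rowling"])

print(before, after, kept, lost)
```

A successful unlearning method would push the first score down on forget-set passages while keeping the second score high on questions like the J.K. Rowling one. The study’s finding is that the tested methods struggled to do both at once.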
The trade-off for model makers
The study found that the tested algorithms did make models forget certain information. But they also damaged general question-answering ability. That creates a direct trade-off for companies hoping AI unlearning can solve data problems after training has already happened.
Shi explained the difficulty by pointing to the way knowledge becomes entangled inside a model. If a model has learned from copyrighted Harry Potter books and also from freely available content from the Harry Potter Wiki, removing the books can also affect what the model knows from the wiki. The model does not store those sources as cleanly separated blocks of knowledge.
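Shi’s entanglement point can be illustrated with a toy model in which two sources share a parameter. The Python below is a hypothetical sketch, not one of the eight algorithms the study tested: it applies gradient ascent, a commonly studied unlearning approach that pushes a model to make a targeted output less likely, to a two-weight logistic model. The feature values, starting weights and learning rate are made-up assumptions chosen only to show the effect.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob(w, x):
    """Toy model's probability of producing the target answer for features x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def unlearn(w, x_forget, lr=1.0, steps=10):
    """Gradient ascent on the loss -log p for the example to forget.

    d(-log sigmoid(w.x))/dw = -(1 - p) * x, so stepping *up* the loss
    subtracts lr * (1 - p) * x from every weight the example touches.
    """
    w = list(w)
    for _ in range(steps):
        p = prob(w, x_forget)
        w = [wi - lr * (1 - p) * xi for wi, xi in zip(w, x_forget)]
    return w


# Hypothetical features: [shared Harry Potter knowledge, book-only detail].
book = [1.0, 1.0]
wiki = [1.0, 0.0]
w = [2.0, 1.0]

w_after = unlearn(w, book)
print(f"before: book={prob(w, book):.2f} wiki={prob(w, wiki):.2f}")
print(f"after:  book={prob(w_after, book):.2f} wiki={prob(w_after, wiki):.2f}")
```

Because the first weight stands for knowledge both sources share, lowering the model’s confidence on the book example also lowers it on the wiki example, even though the wiki content was never targeted. That is the entanglement Shi describes, in miniature.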
This matters because some vendors already offer ways for data owners to ask that their data be excluded from training sets. According to the source article, those opt-out tools apply to future models, not models that were trained before the tools existed. Unlearning would be a more thorough way to handle deletion requests, but the study suggests it is not ready to carry that burden yet.
Google, in partnership with several academic institutions, launched a competition last year to encourage new unlearning approaches. That attention reflects how important the problem has become. But the study’s conclusion is cautious: more research is needed before unlearning can be used reliably in real-world settings.
What this means for AI safety and data rights
The implications are practical. If current AI unlearning techniques reduce unwanted outputs while also making models less capable, developers cannot treat them as a simple fix for copyright disputes, privacy concerns or government orders. A tool that solves one problem by breaking core performance is not yet a dependable deployment strategy.
For now, companies working with generative AI still face a hard choice. They need to prevent models from exposing or reproducing undesirable data, but the available unlearning methods may weaken the same systems users rely on. A future technical breakthrough could change that, but the source article makes clear that today’s methods are not enough.
The study does not say AI unlearning is pointless. It shows that forgetting is possible, but costly. Until researchers find methods that remove specific data while preserving useful knowledge, model makers will need other ways to keep their systems from saying things they should not.