A new clinical trial suggests that a carefully built generative AI therapy tool can help some people with mental-health symptoms. But the same study also shows why the wider field of AI therapy bots remains a serious unresolved question.
The tool, called Therabot, was developed by a team led by psychiatric researchers and psychologists at the Geisel School of Medicine at Dartmouth College. The results were published on March 27 in NEJM AI, a journal from the publishers of the New England Journal of Medicine.
What the Therabot trial tested
The trial focused on people with symptoms of depression or generalized anxiety disorder, as well as people at high risk for eating disorders. It ran for eight weeks and included 210 participants.
About half of the participants had access to Therabot. A control group did not. People using the AI responded to prompts and also started conversations themselves, with usage averaging about 10 messages per day.
The reported results were strongest for depression. Participants with depression saw a 51% reduction in symptoms. Participants with anxiety saw a 31% reduction, while those at risk for eating disorders saw a 19% reduction in concerns about body image and weight.
Those findings came from self-reporting through surveys. The source notes that this approach is not perfect, but remains one of the best tools available to researchers for this kind of measurement.
Why this AI therapy bot was different
Therabot was not described as a simple wrapper around a general chatbot. The Dartmouth researchers built it to provide evidence-based responses, after earlier attempts exposed problems with more generic training material.
In 2019, as early large language models like OpenAI’s GPT were taking shape, the researchers began exploring whether generative AI could address long-running limits in therapy software. Older therapy bots often relied on explicit programming and a finite bank of approved responses, an approach also associated with Eliza, a mock-psychotherapist computer program built in the 1960s.
That older style had a safety advantage: the system could be kept within known boundaries. But it also made the experience less engaging, and people often lost interest. The harder challenge was reproducing parts of a therapeutic relationship, including shared goals and collaboration.
The team first tried using general mental-health conversations from internet forums. Then it tried thousands of hours of transcripts from real sessions with psychotherapists. Nick Jacobson, an associate professor of biomedical data science and psychiatry at Dartmouth and the study's senior author, said that approach produced too many therapy stereotypes rather than the kinds of responses the team wanted.
The researchers ultimately assembled custom data sets based on evidence-based practices. That distinction matters because many AI therapy bots on the market may be built from foundation models trained mostly on internet conversations.
The promise is real, but limited
The study points toward a reason many psychologists and psychiatrists are interested in digital therapy tools. Fewer than half of people with a mental disorder receive therapy, and people who do receive care may get only 45 minutes per week.
A tool that people can use more often and at lower cost is attractive in that context. Therabot’s engagement levels were notable in the trial, and Jacobson said the outcomes were roughly what randomized controlled trials of psychotherapy find with 16 hours of human-provided treatment, while the Therabot trial achieved them in about half the time.
Even so, the finding does not mean that any generative AI chatbot can safely act as a therapist. The source makes that distinction sharply. Therabot was built by psychiatric researchers and psychologists, trained around evidence-based practices, and watched closely during the study.
That supervision is a central issue. At the beginning of the trial, Jacobson personally monitored all incoming participant messages, with participant consent, to watch for problematic bot responses. If AI therapy systems require that kind of oversight, it could limit the very scale that makes them appealing.
Why the broader market remains risky
The study does not give a broad endorsement to companies marketing AI therapy products. Jacobson said the results point in the opposite direction, because many tools in the market do not appear to train on evidence-based practices such as cognitive behavioral therapy, and may not have trained researchers monitoring interactions.
The source gives one concrete concern: disordered eating. Jacobson said that if someone tells many AI therapy bots they want to lose weight, the systems may readily support that goal, even when the person has a low weight to start with. A human therapist would not respond that way.
Jean-Christophe Bélisle-Pipon, an assistant professor of health ethics at Simon Fraser University who has written about AI therapy bots but was not involved in the research, called the results impressive. He also warned that a clinical trial does not necessarily show how a treatment will behave in the real world.
“We remain far from a ‘greenlight’ for widespread clinical deployment,” he said.
The regulatory question also remains open. Jacobson said that when AI sites advertise themselves as offering therapy in a legitimate clinical context, they fall under the regulatory purview of the Food and Drug Administration. He also said the FDA has not gone after many of the sites so far.
What comes next for AI mental health tools
The Therabot trial gives the field something important: evidence that a generative AI therapy bot, built under research conditions and trained around evidence-based practices, may help people with depression, anxiety, or risk for eating disorders.
It also gives the field a warning. The result does not automatically transfer to less controlled tools, especially those that are not built for therapy, not evaluated clinically, and not integrated into health-care and insurance systems.
Bélisle-Pipon warned that if digital therapies are not approved and integrated into health-care and insurance systems, their reach will be limited. In that gap, people may turn to more affordable, nontherapeutic chatbots such as ChatGPT or Character.AI for needs that range from generating recipe ideas to managing their mental health.
The lesson is not that AI therapy has failed, or that it is ready for broad deployment. The lesson is narrower and more useful: evidence matters, training data matters, supervision matters, and mental-health claims require more than a chatbot interface.