AI agents trained on interviews predict behavior at 85% accuracy

Researchers from Stanford, Washington University, and Google DeepMind built more than 1,000 AI agents from two-hour interview transcripts. In tests, the interview-based agents predicted General Social Survey responses with 85% accuracy and closely matched human results in four of five social science experiments.

WTF Index TERMINATOR
◄ Terminator 2 Idiocracy 1 ►

Interview-grounded agents that predict individual behavior raise mild concerns about surveillance and manipulation, though the story is mainly research-focused.

AI agents trained on interviews predict behavior at 85% accuracy

AI agents are moving from chat tools toward something more experimental: simulated participants that can stand in for people in social science research. A team from Stanford, Washington University, and Google DeepMind has built more than 1,000 such agents using detailed interviews with people selected to represent the US population across age, gender, education, and political views.

The result is not a claim that machines fully understand humans. It is a more specific finding: when AI agents are grounded in rich personal interview data, they can make stronger predictions about how those same people respond in surveys, personality assessments, and behavioral economics tasks.

How the AI agents were built

The researchers started with more than 1,000 human participants. Each person took part in a two-hour interview, creating the raw material for an individual AI agent.

Those conversations were converted to text with OpenAI's Whisper model. The transcripts were then combined with GPT-4o. When a researcher queries an agent, the system loads the relevant interview transcript into the model and instructs it to imitate the person based on their responses.

That design matters because it gives the model more than a basic profile. Instead of relying only on a few demographic labels, the agent has access to a fuller account of how the person talked about themselves, their views, and their answers during the interview.

The study positions these agents as a possible laboratory for testing theories in fields such as economics, sociology, organization, and political science. In that role, the agents are not treated as generic synthetic people. They are built around specific interview records from real participants.

Why interviews changed the results

The team compared interview-based agents with agents that used only basic demographic information. The difference was substantial.

On questions from the General Social Survey, the interview-based AI agents predicted human responses with 85% accuracy. The source reports that this was significantly better than the performance of demographic-only agents.

The researchers also tested the agents with Big Five personality assessments and multiple behavioral economics games. These tasks gave the team several ways to measure whether the agents could reproduce patterns in human behavior rather than simply answer a narrow set of survey questions.

The strongest lesson from the study is that personal context appears to matter. A model given only demographic categories has less to work with. A model given a two-hour transcript can draw from a broader set of signals when predicting how a person might answer.

Social experiments showed close matches

The researchers ran five social science experiments with both human participants and AI agents. In four out of these five studies, the agents produced results that closely matched the human responses.

The statistical comparison was also strong. The source reports a correlation coefficient of 0.98 between AI and human responses.

For social scientists, that kind of result points to a practical possibility: AI agents could help test hypotheses before or alongside human studies. The source describes the dataset as a potential testing ground for theories in economics, sociology, and political science.

That does not make the agents a replacement for human participants in every setting. The reported findings are tied to the particular method used here: two-hour interviews, transcripts, GPT-4o, and tests against known survey and experiment formats. The value is in how closely this setup reproduced certain measured responses, not in a blanket guarantee that every human decision can be predicted.

Bias and demographic performance

The interview-based method also improved performance across different groups. According to the source, the agents made more accurate predictions across different political ideologies and ethnic groups than methods based only on demographics.

The system also showed more balanced performance when analyzing responses between various demographic categories. That is important because a simulation tool that works well for one group but poorly for another would have limited research value.

Using interviews does not remove every concern. But the study suggests that richer individual data can reduce some of the weaknesses seen when agents are built from demographic labels alone.

  • Demographic-only agents rely on broad categories such as age, gender, education, or political views.
  • Interview-based agents use detailed transcripts from two-hour conversations with participants.
  • The reported advantage is stronger prediction and more balanced performance across demographic categories.

Access and privacy protections

The research team has made its dataset of 1,000 AI agents available to other scientists through GitHub. The access model has two tiers.

Scientists can freely access combined response data for specific tasks. Access to individual response data for open-ended research requires special permission.

That structure reflects a tension at the center of this work. The same detailed interviews that make the agents more useful also create privacy concerns for the people who supplied the data. The two-tier system is meant to support further research while maintaining protections for the original interview participants.

If the approach proves useful, it could give researchers a new way to explore social behavior before running larger or more expensive human studies. The immediate significance is narrower but still notable: with rich interview data, AI agents can reproduce several measured patterns of human response with a level of accuracy that demographic-only agents did not match.