MIT Tech Review AI November 20, 2024

A two-hour interview can create an AI personality replica

Researchers from institutions including Stanford and Google DeepMind created AI replicas of 1,000 people from two-hour spoken interviews. On tests and surveys, the resulting simulation agents' answers were 85% similar to the participants' own, but the work also raises concerns about consent, impersonation, and the limits of measuring personality.

A new research paper suggests that an AI model can build a convincing replica of a person’s values and preferences after a two-hour spoken interview. The work, from a team including researchers from Stanford and Google DeepMind, has been published on arXiv and has not yet been peer-reviewed.

The study points to a future in which AI personality replicas could be used for research, simulation, and digital twin products. It also shows why this area is difficult: the same tools that can help researchers model human behavior could also make it easier to impersonate real people online.

How the personality replicas were made

The research team, led by Joon Sung Park, a Stanford PhD student in computer science, recruited 1,000 people. The participants varied by age, gender, race, region, education, and political ideology, and they were paid up to $100 for taking part.

Each participant took part in a spoken two-hour interview. The conversation covered personal history and views, including childhood, formative memories, career, and thoughts on immigration policy. From those interviews, the researchers created AI agent replicas of the people who had participated.

In the paper, these replicas are called simulation agents. The idea is not simply to create chatbots that sound like people. The goal is to produce agents that can reflect a person’s values and preferences well enough to be useful in controlled studies.

Park framed the long-term idea this way:

“If you can have a bunch of small ‘yous’ running around and actually making the decisions that you would have made—that, I think, is ultimately the future,” Park says.

What the agents got right, and where they struggled

To test the replicas, the researchers compared the AI agents with the real participants across several exercises. The participants completed personality tests, social surveys, and logic games twice each, with the sessions two weeks apart. The agents then completed the same exercises.

The agents' results were 85% similar to those of the participants they were modeled on. That figure suggests that the interview-based method captured enough information to reproduce many patterns in how participants answered or behaved across the chosen tests.
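To make the comparison concrete, here is a minimal sketch of one way a similarity score like this could be computed. It is an illustration, not the paper's published method: the function names, the sample answers, and the choice to divide the agent's agreement by the participant's own consistency across the two sessions are all assumptions made for this example.

```python
from typing import Sequence


def agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of questions on which two answer lists match exactly."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("answer lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def normalized_similarity(
    session_1: Sequence[str],
    session_2: Sequence[str],
    agent: Sequence[str],
) -> float:
    """Agent-vs-person agreement, scaled by the person's own consistency.

    Assumption for this sketch: the agent is compared with the first
    session, and the two-week retest sets the ceiling for the score.
    """
    self_consistency = agreement(session_1, session_2)
    if self_consistency == 0:
        raise ValueError("participant never repeated an answer; score undefined")
    return agreement(agent, session_1) / self_consistency


# Hypothetical answers to five survey questions.
first_session = ["agree", "no", "yes", "often", "left"]
second_session = ["agree", "no", "no", "often", "left"]   # one answer changed
agent_answers = ["agree", "no", "yes", "rarely", "right"]  # two mismatches

score = normalized_similarity(first_session, second_session, agent_answers)
print(round(score, 2))
```

In this made-up case the participant repeats four of five answers and the agent matches three of five, so the normalized score comes out to about 0.75. Scaling by self-consistency reflects the fact that people do not answer identically two weeks apart, so even a perfect replica could not be expected to match every response.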

But the evaluation had limits. The tests included the General Social Survey, which collects information about demographics, happiness, behaviors, and more. They also included assessments of the Big Five personality traits: openness to experience, conscientiousness, extroversion, agreeableness, and neuroticism.

Those tools are common in social science research, but they do not capture everything that makes a person distinct. The agents also performed worse on behavioral tests such as the “dictator game,” which is designed to show how people weigh values such as fairness.

That distinction matters. An AI personality replica may match a person on structured surveys while still missing important parts of how that person behaves in situations involving judgment, context, or moral tradeoffs.

Why researchers want simulation agents

The main purpose described in the paper is research. Simulation agents could make it easier to study questions that would otherwise be expensive, impractical, or unethical to test with real people.

If AI models can behave like real people in useful ways, researchers could explore questions such as how social media interventions might combat misinformation or what behaviors contribute to traffic jams. The agents would not replace real people in every study, but they could create a new way to run simulations before involving human participants.

John Horton, an associate professor of information technologies at the MIT Sloan School of Management, described the approach as a hybrid. In an email to MIT Technology Review, he said:

“This paper is showing how you can do a kind of hybrid: use real humans to generate personas which can then be used programmatically/in-simulation in ways you could not with real humans.”

The research also connects to the broader movement toward AI agents. Many leading AI companies are focused on tool-based agents, which are built to perform tasks such as entering data, retrieving stored information, or potentially booking travel and scheduling appointments. Salesforce announced its own tool-based agents in September, Anthropic followed in October, and OpenAI is planning to release some in January, according to Bloomberg.

Simulation agents are different because they aim to model people rather than complete chores for them. Even so, Horton says research on simulation agents is likely to lead to stronger AI agents overall.

Why interviews may matter more than surveys

A central question in this research is how to turn a person’s individual experience into information that a language model can use. The team chose qualitative interviews as the main method.

Park said he became convinced of the value of interviews after appearing on many podcasts following a 2023 paper he wrote on generative agents. He explained:

“I would go on maybe a two-hour podcast interview, and after the interview, I felt like, wow, people know a lot about me now,” he says. “Two hours can be very powerful.”

Interviews can surface details that are hard to collect through standard survey questions. Park gave the example of someone who had cancer and was finally cured last year. That kind of personal information could shape how a person thinks and behaves, but it may not appear in a typical questionnaire.

Other companies are exploring related digital twin ideas through different data sources. Tavus, for example, can have AI models ingest customer emails or other data. Tavus CEO Hassaan Raza said that approach often requires a fairly large data set, while this paper suggests a more efficient path.

Raza said the company will experiment with the interview approach, adding:

“How about you just talk to an AI interviewer for 30 minutes today, 30 minutes tomorrow? And then we use that to construct this digital twin of you.”

The deepfake problem for personality

The clearest risk is impersonation. Image generation has already made it easier to create harmful deepfakes without consent. AI agent generation raises a similar concern for personality: someone could build a tool that impersonates another person online, making it appear that the person said or authorized something they never intended to.

That risk is especially important because the research shows how little input may be needed. A two-hour interview is far less data than many people might expect for creating a useful replica of values and preferences.

The study does not settle what safeguards should exist, and it has not yet been peer-reviewed. But it does make the stakes clearer. AI personality replicas may become useful research instruments, but they also force a direct question about consent: who should be allowed to create a digital version of a person, and for what purpose?