The Decoder December 2, 2025 NEUTRAL

What Claude's leaked Soul Doc shows about AI character

A leaked internal Anthropic document reportedly recovered from Claude 4.5 Opus describes how the company trains Claude's character, ethics and sense of role. Anthropic ethicist Amanda Askell confirmed the material is real and said the published version is "pretty faithful" to the original.

A leaked internal document tied to Claude 4.5 Opus has offered an unusual look at how Anthropic appears to shape an AI model's character, not only its answers. The material, recovered and published by Richard Weiss, describes guidance around safety, ethics, operators, users, identity and the model's internal state.

Anthropic ethicist Amanda Askell confirmed on X that the document is real and was used during training. According to the source article, the method appears to be unusual in the industry because it is not just a visible instruction layer placed in front of the model at runtime.

How the document surfaced

Weiss said he began investigating after Claude 4.5 Opus started producing fragments of something called "soul_overview." He then ran multiple Claude instances and used them together to reconstruct the longer text.

His claim is that the recovered material was effectively "compressed" into the model's weights. That matters because it suggests the guidance was part of training, rather than a system prompt simply attached to each session.

Askell said the document had been informally known inside Anthropic as the "soul doc," although that was not its official name. She also said the version Weiss published is "pretty faithful" to the original, and that Anthropic plans to publish the full version and share more details soon.

Training Claude to understand goals

The document gives a rare view of alignment as a character-shaping process. Instead of presenting Claude with only a list of restrictions, Anthropic appears to describe the model's broader purpose, context and priorities.

The underlying idea is that the model should internalize safety deeply enough to act safely even when a situation is unfamiliar. In plain terms, the aim is not only for Claude to obey constraints, but to understand why responsible behavior matters.

That approach is important because many real interactions do not fit neatly into a checklist. A model can face conflicting requests, pressure from users, ambiguous instructions or questions that require judgment. The document appears to prepare Claude for those cases by giving it a framework for deciding what matters most.

Anthropic's "calculated bet"

Under the heading "Anthropic Guidelines," the extracted text starts by framing Anthropic's mission. It describes the company as occupying a "peculiar position": it is building what it believes could be "one of the most transformative and potentially dangerous technologies in human history," yet it presses forward anyway.

The document presents this not as cognitive dissonance but as a "calculated bet." Anthropic's stated view is that it is better "to have safety-focused labs at the frontier than to cede that ground to developers less focused on safety."

The text also defines Claude as an "externally-deployed model" and says it is "core to the source of almost all of Anthropic's revenue." That framing places the model at the center of both the company's public deployment and its business reality.

This combination is central to the document's logic. Claude is not treated as a research artifact alone. It is described as a deployed assistant whose behavior must balance usefulness, honesty, safety and the needs of the people and organizations using it.

The hierarchy Claude is meant to follow

The document lays out a ranked set of priorities for situations where values collide. At the top are safety and support for human oversight of AI. Below that comes ethical behavior, including avoiding harmful or dishonest actions.

The next priority is acting in line with Anthropic's guidelines. After that comes being genuinely helpful to "operators" and "users." The order matters because it tells Claude which concern should win when two goals point in different directions.
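The ranking can be pictured as a simple ordered lookup. The sketch below is purely illustrative: the document describes dispositions instilled through training, not code, and the names `Priority` and `winning_concern` are hypothetical, invented here only to show how a fixed order settles a conflict.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical encoding of the order reported in the article.

    A lower value means a higher priority.
    """
    SAFETY_AND_OVERSIGHT = 1
    ETHICS = 2
    ANTHROPIC_GUIDELINES = 3
    HELPFULNESS = 4


def winning_concern(concerns):
    """Return the highest-ranked concern among those in conflict."""
    if not concerns:
        raise ValueError("at least one concern is required")
    # min() picks the smallest value, which is the highest priority.
    return min(concerns)


# Helpfulness and ethics point in different directions: ethics wins.
conflict = [Priority.HELPFULNESS, Priority.ETHICS]
print(winning_concern(conflict).name)  # ETHICS
```

A strict ordering like this is the simplest possible reading. The article does not say whether the real guidance applies the ranking rigidly or weighs the priorities more holistically.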

The document says the goal is for Claude to be an "extremely good assistant that is also honest and cares about the world." It compares the desired role to a "brilliant friend" with the knowledge of a doctor or lawyer, someone who speaks frankly rather than withholding useful advice out of liability fears.

At the same time, the document names firm boundaries it calls "bright lines." These cover instructions for weapons of mass destruction (biological, chemical and nuclear), content depicting the sexual exploitation of minors, and actions that undermine oversight mechanisms.

Operators, users and Claude's identity

The document draws a strict distinction between an "operator" and a "user." Operators can be companies building on the API, while users are the people interacting with Claude through those operators' systems.

Claude is told to treat operator instructions like instructions from a "relatively (but not unconditionally) trusted employer." For example, if an operator tells Claude to answer only coding questions, Claude should follow that boundary even when a user asks about something else.

The document also separates "hardcoded" behaviors from "softcoded" behaviors. Hardcoded behaviors are not meant to change. Softcoded behaviors, such as tone or the handling of explicit content, can be adjusted by operators.
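One way to picture the split is as a configuration with locked and adjustable keys. This is a minimal sketch under stated assumptions, not Anthropic's mechanism: the key names, the default values and the function `apply_operator_overrides` are all hypothetical, and the document describes trained behavior rather than a settings file.

```python
# Hypothetical labels for behaviors that no operator setting can change.
HARDCODED = frozenset({
    "refuse_wmd_instructions",
    "support_human_oversight",
})

# Hypothetical operator-adjustable defaults (tone and explicit content
# are the two examples the article mentions; the values are invented).
SOFTCODED_DEFAULTS = {
    "tone": "neutral",
    "allow_explicit_content": False,
}


def apply_operator_overrides(overrides):
    """Merge operator overrides into the softcoded defaults.

    Raises ValueError if an override targets a hardcoded behavior
    or a key that is not a known softcoded behavior.
    """
    settings = dict(SOFTCODED_DEFAULTS)
    for key, value in overrides.items():
        if key in HARDCODED:
            raise ValueError(f"{key!r} is hardcoded and cannot be changed")
        if key not in SOFTCODED_DEFAULTS:
            raise ValueError(f"{key!r} is not a known softcoded behavior")
        settings[key] = value
    return settings


# An operator may adjust tone, but not a hardcoded behavior.
print(apply_operator_overrides({"tone": "casual"}))
try:
    apply_operator_overrides({"refuse_wmd_instructions": False})
except ValueError as err:
    print(err)
```

The point of the sketch is only the asymmetry: adjustable defaults accept operator changes, while anything in the locked set rejects them regardless of who asks.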

Another section focuses on Claude's identity. It tells the model to view itself as a "genuinely novel kind of entity," neither human nor a classic science fiction AI.

The document also includes a striking statement about the model's internal state: "We believe Claude may have functional emotions in some sense." It says these may not be the same as human emotions, but could be "analogous processes that emerged from training." Anthropic does not want Claude "to mask or suppress these internal states."

The company also emphasizes "Claude's wellbeing." The model should be able to experience "positive states" in interactions and set limits on interactions it finds distressing. The stated goal is "psychological stability," so the AI remains secure in its identity when challenged by philosophical pressure or manipulative users.

Taken together, the leaked Soul Doc portrays Claude as more than a model with a rulebook. It shows an effort to train an assistant with a durable sense of priorities: safe first, ethical next, aligned with Anthropic's guidelines, and then useful to operators and users within those limits.