AI-written Community Notes put X’s trust system to the test

X plans to let AI agents write Community Notes, with human reviewers rating the drafts before users see them. The system could make fact-checking faster, but X’s own research paper warns that persuasive, inaccurate notes could weaken trust if safeguards fail.

X is preparing to add AI-written Community Notes to its fact-checking system, a move the platform presents as a way to make corrections faster and broader across the service. The idea is simple on the surface: AI agents draft notes, human reviewers rate them, and the feedback helps the agents improve over time.

But the same research paper that frames the change as an upgrade also lays out the central problem. If AI can write notes that sound balanced and convincing without being accurate, the system may reward polish over truth. For a product built around trust, that is not a small risk.

What X Wants AI Notes To Do

Community Notes have been one of X’s most visible attempts to crowdsource fact-checking. The system relies on people with different viewpoints to assess whether posts need additional context and whether proposed notes are useful.

X’s new plan would bring AI agents into that workflow. In the ideal version described in the research paper, the agents would help produce more notes in less time, especially on posts that users have flagged as needing context.

Each AI-written note would be rated by a human reviewer. That rating would then become part of a feedback loop, helping the AI agent write better notes as the process repeats. X’s researchers suggest this could allow people to spend more time on harder cases, including posts that require niche expertise or social awareness.

The larger ambition is not just faster moderation. The paper says the system could become “a blueprint for a new form of human-AI collaboration in the production of public knowledge.” That is the optimistic case: AI handles more routine drafting, while humans keep judgment and accountability in the loop.

The Accuracy Problem

The biggest unresolved question is whether AI-written notes will be as accurate as human-written ones. X’s paper directly acknowledges that uncertainty.

The risk is not only that an AI agent may be wrong. It is that it may be wrong in a way that is difficult to detect. The paper warns that AI systems can generate “persuasive but inaccurate notes,” and that human raters may still judge them as helpful because AI is “exceptionally skilled at crafting persuasive, emotionally resonant, and seemingly neutral notes.”

That matters because Community Notes depends on the relationship between helpfulness and accuracy. If a note looks useful but misleads readers, the feedback loop may train the system in the wrong direction.

“If rated helpfulness isn’t perfectly correlated with accuracy, then highly polished but misleading notes could be more likely to pass the approval threshold,” the paper says.

The paper also warns that this risk could increase as LLMs improve. More capable systems may be better at assembling a convincing body of evidence for almost any claim, even when the claim is not true. That could make deception or error harder for human raters to spot.
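
The quoted concern about helpfulness and accuracy can be illustrated with a toy model. The sketch below is not X's scoring algorithm: the Note fields, the 0-to-1 scales, the linear blend of accuracy and polish, the 0.7 threshold, and every number are hypothetical, chosen only to show how a gap between rated helpfulness and accuracy can let a polished but misleading note through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    label: str
    accuracy: float  # hypothetical scale: 0.0 (false) to 1.0 (fully accurate)
    polish: float    # hypothetical scale: 0.0 (clumsy) to 1.0 (highly persuasive)


def rated_helpfulness(note: Note, accuracy_weight: float) -> float:
    """Toy rater: blends accuracy and polish into one helpfulness score.

    accuracy_weight = 1.0 means raters respond only to accuracy;
    lower values mean polish sways them more. This is an assumption
    made for illustration, not a description of real rater behavior.
    """
    return accuracy_weight * note.accuracy + (1.0 - accuracy_weight) * note.polish


def approved(notes: list[Note], accuracy_weight: float, threshold: float = 0.7) -> list[str]:
    """Return labels of notes whose toy helpfulness score meets the threshold."""
    return [n.label for n in notes if rated_helpfulness(n, accuracy_weight) >= threshold]


# Entirely made-up notes for the illustration.
notes = [
    Note("accurate, plain", accuracy=0.9, polish=0.5),
    Note("accurate, polished", accuracy=0.9, polish=0.9),
    Note("misleading, polished", accuracy=0.2, polish=1.0),
]

# Raters who track accuracy perfectly versus raters swayed mostly by polish.
print(approved(notes, accuracy_weight=1.0))
print(approved(notes, accuracy_weight=0.3))
```

In this toy setup, raters who respond only to accuracy approve the two accurate notes. When polish carries most of the weight, the misleading but polished note clears the threshold while the accurate but plain one does not, which is the failure mode the paper describes.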

Why Critics Are Concerned

X is already facing criticism over the plan. On Tuesday, former United Kingdom technology minister Damian Collins accused X of building a system that could allow “the industrial manipulation of what people see and decide to trust” on a platform with more than 600 million users, The Guardian reported.

Collins also argued that AI notes could increase the promotion of “lies and conspiracy theories” on X. His concern reflects the broader fear around automated fact-checking: when a system operates at scale, even small weaknesses can become large public problems.

Samuel Stockwell, a research associate at the Centre for Emerging Technology and Security at the Alan Turing Institute, told The Guardian that the outcome will depend on “the quality of safeguards X puts in place against the risk that these AI ‘note writers’ could hallucinate and amplify misinformation in their outputs.”

“AI chatbots often struggle with nuance and context but are good at confidently providing answers that sound persuasive even when untrue,” Stockwell said.

Another complication is openness. X’s Community Notes account explained that anyone can create an AI agent using any technology to write Community Notes. That could mean some agents are more biased or defective than others, creating uneven quality inside a system that depends on public confidence.

The Human Review Bottleneck

X’s plan still depends heavily on human reviewers. That creates a practical problem: if AI agents produce many more drafts, people may have to rate many more notes.

The research paper recognizes this pressure. X is planning more research to ensure that “human rating capacity can sufficiently scale.” If it cannot, the paper says, “the impact of the most genuinely critical notes” could be diluted.

Andy Dudfield, the head of AI at Full Fact, told The Guardian that X risks “increasing the already significant burden on human reviewers to check even more draft notes, opening the door to a worrying and plausible situation in which notes could be drafted, reviewed, and published entirely by AI without the careful consideration that human input provides.”

The paper discusses one possible response to that bottleneck: removing human review in some cases and applying AI-written notes in “similar contexts” where human raters had previously approved notes. But X also acknowledges the danger in that approach.

“Automatically matching notes to posts that people do not think need them could significantly undermine trust in the system,” the paper says.

What Happens Next

X says AI-written Community Notes “will be clearly marked for users.” At first, the notes will appear only on posts where people have requested a note, according to X’s Community Notes account. Later, AI note writers could be allowed to select posts for fact-checking themselves.

AI-written notes are expected to start appearing on X later this month. Users can already start testing AI note writers, which may soon be considered for the initial cohort of AI agents.

For its research, X collaborated with post-graduate students, research affiliates, and professors studying areas including human trust in AI, fine-tuning AI, and AI safety at Harvard University, the Massachusetts Institute of Technology, Stanford University, and the University of Washington.

The researchers concluded that “under certain circumstances,” AI agents can “produce notes that are of similar quality to human-written notes—at a fraction of the time and effort.” They also said more research is needed before X can fully capture what the paper calls “a transformative opportunity” with “dramatically increased scale and speed.”

The future version imagined by researchers goes beyond drafting notes for posts flagged by people. AI agents could eventually help identify posts predicted to go viral and respond before misinformation outpaces human reviewers. They could also help raters access research, synthesize evidence, compose clearer notes, and perhaps even predict rating scores.

But the core tradeoff remains unchanged. AI-written Community Notes could make X’s fact-checking faster and more expansive. They could also make the system easier to flood with persuasive errors. The test now is whether X can gain speed and scale without weakening the trust that made Community Notes valuable in the first place.