The Decoder March 27, 2026 IDIOCRACY

Why Meta Community Notes may struggle with AI disinformation

Meta's Oversight Board says Community Notes has serious limits as a replacement for professional fact-checking. Its concerns center on slow publication, low coverage, AI-enabled manipulation, crisis risks and possible harm to minorities.

Meta's plan to expand Community Notes globally is facing a direct warning from its own Oversight Board: the system may not be strong enough for the scale and speed of modern misinformation, especially when AI-generated disinformation is involved.

The Board's analysis does not reject the idea that users can add helpful context to public posts. Its concern is narrower and more serious: a system that depends on delayed consensus, limited participation and fragile safeguards may fail in exactly the situations where harmful claims move fastest.

What the Board says is wrong with Community Notes

Community Notes replaced Meta's professional fact-checking program in the United States after Meta announced the change at the start of U.S. President Donald Trump's second term. The former program had been running for roughly a decade.

The Oversight Board describes major weaknesses in the new approach and questions whether Community Notes can address misinformation linked to harm.

"Delays in note publication, the limited number of published notes and its dependence on the broader information environment's reliability raise serious doubts about the extent to which community notes can meaningfully address misinformation linked to harm," the Board states.

The gap between Community Notes and professional fact-checking is stark in the figures cited. According to Meta, only about 900 Community Notes were published in the first six months of the U.S. rollout. Over the same period in the EU, professional fact-checkers enabled Meta to apply labels to approximately 35 million Facebook posts, according to Angie Drobnic Holan, director of the International Fact-Checking Network.

That contrast matters because misinformation can spread quickly. If a note arrives late, or never appears at all, the corrective context may miss the moment when a misleading post is most visible.
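To make the scale gap concrete, the two reported figures can be put side by side. This is a back-of-the-envelope sketch only: the numbers come from different regions and count different things (published notes versus labeled posts), the 182-day period is an assumed approximation of "six months," and all names in the code are illustrative.

```python
# Figures as reported in the article; names are illustrative.
US_NOTES_PUBLISHED = 900        # Community Notes, first six months (per Meta)
EU_POSTS_LABELED = 35_000_000   # labeled Facebook posts, same period (per IFCN director)
DAYS_IN_PERIOD = 182            # assumption: six months is roughly 182 days


def labels_per_note(labeled_posts: int, notes: int) -> float:
    """Return how many labeled posts there were for each published note."""
    return labeled_posts / notes


ratio = labels_per_note(EU_POSTS_LABELED, US_NOTES_PUBLISHED)
notes_per_day = US_NOTES_PUBLISHED / DAYS_IN_PERIOD

print(f"Labeled posts per published note: about {ratio:,.0f}")
print(f"Published notes per day: about {notes_per_day:.1f}")
```

On these figures, there were tens of thousands of labeled posts for every published note, and fewer than five notes were published per day.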

Why most notes never reach users

Meta's Community Notes system is built on the open-source algorithm from X, formerly Twitter. Users propose contextual notes on public posts. Other users then rate those notes as "helpful" or "not helpful." A note becomes public only when a bridging algorithm decides that users who usually disagree with each other have found it helpful.

In theory, that design is meant to avoid one-sided moderation. In practice, most proposed notes do not clear the threshold.
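The publication rule described above can be illustrated with a deliberately simplified sketch. It is not X's or Meta's actual algorithm, which is more elaborate. The stand-in below assumes raters are already sorted into viewpoint groups and publishes a note only when every group finds it helpful; the group labels, function name and thresholds are all hypothetical.

```python
from collections import defaultdict


def is_published(ratings, min_per_group=2, min_helpful_share=0.6):
    """Decide whether a note clears a toy "bridging" threshold.

    ratings: list of (group, helpful) pairs, where group is a hypothetical
    viewpoint label and helpful is True or False.
    """
    helpful = defaultdict(int)
    total = defaultdict(int)
    for group, is_helpful in ratings:
        total[group] += 1
        if is_helpful:
            helpful[group] += 1

    # Agreement inside a single group is not a bridge.
    if len(total) < 2:
        return False

    # Every group needs enough ratings and a high enough helpful share.
    return all(
        total[g] >= min_per_group and helpful[g] / total[g] >= min_helpful_share
        for g in total
    )


one_sided = [("A", True)] * 5 + [("B", False)] * 3
bridged = [("A", True)] * 3 + [("B", True), ("B", True), ("B", False)]
too_few = [("A", True), ("B", True)]

print(is_published(one_sided))  # False: group B does not find it helpful
print(is_published(bridged))    # True: both groups mostly find it helpful
print(is_published(too_few))    # False: not enough ratings per group
```

The last case hints at why so many proposals stall: a note that nobody disputes can still go unpublished if too few raters from each side ever see it.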

According to a September 2025 update from Meta, only about six percent of all proposed notes are ever published. On X, one study puts the rate at 8.3 percent, with an average delay of 26 hours until publication, "well past the point of peak visibility for most misleading posts." Another analysis puts the average delay at 65.7 hours.

A broader finding from X points the same way: between January 2021 and January 2025, 87.7 percent of all proposed notes stayed in the "Needs More Ratings" category and were never published.

Community Notes also differs sharply from the earlier professional fact-checking program in its consequences. According to the Board, content that receives a Community Note is neither downranked nor excluded from recommendations. There are "no strikes for posting content that receives a community note," and no effects on reach or monetization. Under the prior program, content rated false or misleading could be demoted in distribution and rejected for ads.

AI makes the system easier to pressure

The Oversight Board points to AI as a growing risk for Community Notes. AI-powered tools can support the scaled creation and management of accounts and networks, which could be used to manipulate the system.

"This risk will only become more acute as artificial intelligence facilitates the scaled creation and operation of accounts and networks," the Board writes.

The Board also warns that AI-powered contributors could be used in subtler ways. Its analysis says malicious actors could fine-tune models to favor narratives, frame evidence selectively or exploit the rating mechanism while appearing neutral.

Research on X's Community Notes adds another concern: "a small minority (5-20%) of bad raters can strategically suppress targeted helpful notes." The Board also notes that published notes do not "lock" until two weeks after consensus is reached. During that period, coordinated negative ratings could remove a note.
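A toy calculation shows why the unlocked period matters. The sketch below assumes a note stays public only while helpful ratings make up at least a fixed share of all ratings. That rule, the 60 percent threshold and the function name are hypothetical simplifications, not the actual scoring method.

```python
def ratings_needed_to_unpublish(helpful: int, not_helpful: int,
                                threshold_pct: int = 60) -> int:
    """Count the extra "not helpful" ratings that push a note below threshold.

    Assumes a note stays public while helpful ratings are at least
    threshold_pct percent of all ratings. Integer math avoids float rounding.
    """
    if helpful + not_helpful == 0:
        raise ValueError("note has no ratings")
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold_pct must be between 1 and 100")

    added = 0
    while helpful * 100 >= threshold_pct * (helpful + not_helpful + added):
        added += 1
    return added


# Hypothetical note: 30 helpful and 5 not-helpful ratings at publication.
needed = ratings_needed_to_unpublish(helpful=30, not_helpful=5)
attacker_share = needed / (30 + 5 + needed)
print(needed, f"{attacker_share:.0%}")
```

In this toy setup, 16 coordinated ratings are enough to remove the note. The 5-20% figure quoted above comes from research on X's actual system and is not reproduced by this simplified rule.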

Meta told the Board that it "does not plan to allow AI note writers (i.e., AI-powered chatbots or agents) to submit community notes on Meta's platforms. Contributors may use AI to help them write notes; however, a human must submit the note under their name." Meta also said that, to date, it "has not detected any coordinated inauthentic behavior or gaming of the program."

The Board's concern is that the adequacy of Meta's safeguards "is not clear from the information provided to the Board." In other words, the absence of detected abuse is not the same as proof that the system can withstand scaled AI-enabled manipulation.

Crisis situations expose the biggest gaps

The Board is especially concerned about countries experiencing crises or protracted conflict. It says Community Notes should not be introduced in those settings, where thresholds for incitement to violence are lower and notes targeting specific groups "can more easily result in offline harm."

One cited example is an investigation of the 2024 Southport riots in the UK. Five accounts pushing false information amassed over 430 million views. Of the 1,060 posts shared by these accounts during the height of the riots, only one received a community note.

The Board also raises Meta's past failures to moderate hateful content in Myanmar and Ethiopia, failures that have been linked to the genocide of minority groups. In 2018, Facebook apologized for its role in "offline violence" in Myanmar.

A key operational problem remains unresolved. According to the Board, Meta "has not developed provisions regarding the use of the product in crisis situations, including adapting, modifying, or suspending the feature."

Minorities and languages may face unequal risks

The Board also identifies a structural weakness in the algorithm. It says the system models societal polarization along a single axis. Meta, the Board says, "has not provided any information that suggests its program will be substantively different" from X's.

That matters in countries where division is not simple. Political, ethnic, religious and linguistic conflicts may overlap. In that environment, a bridging algorithm could mistake shared prejudice among dominant groups for balanced consensus.

The Board describes a scenario in which majority groups that disagree on other issues share hostility toward a minority. That shared prejudice could become the "bridge" that allows harmful notes targeting minorities to pass the consensus threshold.
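The scenario can be sketched with a toy consensus check. The code below is an illustration under stated assumptions, not the production algorithm: it applies a simple helpful-share threshold per group, first along a single hypothetical political axis and then along a hypothetical majority/minority split. All labels and the threshold are invented for the example.

```python
def helpful_share(ratings, key):
    """Return the share of helpful ratings per group, grouped by the given key."""
    counts = {}
    for r in ratings:
        helpful, total = counts.get(r[key], (0, 0))
        counts[r[key]] = (helpful + int(r["helpful"]), total + 1)
    return {group: h / t for group, (h, t) in counts.items()}


def passes(ratings, key, threshold=0.6):
    """A note passes if every group along the chosen dimension finds it helpful."""
    return all(share >= threshold for share in helpful_share(ratings, key).values())


def rating(axis, community, helpful):
    return {"axis": axis, "community": community, "helpful": helpful}


# Hypothetical ratings on a note that targets a minority group.
ratings = (
    [rating("left", "majority", True)] * 4
    + [rating("right", "majority", True)] * 4
    + [rating("left", "minority", False)]
    + [rating("right", "minority", False)]
)

print(passes(ratings, key="axis"))       # True: left and right both approve
print(passes(ratings, key="community"))  # False: minority raters reject it
```

Measured only along the left/right axis, the note looks like cross-partisan consensus. The objection becomes visible only when the minority is treated as a group in its own right.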

A consortium of South Asian NGOs presented the Board with evidence of such dynamics in X's Community Notes in India, where political divisions involve overlapping affiliations spanning ethnicity, religion, language and caste.

Language access is another limitation. The system currently operates in only six languages: English, Spanish, Chinese, Vietnamese, French, and Portuguese. Research on X shows that notes in non-English languages are rated and published far less frequently.

The Board's recommendation is not simply to slow down. It calls for a staggered rollout with strict exclusion criteria. For Meta, the central question is whether Community Notes can be more than a low-friction context layer when misinformation is fast, coordinated and potentially harmful offline.