MIT Tech Review AI, June 11, 2025

Can Amsterdam make welfare AI fair enough to use?

Amsterdam built Smart Check to help decide which welfare applications should face fraud scrutiny. The city followed many responsible AI practices, but its pilot still raised hard questions about fairness, effectiveness, and public trust.

Amsterdam set out to answer a difficult question: can welfare AI help a city investigate fewer people, and treat them more fairly, while still catching serious errors in applications? Smart Check was designed around that ambition, but the pilot became a test of whether responsible AI can work when real benefits, real debts, and real households are at stake.

Why Smart Check mattered

Smart Check was built to evaluate welfare applications in Amsterdam and identify which cases might contain incorrect information. The system was intended to support fraud prevention by sending higher-risk applications to the city’s investigations department.

For Paul de Koning, the consultant who managed the pilot phase, the project looked like a step toward a more efficient and less biased benefits system. He believed early results showed promise and said the city had consulted experts, tested for bias, added safeguards, and asked affected people for feedback. “I got a good feeling,” he told the reporters who examined the system.

Hans de Zwart, a digital rights advocate and former executive director of Bits of Freedom, saw the same plan very differently. He had been informally advising Amsterdam’s city government for nearly two years when he reviewed Smart Check in February 2023. In his view, using the algorithm “on real people” involved “some very fundamental [and] unfixable problems.”

That disagreement captures the central tension around public-sector AI. Supporters argue that algorithms can help governments do more with limited resources and reduce arbitrary human decisions. Critics warn that automated systems can quietly reproduce discrimination, make errors harder to challenge, and increase pressure on people who already depend on public services.

A welfare system shaped by suspicion

The stakes in Amsterdam were not abstract. Welfare investigations can affect whether people receive payments on time, whether they are asked to correct paperwork, whether they receive less money, or whether they are pushed toward repayment and debt. Officials can request bank records, call beneficiaries to city hall, and in some cases make unannounced home visits.

The city’s own history helps explain why Smart Check was attractive to some officials. Until 2007, Amsterdam’s policy was to conduct home visits for every applicant. Harry Bodaar, a welfare policy advisor for the city, said Smart Check was partly motivated by the desire to make checks more targeted: “We wanted to do a fair check only on the people we [really] thought needed to be checked.”

That goal followed decades of concern about the relationship between welfare offices and residents. In 1984, Albine Grumböck, a divorced single mother of three, discovered that a neighbor who worked at the local social service office had been secretly monitoring her life. After the welfare office cut her benefits, she challenged the decision in court and won.

Marc van Hoof, a lawyer who has helped Dutch welfare recipients for decades, described the resulting climate bluntly: “The government doesn’t trust its people, and the people don’t trust the government.” Bodaar offered his own diagnosis of the broader welfare system: “The system is held together by rubber bands and staples.” He added that people at the bottom are the first to fall through the cracks.

The promise of responsible AI

Amsterdam did not build Smart Check in isolation. The project arrived after a series of welfare algorithm scandals in the Netherlands and elsewhere. In 2019, it was revealed that the Dutch national government had used an algorithm to create risk profiles in the child care benefits system. Nearly 35,000 parents, most of whom were migrants or the children of migrants, were wrongly accused of defrauding the assistance system over six years, and the scandal led the government to resign in 2021.

Rotterdam also faced scrutiny. A 2023 investigation by Lighthouse Reports found that a welfare fraud detection system was biased against women, parents, non-native Dutch speakers, and other vulnerable groups. The city eventually suspended use of that system. Amsterdam and Leiden had also used the Fraud Scorecard, first deployed more than 20 years ago, which treated education, neighborhood, parenthood, and gender as crude risk factors. That program was discontinued.
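Findings like these often start from a simple comparison: how often each group is flagged, relative to a reference group. The Python sketch below computes per-group flag rates and the ratio of each rate to a reference group's. It is a generic illustration, not the methodology Lighthouse Reports or any city used; the helper names `flag_rates_by_group` and `disparity_ratios`, the group labels, and the records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def flag_rates_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of applications flagged for investigation, per group.

    Each record is (group_label, was_flagged). Labels here are hypothetical.
    """
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for group, was_flagged in records:
        totals[group] += 1
        flagged[group] += int(was_flagged)
    return {group: flagged[group] / totals[group] for group in totals}


def disparity_ratios(rates: Dict[str, float], reference: str) -> Dict[str, float]:
    """Each group's flag rate divided by the reference group's flag rate."""
    base = rates[reference]
    if base == 0:
        raise ValueError("reference group has a flag rate of zero")
    return {group: rate / base for group, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical toy records, not real applicant data.
    records = [
        ("women", True), ("women", True), ("women", False), ("women", False),
        ("men", True), ("men", False), ("men", False), ("men", False),
    ]
    rates = flag_rates_by_group(records)
    print(rates)                                       # {'women': 0.5, 'men': 0.25}
    print(disparity_ratios(rates, reference="men"))    # {'women': 2.0, 'men': 1.0}
```

Equal flag rates are only one notion of fairness. An audit might instead compare, for example, how often applicants whose applications contain no errors are wrongly flagged in each group, and these definitions can conflict with one another.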

These cases helped drive interest in responsible AI. Jiahao Chen, an ethical-AI consultant, described responsible AI as “this umbrella term to say that we need to think about not just ethics, but also fairness.” In practice, it covers principles such as explainability, stakeholder consultation, audits, privacy, security, and safety.

Amsterdam believed it could apply those principles to welfare fraud prevention. Bodaar said the city had learned from earlier scandals and wanted to “show the people in Amsterdam we do good and we do fair.”

How Smart Check was supposed to work

Smart Check was designed to replace the first stage in which a caseworker flags applications for investigation. Instead of relying on that initial human judgment, the algorithm would screen applications and identify cases most likely to contain major errors.

The model used an “explainable boosting machine,” a type of algorithm designed to make its predictions easier to interpret than those of the machine-learning models often described as black boxes. It considered 15 characteristics, including whether applicants had previously applied for or received benefits, the sum of their assets, and the number of addresses on file.
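The explainability claim rests on the model's structure. An explainable boosting machine is a kind of generalized additive model: it learns a separate shape function for each feature (and can also include a small number of pairwise interaction terms), then adds their outputs to produce a score, so every prediction can be broken down into per-feature contributions. The minimal Python sketch below illustrates only that additive structure and does not reproduce Smart Check. The feature names loosely follow three of the characteristics mentioned above; the bin edges, contribution values, intercept, and the helpers `binned` and `explain` are hypothetical, and no training step is shown.

```python
import math
from typing import Callable, Dict, List, Tuple


def binned(edges: List[float], values: List[float]) -> Callable[[float], float]:
    """Step-function shape: values[i] applies when x < edges[i]; the last value applies otherwise.

    A trained EBM learns lookup tables like this per feature; these are invented.
    """
    assert len(values) == len(edges) + 1

    def shape(x: float) -> float:
        for edge, value in zip(edges, values):
            if x < edge:
                return value
        return values[-1]

    return shape


# Hypothetical shape functions, each returning a log-odds contribution.
SHAPE_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "prior_benefit_applications": binned([1, 3], [-0.4, 0.1, 0.5]),
    "total_assets_eur": binned([1_000, 10_000], [-0.2, 0.0, 0.6]),
    "addresses_on_file": binned([2, 4], [-0.1, 0.3, 0.8]),
}
INTERCEPT = -1.0  # hypothetical baseline log-odds


def explain(applicant: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (risk probability, per-feature log-odds contributions)."""
    contributions = {name: fn(applicant[name]) for name, fn in SHAPE_FUNCTIONS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


if __name__ == "__main__":
    applicant = {"prior_benefit_applications": 2, "total_assets_eur": 500, "addresses_on_file": 3}
    probability, contributions = explain(applicant)
    # Because the score is a plain sum, each feature's share can be listed directly.
    for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
        print(f"{name}: {value:+.2f}")
    print(f"risk probability: {probability:.3f}")
```

Because the score is a sum, a caseworker or auditor can see which characteristic pushed a particular applicant's score up or down, which is the property the city cited. Being able to read a contribution does not by itself show that the contribution is fair.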

Internal city documents described an ambitious goal. Smart Check was expected to flag fewer welfare applicants for investigation while identifying a greater share of cases with errors. One document projected that the model could prevent up to 125 individual Amsterdammers from facing debt collection and save €2.4 million annually.

That promise mattered because investigations often find no wrongdoing. According to figures provided by Bodaar, more than half of investigations of applications produced no evidence of wrongdoing. In those cases, he said, the city may have “wrongly harassed people.”
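The goal described in the city's documents mixes several distinct measurements. Flagging fewer people is one number; the share of flagged cases that actually contain errors is another; and the share of all erroneous applications that get flagged is a third. The phrase "a greater share of cases with errors" could refer to either of the last two. The Python sketch below computes all of them for a screening rule. The function `screening_stats` and the ten-application example are hypothetical toy data, not figures from Amsterdam.

```python
from typing import Dict, Sequence


def screening_stats(flagged: Sequence[bool], had_error: Sequence[bool]) -> Dict[str, float]:
    """Summarize a screening rule given ground truth about which applications had errors."""
    if len(flagged) != len(had_error):
        raise ValueError("flagged and had_error must have the same length")
    n = len(flagged)
    n_flagged = sum(flagged)
    n_errors = sum(had_error)
    hits = sum(f and e for f, e in zip(flagged, had_error))
    return {
        # Share of all applications sent for investigation.
        "flag_rate": n_flagged / n if n else 0.0,
        # Share of investigations that find an error (precision).
        "hit_rate": hits / n_flagged if n_flagged else 0.0,
        # Share of all erroneous applications that were flagged (recall).
        "error_capture": hits / n_errors if n_errors else 0.0,
        # Share of investigations that find nothing.
        "no_finding": (n_flagged - hits) / n_flagged if n_flagged else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy data: ten applications, three of which contain errors.
    had_error = [True, False, False, True, False, False, False, True, False, False]
    caseworker = [True, True, True, True, True, False, False, False, False, False]
    model = [True, False, False, True, False, False, True, False, False, False]
    print("caseworker:", screening_stats(caseworker, had_error))
    print("model:     ", screening_stats(model, had_error))
```

In this toy example the caseworker-style rule flags half the applications and 60 percent of its investigations find nothing, echoing the pattern Bodaar described, while the hypothetical model flags fewer cases and finds errors in a larger share of them without catching fewer errors overall.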

The unresolved question

The Smart Check pilot processed live welfare applications, giving Amsterdam a rare real-world test of responsible AI in a sensitive public service. Through a public records request, Lighthouse Reports, MIT Technology Review, and Trouw gained access to multiple versions of the algorithm and to data on how it evaluated real applicants.

What they found was not a simple story of success or failure. For de Koning, Smart Check represented progress toward a fairer and more transparent welfare system. For de Zwart, it showed that some uses of AI in social services carry risks that technical safeguards cannot resolve.

The lesson is not only about one city or one algorithm. Amsterdam invested time, money, and expertise in Smart Check and backed it with bias testing, consultation, and technical safeguards. Yet the pilot still raised the central question facing welfare AI everywhere: when an automated system helps decide who is investigated, delayed, corrected, or pushed toward repayment, what standard of fairness is enough?

Amsterdam’s experiment shows why that question remains difficult. Responsible AI can make systems more explainable and more carefully governed. But in welfare, the consequences fall on people who may have few resources to absorb a mistake, understand a decision, or challenge a process that begins with a risk score.