TechCrunch AI, November 22, 2024

Why OpenAI’s AI morality research faces a high bar

OpenAI Inc. disclosed a grant to Duke University researchers for work titled "Research AI Morality." The project aims to train algorithms to "predict human moral judgements," but earlier attempts show how fragile AI moral reasoning can be.

OpenAI is putting money behind academic work on a difficult question: whether algorithms can anticipate how humans make moral judgements. The effort centers on Duke University researchers and a project disclosed in an IRS filing as "Research AI Morality."

The work sits at the edge of a larger debate about what AI systems can and cannot understand. Predicting patterns in language is one thing. Handling moral conflicts in medicine, law, and business is a much harder test.

What OpenAI is funding

In a filing with the IRS, OpenAI Inc., OpenAI’s nonprofit org, reported that it awarded a grant to Duke University researchers for a project called "Research AI Morality." When asked about the grant, an OpenAI spokesperson pointed to a press release describing the award as part of a larger, three-year, $1 million grant to Duke professors studying "making moral AI."

Few details about the funded research are public. The source states that the grant ends in 2025. Walter Sinnott-Armstrong, a practical ethics professor at Duke and the study’s principal investigator, told TechCrunch by email that he "will not be able to talk" about the work.

The stated goal, according to the press release cited in the source, is to train algorithms to "predict human moral judgements" in scenarios where morally relevant features conflict in medicine, law, and business.

The Duke researchers’ prior work

Sinnott-Armstrong and Jana Borg, the project’s co-investigator, have already worked on related questions. Their past output includes studies and a book about whether AI might act as a "moral GPS" that helps people make better judgements.

As part of larger teams, they have also worked on a "morally-aligned" algorithm intended to help decide who receives kidney donations. They have studied situations in which people would prefer that AI make moral decisions.

That background helps explain why OpenAI would fund this line of inquiry. The research is not simply about building a chatbot that gives ethical advice. It is about whether algorithms can model human moral judgement in structured scenarios where different values may point in different directions.

Why morality is hard for AI

The central difficulty is that morality is not a simple prediction problem. Modern machine learning models are statistical systems. They are trained on large collections of examples from the web, then learn patterns that help them generate predictions.

The source gives a plain example: a model can learn that the phrase "to whom" is often followed by "it may concern." That kind of pattern recognition is useful, but it does not mean the system understands ethics, human reasoning, or the emotional context behind moral decisions.
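The kind of pattern learning described above can be illustrated with a toy sketch. This is not how production language models are built; it is a minimal word-count predictor, and the tiny corpus and the function names `train_ngrams` and `predict_next` are hypothetical, chosen only for illustration. It shows that a system can complete "to whom" with "it" purely from frequency, with no grasp of what the words mean.

```python
from collections import Counter, defaultdict


def train_ngrams(sentences):
    """Count which word follows each two-word context in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - 2):
            context = (words[i], words[i + 1])
            counts[context][words[i + 2]] += 1
    return counts


def predict_next(counts, first, second):
    """Return the most frequent follower of a two-word context, or None if unseen."""
    context = (first.lower(), second.lower())
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]


# Hypothetical toy corpus: the phrase from the article plus one competing pattern.
corpus = [
    "to whom it may concern",
    "to whom it may concern please find attached",
    "to whom should I send this",
]

model = train_ngrams(corpus)
print(predict_next(model, "to", "whom"))    # most frequent follower in this corpus
print(predict_next(model, "never", "seen"))  # context absent from the corpus
```

The predictor picks "it" after "to whom" only because that continuation appears more often in the examples it was given. Change the examples and the answer changes, which is the same fragility the article describes at a much larger scale.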

This matters because moral judgement often depends on context. A model can produce an answer that appears reasonable in one version of a question, then change its response when the same dilemma is phrased differently. That weakness becomes serious when the topic is not grammar or style, but human values.

A cautionary example

The source points to Ask Delphi, a 2021 tool from the nonprofit Allen Institute for AI that was meant to provide ethically sound recommendations. It handled some simple moral questions as expected. For example, it recognized that cheating on an exam was wrong.

But the tool was brittle. Small changes in wording could lead it to approve of almost anything, including smothering infants. It also reflected bias: Delphi said that being straight is more "morally acceptable" than being gay.

That example illustrates why AI morality research is so difficult. If a system responds differently because a question is rephrased, it may be tracking surface patterns rather than a stable ethical framework. If it reflects bias from training data, it may exclude or misrepresent people whose values are less visible online.

The unresolved question

The source also notes that AI systems tend to mirror values common in Western, educated, and industrialized nations. That is partly because the web, and therefore much AI training data, is dominated by material expressing those viewpoints.

Many people’s values may not be represented in AI outputs, especially when those people are not contributing to training sets by posting online. The problem is broader than geography or culture: according to the source, philosophers have debated ethical theories for thousands of years without producing one universally applicable framework.

Even current AI systems can appear to lean in different philosophical directions. The source says Claude favors Kantianism, meaning a focus on absolute moral rules, while ChatGPT leans ever-so-slightly utilitarian, prioritizing the greatest good for the greatest number of people.

That does not settle which approach is better. It shows why an algorithm designed to predict human moral judgements would need to account for disagreement, bias, context, and subjectivity. OpenAI’s funded research is aimed at that challenge. Whether such an algorithm can be built at all remains an open question.