TechCrunch AI December 5, 2024

Why Google’s PaliGemma 2 Emotion Claims Raise AI Concerns

Google says PaliGemma 2 can describe actions, emotions and scenes in images, although emotion recognition requires fine-tuning. Experts warned TechCrunch that the premise of reading emotions from faces is scientifically weak, potentially biased and risky if used in high-stakes settings.

Emotion recognition in vision models raises risks of biased, unreliable surveillance or high-stakes automated judgments.

Google’s PaliGemma 2 is meant to make image understanding more detailed. The concern is that one advertised capability moves beyond identifying objects and into a much more contested area: describing emotions.

The model family, announced on Thursday, can analyze images and produce captions or answer questions about people shown in photos. Google says it can describe actions, emotions and the wider narrative of a scene. That claim has drawn sharp warnings from researchers who study AI, bias and technology’s social impact.

What Google Says PaliGemma 2 Can Do

PaliGemma 2 is a family of AI models built for image analysis. According to Google, it can generate detailed, contextually relevant captions rather than only naming visible objects.

In a blog post shared with TechCrunch, Google wrote: “PaliGemma 2 generates detailed, contextually relevant captions for images,” and said the model goes beyond simple object identification to describe “actions, emotions, and the overall narrative of the scene.”

There is an important caveat: emotion recognition does not work out of the box. PaliGemma 2 has to be fine-tuned for that purpose. Even so, experts said the prospect of an openly available emotion detector is troubling because the same underlying capability could be adapted and deployed in many different contexts.

The issue is not only whether an AI system can label a face with an emotion word. The larger question is whether that label has any reliable meaning, especially when applied across different people, backgrounds and situations.

Why Emotion Recognition Is Disputed

For years, companies have tried to build systems that infer emotions for uses ranging from sales training to accident prevention. Some developers claim progress, but the scientific foundation remains contested.

Many emotion detection systems trace their assumptions to early work by Paul Ekman, a psychologist who theorized that humans share six fundamental emotions: anger, surprise, disgust, enjoyment, fear, and sadness. Later studies challenged that idea, showing major differences in how people from different backgrounds express what they feel.

Sandra Wachter, a professor in data ethics and AI at the Oxford Internet Institute, told TechCrunch: “This is very troubling to me.” She added: “I find it problematic to assume that we can ‘read’ people’s emotions. It’s like asking a Magic 8 Ball for advice.”

Mike Cook, a research fellow at King’s College London specializing in AI, made a similar point. “Emotion detection isn’t possible in the general case, because people experience emotion in complex ways,” he told TechCrunch. He said it may be possible to detect “some generic signifiers in some cases,” but not to fully solve the problem.

That distinction matters. A smile, a stare or a facial movement may be visible to a camera, but the meaning of that expression can depend on context that an image model may not have. A model may see a face and generate a confident label, while the person’s actual experience remains unknown.

Bias and Testing Questions

TechCrunch's report describes a recurring problem with emotion-detecting systems: they can be unreliable and shaped by the assumptions of their designers. That can turn subjective interpretations into automated outputs that appear more certain than they are.

A 2020 MIT study found that face-analyzing models could develop unintended preferences for certain expressions, including smiling. More recent work suggests that emotional analysis models assign more negative emotions to Black people’s faces than to white people’s faces.

Google says it conducted “extensive testing” to evaluate demographic biases in PaliGemma 2. The company also said it found “low levels of toxicity and profanity” compared with industry benchmarks.

But TechCrunch reported that Google did not provide the full list of benchmarks it used or specify which types of tests were performed. The only disclosed benchmark was FairFace, a dataset made up of tens of thousands of people’s headshots. Google says PaliGemma 2 scored well on FairFace, but some researchers have criticized FairFace as a bias metric because it represents only a handful of race groups.

Heidy Khlaaf, chief AI scientist at the AI Now Institute, said the core problem goes beyond technical benchmarking. “Interpreting emotions is quite a subjective matter that extends beyond use of visual aids and is heavily embedded within a personal and cultural context,” she said. “AI aside, research has shown that we cannot infer emotions from facial features alone.”

Where Misuse Could Matter Most

The strongest concerns focus on where emotion recognition might be used. TechCrunch noted that regulators overseas have already sought to limit the technology in high-risk contexts. The AI Act, the major piece of AI legislation in the EU, prohibits schools and employers from deploying emotion detectors, though not law enforcement agencies.

Open models raise a separate concern because they can be freely downloaded and adapted. PaliGemma 2 is available from a number of hosts, including AI dev platform Hugging Face. Experts worry that public availability can make misuse harder to control once a capability is released.

Khlaaf warned that if emotional identification rests on pseudoscientific assumptions, it could be used to “further — and falsely — discriminate against marginalized groups” in areas including law enforcement, human resourcing and border governance.

Asked about the risks of publicly releasing PaliGemma 2, a Google spokesperson said the company stands behind its tests for “representational harms” connected to visual question answering and captioning. The spokesperson added: “We conducted robust evaluations of PaliGemma 2 models concerning ethics and safety, including child safety, content safety.”

Wachter remained unconvinced that testing alone resolves the issue. “Responsible innovation means that you think about the consequences from the first day you step into your lab and continue to do so throughout the life cycle of a product,” she said.

Her warning points to the central tension around PaliGemma 2 and emotion recognition. Image models can make captions richer and more useful, but when they move into claims about inner states, the stakes change. A label that looks like a neutral description can become a judgment about a person, and in the wrong setting that judgment can carry real consequences.