MIT Tech Review AI July 22, 2024 TERMINATOR

One Year In, AI Self-Regulation Shows Progress and Gaps

One year after seven AI companies made voluntary commitments with the White House, the clearest progress is in red-teaming, cybersecurity, bug bounties, and watermarking. The harder questions remain transparency, accountability, independent access, and whether the measures companies report are actually reducing risk.

WTF Index TERMINATOR

The story focuses on safety testing, cybersecurity, deepfakes, and weak accountability around powerful AI systems, but it is mostly about governance progress and gaps rather than an acute threat.

One year after seven leading AI companies made voluntary AI commitments with the White House, the picture is mixed. The companies have reported new testing practices, security measures, vulnerability programs, and watermarking tools, but the evidence of broader transparency and accountability remains limited.

The commitments were signed on July 21, 2023, by Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI. They arrived during a period of intense generative AI competition, rising concern about copyright and deepfakes, and growing pressure on regulators to act.

What the White House commitments tried to do

The voluntary commitments asked companies to develop AI in a safer and more trustworthy way. The promises included improving testing and transparency, sharing information about possible harms and risks, investing in cybersecurity, enabling vulnerability reporting, and helping users identify AI-generated content.

Because the commitments are voluntary and unenforceable, they also reveal the limits of the US approach. The White House has since issued an executive order that expands on the commitments and applies to other tech companies and government departments, but comprehensive federal legislation is still absent.

Robyn Patterson, a spokesperson for the White House, said, “We’re grateful for the progress leading companies have made toward fulfilling their voluntary commitments in addition to what is required by the executive order.” Patterson also said the president continues to call on Congress to pass bipartisan legislation on AI.

Testing and red-teaming improved, but details remain thin

The most visible progress is around red-teaming, where people probe AI models for flaws, unsafe behavior, and other risks. All the companies except Inflection, which declined to comment, said they conduct red-teaming with internal and external testers.

OpenAI said it has a preparedness team that tests models for cybersecurity, chemical, biological, radiological, and nuclear threats, as well as for situations in which a sophisticated AI model could take, or persuade a person to take, actions that might lead to harm. Anthropic and OpenAI said they work with external experts before launching new models.

Anthropic said that for Claude 3.5, it conducted predeployment testing with experts at the UK’s AI Safety Institute. It also allowed METR to do an “initial exploration” of Claude 3.5’s capabilities for autonomy. Google said it red-teams Gemini around election-related content, societal risks, and national security concerns.

Microsoft said it worked with NewsGuard to evaluate risks and reduce abusive deepfakes in its text-to-image tool. Meta said it evaluated Llama 3 across risk areas including weapons, cyberattacks, and child exploitation.

Still, Rishi Bommasani of the Stanford Center for Research on Foundation Models said reporting activity is not enough. He argued that companies need to show whether their interventions actually reduce the risks they identify.

Information sharing is happening, but impact is unclear

The companies also committed to sharing information with industry, governments, civil society, and academia. After signing the commitments, Anthropic, Google, Microsoft, and OpenAI founded the Frontier Model Forum, a nonprofit focused on AI safety and responsibility. Amazon and Meta later joined.

All seven signatories are part of the Artificial Intelligence Safety Institute Consortium, established by the National Institute of Standards and Technology. Google, Microsoft, and OpenAI also have representatives at the UN’s High-Level Advisory Body on Artificial Intelligence.

Several companies pointed to research collaborations. Google cited work with MLCommons on a cross-industry AI Safety Benchmark and said it contributes resources such as computing credit to the National Science Foundation’s National AI Research Resource pilot. Meta pointed to its role in the AI Alliance and its engagement with open source AI and the developer community.

But the central question is whether these forums and partnerships produce meaningful change or mainly demonstrate activity. Bommasani said the Frontier Model Forum could help competitors cooperate on safety information, even if they are not fully transparent to the public.

Cybersecurity and vulnerability reporting moved forward

Several companies described new cybersecurity steps to protect model weights, which the commitments identify as essential parts of AI systems. Microsoft launched the Secure Future Initiative, said its model weights are encrypted, and described strong identity and access controls for highly capable proprietary models. Google launched an AI Cyber Defense Initiative.

OpenAI shared six new measures it is developing to complement existing cybersecurity practices, including extending cryptographic protection to AI hardware. It also has a Cybersecurity Grant Program that gives researchers access to its models to build cyber defenses.

Amazon said it has taken measures against generative AI-specific attacks such as data poisoning and prompt injection. Anthropic published details about protections including access controls for models and sensitive assets, third-party supply chain controls, and work with independent assessors.

For vulnerability reporting, Anthropic, Google, Microsoft, Meta, and OpenAI have bug bounty programs for AI systems. Anthropic and Amazon also said they provide website forms for security researchers to submit vulnerability reports.

Brandie Nonnecke of the CITRIS Policy Lab at UC Berkeley warned that third-party auditing is a difficult socio-technical challenge and may take years to mature. She also raised concern that early audits could set weak precedents if they define some risks while overlooking others.

Watermarking is the clearest technical fix

Many of the companies have built watermarking tools for AI-generated content. Google launched SynthID for image, audio, text, and video generated by Gemini. Meta has Stable Signature for images and AudioSeal for AI-generated speech. Amazon adds an invisible watermark to all images generated by its Titan Image Generator.

OpenAI uses watermarks in Voice Engine and has built an image-detection classifier for images generated by DALL-E 3. Anthropic was the exception: watermarks are mainly used in images, and Claude does not generate images.

Several companies are also tied to the Coalition for Content Provenance and Authenticity, known as C2PA, which embeds information about whether content was created or edited by AI into image metadata. Microsoft and OpenAI automatically attach C2PA provenance metadata to images generated with DALL-E 3 and videos generated with Sora. Meta is not a member but said it is using the C2PA standard to identify AI-generated images on its platforms.

Bommasani described watermarking as part of the companies’ “natural preference to more technical approaches to addressing risk.” The open question is whether that technical fix meaningfully addresses the social concern behind it: helping people know when content is machine generated.

The overall result is progress with important limits. Red-teaming, cybersecurity, bug bounties, and watermarking have advanced. But voluntary AI commitments still leave companies largely responsible for judging their own performance, and the experts quoted argue that transparency, independent scrutiny, and accountability remain unresolved.