
Anthropic launches fund to measure capabilities of AI models


AI research is hurtling forward, but our ability to assess its capabilities and potential risks is lagging behind. To bridge this critical gap and address the current limitations of the third-party evaluation ecosystem, Anthropic has launched an initiative to invest in the development of robust, safety-relevant benchmarks for assessing advanced AI capabilities and risks.

“A robust, third-party evaluation ecosystem is essential for assessing AI capabilities and risks, but the current evaluations landscape is limited,” Anthropic said in a blog post. “Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply. To address this, today we’re introducing a new initiative to fund evaluations developed by third-party organizations that can effectively measure advanced capabilities in AI models.”

Anthropic has sought to differentiate itself from its AI peers by positioning itself as a responsible, safety-first firm.

The company has invited interested parties to submit proposals through its application form, particularly those addressing the high-priority focus areas.

Anthropic’s initiative comes at a crucial time when the demand for high-quality AI evaluations is rapidly outpacing supply. The company aims to fund third-party organizations to develop new evaluations that can effectively measure advanced AI capabilities, thus elevating the entire field of AI safety.

“We’re seeking evaluations that help us measure the AI Safety Levels (ASLs) defined in our Responsible Scaling Policy,” the announcement continued. “These levels determine the safety and security requirements for models with specific capabilities.”

The initiative will prioritize three main areas: AI safety level assessments, advanced capability and safety metrics, and infrastructure for developing evaluations. Each area addresses specific challenges and opportunities within the AI field.

Prioritizing safety assessments

The AI Safety Level assessments will include cybersecurity, chemical, biological, radiological, and nuclear (CBRN) risks, model autonomy, and other national security risks. Evaluations will measure the AI Safety Levels defined in Anthropic’s Responsible Scaling Policy, ensuring models are developed and deployed responsibly.

“Robust ASL evaluations are crucial for ensuring we develop and deploy our models responsibly,” Anthropic emphasized. “Effective evaluations in this domain might resemble novel Capture The Flag (CTF) challenges without publicly available solutions. Current evaluations often fall short, being either too simplistic or having solutions readily accessible online.”

The company has also invited proposals for evaluations that address critical issues such as national security threats potentially posed by AI systems.

“AI systems have the potential to significantly impact national security, defense, and intelligence operations of both state and non-state actors,” the announcement added. “We’re committed to developing an early warning system to identify and assess these complex emerging risks.”

Beyond Safety: Measuring Advanced Capabilities

Beyond safety, the fund aims to develop benchmarks that assess the full spectrum of an AI model’s abilities and potential risks. This includes evaluations for scientific research, where Anthropic envisions models capable of tackling complex tasks such as designing new experiments or troubleshooting protocols.

“In addition to ASL assessments, we are interested in sourcing advanced capability and safety metrics,” Anthropic explained. “These metrics will provide a more comprehensive understanding of our models’ strengths and potential risks.”

Building a More Efficient Evaluation Ecosystem

Anthropic emphasized that developing effective evaluations is challenging and outlined key principles for creating strong evaluations. These include ensuring evaluations are sufficiently difficult, not included in training data, scalable, and well-documented.

“We’re interested in funding tools and infrastructure that streamline the development of high-quality evaluations,” Anthropic said in the statement. “These will be critical to achieve more efficient and effective testing across the AI community.” The company aims to fund tools and platforms that make it easier for subject-matter experts to create robust evaluations without needing coding skills.

However, the company acknowledges that “developing great evaluations is hard” and that “even some of the most experienced developers fall into common traps, and even the best evaluations are not always indicative of the risks they purport to measure.”

To help interested developers refine their proposals before submission, Anthropic said it will facilitate interactions with domain experts from its Frontier Red Team, Finetuning, and Trust & Safety teams, among others.

Anthropic did not respond to a request for comment.

With this initiative, Anthropic is sending a clear message: the race for advanced AI can’t be won without prioritizing safety. By fostering a more comprehensive and robust evaluation ecosystem, the company is laying the groundwork for a future where AI benefits humanity without posing existential threats.

Copyright © 2024 IDG Communications, Inc.
