Anthropic has announced a new programme to fund the development of third-party evaluations for advanced AI capabilities, aiming to enhance the field of AI safety and provide valuable tools for the entire AI ecosystem.
Anthropic, the AI research company behind Claude, has unveiled a new initiative to fund the creation of third-party evaluations for measuring advanced AI model capabilities. This programme aims to address the current limitations in the AI evaluations landscape and meet the growing demand for high-quality, safety-relevant assessments.
The initiative focuses on several key areas, including AI Safety Level assessments covering cybersecurity, CBRN risks, model autonomy, and other national security risks. It also emphasises advanced capability and safety metrics, such as evaluations for advanced science, harmfulness and refusals, multilingual capabilities, and societal impacts. Additionally, the programme seeks to develop infrastructure, tools, and methods for creating evaluations, including no-code evaluation platforms and tools for model grading.
Anthropic has outlined principles for good evaluations, emphasising the need for difficulty, novelty, efficiency, and domain expertise. The company is inviting proposals through an application form and offering various funding options based on project needs and stages.
The initiative aims to provide developers with opportunities to interact directly with Anthropic's domain experts from teams such as Frontier Red Team, Finetuning, and Trust & Safety, to help refine and shape evaluations for maximum impact.