Anthropic has announced a significant update to its Responsible Scaling Policy (RSP), the framework it uses to mitigate potential catastrophic risks from frontier AI systems. The updated policy introduces a more flexible and nuanced approach to assessing and managing AI risks, while maintaining Anthropic's commitment not to train or deploy models unless adequate safeguards are in place.

Key improvements include new capability thresholds to indicate when Anthropic will upgrade its safeguards, refined processes for evaluating model capabilities and the adequacy of safeguards (inspired by safety case methodologies), and new measures for internal governance and external input.

The policy uses AI Safety Level Standards (ASL Standards): graduated sets of safety and security measures that become more stringent as model capabilities increase. All of Anthropic's models currently operate under the ASL-2 Standards, which reflect present industry best practices.

Two key Capability Thresholds have been defined:

1. Autonomous AI Research and Development: If a model can independently conduct complex AI research tasks typically requiring human expertise, Anthropic requires elevated security standards (potentially ASL-4 or higher) and additional safety assurances.

2. Chemical, Biological, Radiological, and Nuclear (CBRN) weapons: If a model can meaningfully assist someone with a basic technical background in creating or deploying CBRN weapons, Anthropic requires enhanced security and deployment safeguards (ASL-3 Standards).

Jared Kaplan, Co-Founder and Chief Science Officer, will serve as Anthropic's Responsible Scaling Officer, succeeding Sam McCandlish, who held the role for the past year. The company is also opening a position for a Head of Responsible Scaling.

Anthropic acknowledges lessons learned from the first year of implementing the RSP, including the need to address some minor procedural issues. The company stresses that all aspects of its safety programme will continue to evolve as AI advances rapidly.
