With the 2024 global elections on the horizon, Anthropic has announced that it is proactively working to safeguard election integrity. It has developed a comprehensive testing process that combines in-depth expert testing (Policy Vulnerability Testing or PVT) with large-scale automated evaluations.
PVT involves collaboration between Anthropic and external subject matter experts to rigorously test AI models for potential issues. The process consists of three stages: planning, testing, and reviewing results.
To complement PVT, Anthropic also develops automated evaluations that allow model behaviour to be tested at a larger scale. The main benefits of automated evaluations are scalability, comprehensiveness and consistency.
Based on the findings from PVT and automated evaluations, Anthropic adapts its policies, enforcement controls, and the models themselves to address identified risks.
Anthropic's testing methods not only surface potential issues but also serve as a way to measure the efficacy of mitigations and track progress over time.
Applying a "Swiss cheese model" of system safety, Anthropic deploys a set of layered and overlapping interventions to prevent its models from unintentionally providing inaccurate or misleading information.