Anthropic
Anthropic launched a programme to fund third-party AI evaluations, focusing on safety and advanced capabilities. The initiative covers areas like cybersecurity, multilingual skills, and societal impacts, aiming to improve AI safety across the industry.
Claude 3.5 Sonnet: faster, smarter AI with enhanced vision. A new Artifacts feature enables collaboration. Anthropic continues to prioritise safety and privacy.
Anthropic shares its AI red teaming practices, covering various methods along with their pros and cons. It proposes steps toward industry-wide standardisation, including funding for technical standards and support for independent red teaming bodies.
Anthropic safeguards elections with AI testing: Policy Vulnerability Testing and automated evaluations address election-related risks in its AI models.
Anthropic maps Claude Sonnet's inner workings, identifying "features" in its neural network that are linked to concepts. Researchers can tune how strongly a concept activates, changing the model's behaviour; the breakthrough could enhance AI safety and reliability.
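The feature-tuning idea in the last entry can be illustrated with a minimal sketch, assuming (hypothetically) that a concept corresponds to a direction in the model's activation space and that scaling that direction up or down changes the concept's influence. All names here are illustrative, not Anthropic's actual API.

```python
def steer(activations, feature_direction, strength):
    """Add a scaled feature direction to a layer's activation vector.

    Positive strength amplifies the concept; negative strength suppresses it.
    (Hypothetical sketch of the feature-steering idea, not a real API.)
    """
    return [a + strength * f for a, f in zip(activations, feature_direction)]

# Toy example: a 4-dimensional activation vector and a made-up
# unit direction standing in for a single learned concept feature.
acts = [0.5, -1.0, 0.25, 0.0]
concept_direction = [1.0, 0.0, 0.0, 0.0]

amplified = steer(acts, concept_direction, strength=5.0)    # concept turned up
suppressed = steer(acts, concept_direction, strength=-0.5)  # concept turned down
print(amplified)   # [5.5, -1.0, 0.25, 0.0]
print(suppressed)  # [0.0, -1.0, 0.25, 0.0]
```

In the real research, the feature directions are found with sparse autoencoders trained on the model's activations; this toy version only shows why adjusting activation along one direction can shift behaviour.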