Anthropic's RSP v3 is out!
TLDR: unilateral commitments to specific mitigations for predefined capability thresholds are mostly out, in favor of commitments to much more detailed transparency around both safety roadmaps and risk reports. Also new: threat models, commitments around competitor progress and external review, a vision for industry-wide safety, and increased attention to the risks of internal deployment. There's a ton of new stuff.
I'm pretty excited about this change and think it's a big improvement on v2.2. I also don't really think you can fit a good overall take on the update into 280 chars. Assorted thoughts:
We're updating our Responsible Scaling Policy to its third version.
Since it came into effect in 2023, we’ve learned a lot about the RSP’s benefits and its shortcomings. This update improves the policy, reinforcing what worked and committing us to even greater transparency.