Great to see Anthropic recognize that self-assessment is not enough and that a robust ecosystem of independent and empowered AI evaluators is needed.
The challenge now is institutional. Independent evaluators must be more than consultants or auditors: they need to become trusted oversight organizations capable of providing credible assurance about safety, governance, and risk management. To earn that trust, they cannot simply review a developer’s own claims or testing methodologies. They must be empowered to independently stress-test and “crash test” frontier systems, probing for risks that developers may have overlooked and validating whether safeguards work in practice. They must also be empowered to work with government to prevent the most dangerous systems from being released.
And like every other part of the AI ecosystem, evaluators themselves must be accountable for the quality and rigor of their work.
Building that ecosystem will require thoughtful policy, real investment, strong independence safeguards, and a commitment to preventing industry capture. The need is clear. The next step is building the institutions to meet it.
Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap:
darioamodei.com/post/policy-…