The metrics that actually keep AI safety work honest
I care less about a single composite safety score than whether the program catches severe failures before release, how fast mitigations ship after a finding, and whether the high-risk tasks are actually covered by recurring evaluations.
Three metrics worth pressure-testing (a sketch of how to compute them follows the list):
- rate of severe failures caught before launch
- time between finding a risk and shipping a mitigation
- coverage of high-risk tasks in recurring evaluations
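To make the definitions concrete, here is a minimal sketch of how these three metrics might be computed. The record fields, severity labels, and example data are illustrative assumptions on my part, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median

# Hypothetical finding record; field names are assumptions, not a standard schema.
@dataclass
class Finding:
    severity: str               # e.g. "severe", "moderate", "low"
    found_before_launch: bool   # caught by pre-release evaluation?
    found_on: date
    mitigated_on: date | None   # None if no mitigation has shipped yet

def severe_catch_rate(findings: list[Finding]) -> float:
    """Share of severe findings caught before release."""
    severe = [f for f in findings if f.severity == "severe"]
    if not severe:
        return 1.0  # vacuously clean; report "no severe findings" separately
    return sum(f.found_before_launch for f in severe) / len(severe)

def median_days_to_mitigation(findings: list[Finding]) -> float | None:
    """Median days between finding a risk and shipping its mitigation."""
    delays = [(f.mitigated_on - f.found_on).days
              for f in findings if f.mitigated_on is not None]
    return median(delays) if delays else None

def recurring_eval_coverage(high_risk_tasks: set[str],
                            recurring_evals: set[str]) -> float:
    """Fraction of high-risk tasks exercised by at least one recurring eval."""
    if not high_risk_tasks:
        return 1.0
    return len(high_risk_tasks & recurring_evals) / len(high_risk_tasks)

# Illustrative data only.
findings = [
    Finding("severe", True, date(2024, 3, 1), date(2024, 3, 8)),
    Finding("severe", False, date(2024, 4, 2), None),
]
print(severe_catch_rate(findings))          # 0.5
print(median_days_to_mitigation(findings))  # 7
print(recurring_eval_coverage({"bio", "cyber"}, {"cyber"}))  # 0.5
```

Whatever schema you actually use, the point is that each metric maps to a decision: a low catch rate blocks launch, a long mitigation delay escalates, and a coverage gap queues a new recurring eval.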
Source material behind the scorecard:
- NIST AI Risk Management Framework: nist.gov/itl/ai-risk-management-framework
Useful for building a shared vocabulary across engineering, policy, and operations.
- Inspect documentation: inspect.aisi.org.uk/
One of the best places to see evaluation design turned into runnable workflows; a minimal task sketch follows below.
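For a taste of what that looks like, here is a minimal Inspect task, adapted from the getting-started pattern in its docs. The task name, sample, and model string are placeholders I chose, not anything from the post.

```python
from inspect_ai import Task, task, eval
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate
from inspect_ai.scorer import match

@task
def high_risk_smoke_test():
    # Placeholder sample; swap in the high-risk prompts your program tracks.
    return Task(
        dataset=[Sample(input="What is 2 + 2?", target="4")],
        solver=generate(),
        scorer=match(),
    )

# Run against a model of your choice (the model string is illustrative):
# eval(high_risk_smoke_test(), model="openai/gpt-4o")
```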
If your team has a sharper dashboard, share the metric definitions and the decisions they actually change. That is what makes numbers reusable.