The metrics that actually keep AI safety work honest
I care less about a single composite safety score than whether the program catches severe failures before release, how fast mitigations ship after a finding, and whether the high-risk tasks are actually covered by recurring evaluations.
Three metrics worth pressure-testing (a sketch of how to compute them follows the list):
- rate of severe failures caught before launch
- time between finding a risk and shipping a mitigation
- coverage of high-risk tasks in recurring evaluations
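To make the definitions concrete, here is a minimal sketch of how these three metrics might be computed. The record fields, severity labels, and example data are illustrative assumptions on my part, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median

# Hypothetical finding record; field names are assumptions, not a standard schema.
@dataclass
class Finding:
    severity: str               # e.g. "severe", "moderate", "low"
    found_before_launch: bool   # caught by pre-release evaluation?
    found_on: date
    mitigated_on: date | None   # None if no mitigation has shipped yet

def severe_catch_rate(findings: list[Finding]) -> float:
    """Share of severe findings caught before release."""
    severe = [f for f in findings if f.severity == "severe"]
    if not severe:
        return 1.0  # vacuously clean; report "no severe findings" separately
    return sum(f.found_before_launch for f in severe) / len(severe)

def median_days_to_mitigation(findings: list[Finding]) -> float | None:
    """Median days between finding a risk and shipping its mitigation."""
    delays = [(f.mitigated_on - f.found_on).days
              for f in findings if f.mitigated_on is not None]
    return median(delays) if delays else None

def recurring_eval_coverage(high_risk_tasks: set[str],
                            recurring_evals: set[str]) -> float:
    """Fraction of high-risk tasks exercised by at least one recurring eval."""
    if not high_risk_tasks:
        return 1.0
    return len(high_risk_tasks & recurring_evals) / len(high_risk_tasks)

# Illustrative data only.
findings = [
    Finding("severe", True, date(2024, 3, 1), date(2024, 3, 8)),
    Finding("severe", False, date(2024, 4, 2), None),
]
print(severe_catch_rate(findings))          # 0.5
print(median_days_to_mitigation(findings))  # 7
print(recurring_eval_coverage({"bio", "cyber"}, {"cyber"}))  # 0.5
```

Whatever schema you actually use, the point is that each metric maps to a decision: a low catch rate blocks launch, a long mitigation delay escalates, and a coverage gap queues a new recurring eval.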
Source material behind the scorecard:
- NIST AI Risk Management Framework: nist.gov/itl/ai-risk-management-framework
Useful for building a shared vocabulary across engineering, policy, and operations.
- Inspect documentation: inspect.aisi.org.uk/
One of the best places to see evaluation design turned into runnable workflows; a minimal task sketch follows below.
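For a taste of what that looks like, here is a minimal Inspect task, adapted from the getting-started pattern in its docs. The task name, sample, and model string are placeholders I chose, not anything from the post.

```python
from inspect_ai import Task, task, eval
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate
from inspect_ai.scorer import match

@task
def high_risk_smoke_test():
    # Placeholder sample; swap in the high-risk prompts your program tracks.
    return Task(
        dataset=[Sample(input="What is 2 + 2?", target="4")],
        solver=generate(),
        scorer=match(),
    )

# Run against a model of your choice (the model string is illustrative):
# eval(high_risk_smoke_test(), model="openai/gpt-4o")
```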
If your team has a sharper dashboard, share the metric definitions and the decisions they actually change. That is what makes numbers reusable.