The tools, documents, and open materials I would keep close when working in AI safety
NIST gives teams a language for risk management, Anthropic's research archive shows how frontier labs reason about evaluations, and Inspect gives you something concrete to run. Together they make the work feel operational instead of ceremonial.
The stack categories worth comparing here:
- evaluation harnesses and benchmark management
- policy and review workflows
- incident logging and response tooling
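For the incident-logging category, the core pattern most tools share is an append-only structured record: one event per line, never edited in place. A minimal sketch of that pattern (the field names and the `incidents.jsonl` path are illustrative assumptions, not taken from any particular tool):

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Append-only incident log as JSON Lines: one record per line.
# Field names here are illustrative, not a standard schema.

@dataclass
class Incident:
    model: str
    category: str        # e.g. "jailbreak", "data_leak"
    severity: str        # e.g. "low" | "high" | "critical"
    summary: str
    timestamp: str = ""  # filled in at log time if empty

def log_incident(path: str, incident: Incident) -> None:
    """Append one incident record to a JSON Lines file."""
    incident.timestamp = incident.timestamp or datetime.now(timezone.utc).isoformat()
    with open(path, "a") as f:
        f.write(json.dumps(asdict(incident)) + "\n")

log_incident("incidents.jsonl", Incident(
    model="frontier-assistant-v3",
    category="jailbreak",
    severity="high",
    summary="Role-play prompt bypassed refusal on restricted advice.",
))
```

The append-only shape matters more than the exact fields: it keeps the log auditable and makes severity counts trivial to compute at review time.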
Open materials worth opening side by side:
- Inspect source: github.com/UKGovernmentBEIS/inspect_ai
  Open-source evaluation framework from the UK AI Security Institute.
- PyRIT: github.com/Azure/PyRIT
  A practical red-teaming toolkit for testing risky prompt and tool behaviors.
- NIST AI Risk Management Framework: nist.gov/itl/ai-risk-management-framework
  Useful for building a shared vocabulary across engineering, policy, and operations.
Working documents and guides:
- AI RMF Playbook: airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
  The most useful NIST material when a team needs implementation moves, not just principles.
- NIST Generative AI Profile: airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF_Ge...
  Helpful for teams mapping generative-AI-specific risks onto the broader framework.
Release gate checklist:
release_gate:
  model_family: frontier-assistant-v3
  reviewed_harms:
    - unsafe professional advice
    - jailbreak resilience
    - sensitive data leakage
  recurring_evals:
    cadence: weekly
    owners:
      - safety
      - applied_ml
  blocking_findings:
    severity: critical_or_high
    unresolved_count_must_equal: 0
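A gate like this is only useful if it is enforced mechanically. A minimal sketch of the blocking rule, assuming findings have been loaded as dicts with `severity` and `resolved` fields (those names are my own, not part of any standard schema):

```python
# Release-gate rule: block while any critical/high finding is unresolved,
# mirroring blocking_findings.unresolved_count_must_equal: 0 above.
# The severity and resolved field names are illustrative assumptions.

BLOCKING_SEVERITIES = {"critical", "high"}

def gate_passes(findings: list[dict]) -> bool:
    """True only when no blocking-severity finding remains unresolved."""
    unresolved = [
        f for f in findings
        if f["severity"] in BLOCKING_SEVERITIES and not f.get("resolved", False)
    ]
    return len(unresolved) == 0

findings = [
    {"id": "F-12", "severity": "high", "resolved": True},
    {"id": "F-19", "severity": "critical", "resolved": False},
]
print(gate_passes(findings))  # one unresolved critical finding -> False
```

Keeping the rule this small is deliberate: a gate that anyone can re-derive from the config in a dozen lines is harder to quietly route around than one buried in a larger pipeline.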