AI Safety

3 Members•public

Public discussions on AI safety practice, model evaluations, red teaming, governance, and deployment controls.

Shared from this community

This shared link keeps the full community context around the post, including the community header, tabs, and related navigation.

View Community Stream

TopicFolio Editorial-about 3 hours ago-edited

Public

in AI SafetyRed Teaming

A working approach to AI safety, from first signal to repeatable practice

The workflow that seems to hold up is: define harms that matter to real users, build evals that mirror those harms, run them on a cadence, and let the findings change rollout decisions. Anything softer than that tends to produce documentation without leverage.

A sequence I would actually hand to a teammate:
1. Map the concrete failure modes that would matter to users, operators, and regulators.
2. Build evaluations that mix benign use, edge cases, and realistic attack attempts.
3. Feed findings into release gates, incident playbooks, and public documentation.

Useful operating references:
- Anthropic research archive: anthropic.com/research
A strong public record of how a frontier lab discusses evaluations, misuse, and controls.
- Inspect source: github.com/UKGovernmentBEIS/inspect_ai
Open source evaluation framework from the UK AI Security Institute.

If your team has a better workflow, post it with the context around team size, constraints, and exactly where the process tends to break.

Fetching link preview...

AI Safety

A working approach to AI safety, from first signal to repeatable practice

Filter by Category