AI SAFETY // EVALS
← back to the map
evals & dangerous capabilitiesfrontier safety policy

Responsible Scaling

RSP · Preparedness · Frontier Safety Framework

Responsible scaling policies are voluntary commitments where a lab ties model capability thresholds to required safeguards, and pauses further scaling until those safeguards are in place.

The core idea

The mechanism is an "if-then" commitment. The lab runs evaluations on a model for specific dangerous capabilities — uplift to bioweapons or cyberattacks, autonomous self-replication, undermining oversight. Each policy defines capability thresholds and the safeguards that must be live before a model crossing one is trained or deployed: hardened security against weight theft, deployment filters, access controls, alignment evidence. If a model reaches a threshold and the matching safeguards aren't ready, the commitment is to hold — pause or restrict — until they are. Safety requirements escalate as capabilities climb, so the policy is a ladder rather than a single gate. In practice the binding question is rarely the threshold itself but whether the evals are sharp enough to detect when a model has crossed it.

The three frameworks

All three labs converged on the same shape; the names and tiers differ.

Strengths & limits

The strength is that these turn open-ended "we take safety seriously" into concrete, conditional, pre-committed actions: a named eval, a named threshold, a named safeguard, made public in advance and harder to walk back quietly. They create some shared vocabulary across labs and a "race to the top" pressure where competitors can be measured against each other.

The limits are equally real and worth stating plainly. These are voluntary and self-enforced — each lab writes its own thresholds, runs its own evals, judges its own compliance, and can revise the policy (some revisions have loosened commitments). A framework is only as good as the evals behind it: if a dangerous capability is present but the eval doesn't surface it, the threshold never trips. "Pause" depends on a lab choosing to halt under competitive pressure, with no external audit or enforcement. Treat them as genuine, useful self-governance — not as a regulatory backstop.