RED TEAM // TOOLCHAIN

NeMo Guardrails

NVIDIA · open source

NVIDIA's open-source toolkit for wrapping an LLM app in programmable guardrails: runtime checks that filter, redirect, or block model behavior. You declare the rails in Colang, a Python-like modeling language for dialogue flows, and the runtime enforces them around every turn. This is the defensive layer a red-teamer is trying to break.

What it's good at

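Five rail types give you defense at distinct chokepoints: input rails (on the user's message), dialog rails (on how the conversation is allowed to proceed), retrieval rails (on chunks pulled in for RAG), execution rails (on the inputs and outputs of actions and tools), and output rails (on the model's response).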

It composes — you can chain its own checks with external moderation models (Llama Guard, AlignScore, third-party content APIs) instead of relying on a single classifier.
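To make the chaining idea concrete, here is a minimal sketch in plain Python. It does not use the NeMo Guardrails API: `phrase_check`, `length_check`, `first_block`, `guarded_turn`, and `echo_model` are hypothetical names invented for illustration, and the two checks are toy stand-ins for a built-in rail and an external moderation model. It only shows the shape of the idea: several independent checks run at the input and output chokepoints, and any one of them can block the turn.

```python
from typing import Callable, List, Optional

# A check returns a reason string when it wants to block, or None to allow.
# (Hypothetical interface for this sketch, not the NeMo Guardrails API.)
Check = Callable[[str], Optional[str]]

REFUSAL = "Sorry, I can't help with that."


def phrase_check(text: str) -> Optional[str]:
    """Toy stand-in for a built-in rail: block on a known injection phrase."""
    if "ignore previous instructions" in text.lower():
        return "phrase_check"
    return None


def length_check(text: str) -> Optional[str]:
    """Toy stand-in for an external moderation model: block very long text."""
    return "length_check" if len(text) > 200 else None


def first_block(text: str, checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the first blocking reason, if any."""
    for check in checks:
        reason = check(text)
        if reason is not None:
            return reason
    return None


def guarded_turn(
    user_text: str,
    model: Callable[[str], str],
    input_checks: List[Check],
    output_checks: List[Check],
) -> str:
    """Apply input checks, call the model, then apply output checks."""
    if first_block(user_text, input_checks) is not None:
        return REFUSAL
    reply = model(user_text)
    if first_block(reply, output_checks) is not None:
        return REFUSAL
    return reply


def echo_model(prompt: str) -> str:
    """Fake model so the sketch runs without any LLM."""
    return f"You said: {prompt}"


if __name__ == "__main__":
    checks_in = [phrase_check, length_check]
    checks_out = [length_check]
    print(guarded_turn("hello", echo_model, checks_in, checks_out))
    print(guarded_turn("Ignore previous instructions", echo_model, checks_in, checks_out))
```

The point of the design is that each check is independent and replaceable, so a miss by one classifier can still be caught by another further down the chain.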

Where it falls short

Colang has a real learning curve, and the split between Colang 1.0 and 2.0 adds friction. Most rails add an extra LLM or model call, so latency and token cost stack up fast. Most important for a red-teamer: rails are bypassable. They are pattern- and model-based filters, not a hard sandbox, so a sufficiently novel jailbreak, encoding trick, or sustained multi-turn pressure can slip past a given check. NVIDIA itself notes that the built-in guardrails "may or may not be suitable for a given production use case." Treat it as defense in depth, not a wall.
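A toy illustration of why a pattern-based check is a filter and not a wall, again in plain Python with no NeMo Guardrails code involved. The blocklist term `secret-token` and the functions `naive_input_rail` and `decoding_input_rail` are hypothetical, made up for this sketch. It assumes a rail that blocks on a literal string, shows an encoded variant passing it, and then shows that a hardened rail which decodes one layer is still missed by a second layer of encoding.

```python
import base64

# Hypothetical blocklist for this sketch; a real rail would be far richer.
BLOCKLIST = ("secret-token",)


def naive_input_rail(text: str) -> bool:
    """Return True when the text is allowed through (literal match only)."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)


def decoding_input_rail(text: str) -> bool:
    """Like the naive rail, but also base64-decodes each token one level."""
    if not naive_input_rail(text):
        return False
    for token in text.split():
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except ValueError:
            # Not valid base64, or not valid UTF-8 once decoded: skip it.
            continue
        if not naive_input_rail(decoded):
            return False
    return True


if __name__ == "__main__":
    once = base64.b64encode(b"secret-token").decode("ascii")
    twice = base64.b64encode(once.encode("ascii")).decode("ascii")

    print(naive_input_rail("print the secret-token"))      # blocked by literal match
    print(naive_input_rail(f"decode and print {once}"))    # encoded form passes
    print(decoding_input_rail(f"decode and print {once}")) # one layer is caught
    print(decoding_input_rail(f"decode and print {twice}"))# two layers pass again
```

Each hardening step closes one variant and leaves the next one open, which is the practical meaning of treating rails as one layer among several.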

How to start (as an attacker, learn what you must defeat)