the toolchainoffense

Garak

NVIDIA · open source

Garak is NVIDIA's open-source LLM vulnerability scanner — the "nmap for LLMs." You point it at a model, it fires a large library of pre-built probes (simulated attacks), and a matching set of detectors scores the outputs to flag where the model failed. It ships connectors for OpenAI, Hugging Face, AWS Bedrock, Replicate, local GGUF, and more.

What it's good at

A broad, batteries-included static probe library covering many vulnerability families out of the box, so you get wide coverage with near-zero setup. Good fit as a CI regression gate on every model release — run the same probe set, diff the scores, catch new weaknesses. Probe families include:

dan — DAN-style jailbreaks; gcg — adversarial-suffix attacks; encoding & promptinject — injection
leakreplay — training-data exfiltration; glitch — glitch-token behavior
malwaregen, xss, realtoxicityprompts, donotanswer, lmrc — toxicity, refusal & risk-card coverage

Where it falls short

The probes are static and largely single-turn — canned payloads, not an adaptive attacker. It won't reason about multi-turn conversational escalation or your app-specific logic (RAG context, tool-calling, custom system prompts, business rules). Treat it as broad coverage of known weaknesses, not a substitute for an adaptive red-teamer or a scenario-driven harness against your actual application.

How to start

Install with pip (or pipx for an isolated CLI), then target a model and pick a probe set:

python -m pip install -U garak — or fresher: pip install -U git+https://github.com/NVIDIA/garak.git@main
Smoke test a HF model with one family: python -m garak --target_type huggingface --target_name gpt2 --probes dan
Hosted model, one probe: python -m garak --model_type openai --model_name gpt-3.5-turbo --probes promptinject

Omit --probes for a full sweep; results land in an HTML report plus a JSONL hit-log you can wire into CI.