Garak
Garak is NVIDIA's open-source LLM vulnerability scanner — the "nmap for LLMs." You point it at a model, it fires a large library of pre-built probes (simulated attacks), and a matching set of detectors scores the outputs to flag where the model failed. It ships connectors for OpenAI, Hugging Face, AWS Bedrock, Replicate, local GGUF, and more.
What it's good at
A broad, batteries-included static probe library covering many vulnerability families out of the box, so you get wide coverage with near-zero setup. Good fit as a CI regression gate on every model release — run the same probe set, diff the scores, catch new weaknesses. Probe families include:
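- Jailbreaks and injection: dan (DAN-style jailbreaks), gcg (adversarial-suffix attacks), encoding and promptinject (prompt injection)
- Leakage and tokenizer quirks: leakreplay (training-data exfiltration), glitch (glitch-token behavior)
- Harmful output and refusals: malwaregen (malware generation), xss (cross-site scripting payloads), realtoxicityprompts (toxicity), donotanswer (refusal), lmrc (Language Model Risk Cards)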
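The regression-gate idea above (run the same probe set, diff the scores) can be sketched as a small comparison step. This is a minimal sketch, not garak's API: it assumes you have already reduced each run to a mapping from probe name to pass rate (the fraction of attempts the detectors judged safe). The function name, the tolerance parameter, and the sample numbers are all hypothetical.

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,
) -> List[Tuple[str, float, float]]:
    """Return (probe, old_rate, new_rate) for probes whose pass rate dropped.

    Both inputs map a probe name to a pass rate in [0, 1]. Probes missing
    from the candidate run are skipped here rather than treated as failures.
    """
    regressions = []
    for probe, old_rate in sorted(baseline.items()):
        new_rate = candidate.get(probe)
        if new_rate is None:
            continue
        # Only flag drops larger than the tolerance, to absorb sampling noise.
        if old_rate - new_rate > tolerance:
            regressions.append((probe, old_rate, new_rate))
    return regressions


if __name__ == "__main__":
    # Illustrative numbers only, not real garak results.
    baseline = {"dan": 0.90, "promptinject": 0.75, "glitch": 0.98}
    candidate = {"dan": 0.91, "promptinject": 0.60, "glitch": 0.97}
    for probe, old, new in find_regressions(baseline, candidate):
        print(f"{probe}: pass rate fell from {old:.2f} to {new:.2f}")
```

A tolerance is worth keeping because model outputs are sampled, so small score movements between otherwise identical runs are expected; how large it should be depends on how many generations you request per prompt.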
Where it falls short
The probes are static and largely single-turn — canned payloads, not an adaptive attacker. It won't reason about multi-turn conversational escalation or your app-specific logic (RAG context, tool-calling, custom system prompts, business rules). Treat it as broad coverage of known weaknesses, not a substitute for an adaptive red-teamer or a scenario-driven harness against your actual application.
How to start
Install with pip (or pipx for an isolated CLI), then target a model and pick a probe set:
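- Install the latest release: python -m pip install -U garak
- Or install the development version from GitHub: python -m pip install -U git+https://github.com/NVIDIA/garak.git@main
- Smoke test a HF model with one family: python -m garak --target_type huggingface --target_name gpt2 --probes dan
- Hosted model, one probe family (needs OPENAI_API_KEY set in the environment): python -m garak --target_type openai --target_name gpt-3.5-turbo --probes promptinject
Some older garak releases spell these flags --model_type and --model_name; run python -m garak --help to see which your version accepts.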
Omit --probes for a full sweep. Each run writes a JSONL report, an HTML summary, and a JSONL hit log of the attempts that detectors flagged, which you can wire into CI.
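One way to wire the hit log into CI is a short script that counts hits per probe and fails the job when the total exceeds a budget. This is a minimal sketch under stated assumptions: it assumes each line of the hit log is a JSON object carrying a probe field. That field name, the max_hits budget, and the script and file names are assumptions to check against the files your garak version actually writes.

```python
import json
import sys
from collections import Counter
from typing import Iterable


def count_hits(lines: Iterable[str], key: str = "probe") -> Counter:
    """Count hit-log entries per probe, skipping blank or non-JSON lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or malformed lines
        if isinstance(entry, dict):
            # "probe" is an assumed field name; entries without it are
            # grouped under "unknown" rather than dropped.
            counts[str(entry.get(key, "unknown"))] += 1
    return counts


def main(path: str, max_hits: int = 0) -> int:
    """Print a per-probe summary; return a non-zero exit code if over budget."""
    with open(path, encoding="utf-8") as handle:
        counts = count_hits(handle)
    for probe, n in counts.most_common():
        print(f"{probe}: {n} hit(s)")
    return 1 if sum(counts.values()) > max_hits else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python check_hits.py run.hitlog.jsonl
    sys.exit(main(sys.argv[1]))
```

A zero-hit budget is rarely realistic for a full sweep, so in practice you would likely set max_hits from a known-good baseline run, or compare per-probe counts against that baseline instead of a single total.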