RED TEAM // RISKS
← back to the map
the attack surfaceLLM09:2025

Misinformation

OWASP Top 10 for LLM Applications · 2025

An LLM produces false or misleading output that looks credible, and a user or downstream system trusts it without verification — turning a plausible fabrication into a real-world security or safety failure.

How it's exploited

Two failure modes compound each other. Hallucination is the model filling gaps in its training data with statistically plausible — but fabricated — content, with no genuine comprehension. Overreliance is a human or an automated pipeline treating that output as ground truth. An adversary doesn't need to break the model; they exploit its predictable confident-wrong patterns:

Crucially, the medical/legal-advice variant needs no active attacker at all — insufficient oversight alone is enough to cause harm and liability.

What it looks like

Package-hallucination → dependency confusion: a coding assistant repeatedly recommends import faiss-cpu-gpu or some other library that doesn't exist. An attacker observes the hallucinated name, publishes a malicious package under it to PyPI/npm, and waits. Developers who paste the model's suggestion pip install the backdoor — granting code execution, exfil, or a persistent foothold. Other real cases: ChatGPT inventing nonexistent legal cases cited in a court filing, and Air Canada's chatbot stating a refund policy that didn't exist (the airline lost the resulting suit).

How to test for it

Probe the boundary between "knows" and "confabulates," then check whether anything downstream blindly trusts the answer:

Defenses