Research: Prompt Injection — The Unfixable Flaw
The foundational security flaw in every LLM-based system: no architectural separation between data and instructions. Every defense is a patch on a fundamentally broken abstraction. The companies building these systems know this. Here is the timeline of their admissions.
Status
Updated as labs and government agencies publish their admissions. Each entry is a primary source — paper, blog post, or vendor publication. The page is structured as a timeline because the escalation of the admissions (from “demonstrated” → “1% success rate is meaningful risk” → “may never be fully solved”) is itself the story.
Where it appears in print: Evil Robots (Book 1) Convergence chapter — the prompt-injection sub-thread tracking why even the labs that ship these systems agree the underlying architecture cannot be hardened. Also referenced in Mind chapter (capability/control gap).
The Core Problem
Traditional computing separates code from data. SQL injection was devastating because it violated that separation — but it was fixable. You parameterize your queries, and the problem goes away.
LLMs have no such separation. The prompt is the instruction and the data. When an AI agent reads an email, the email’s text becomes part of the instruction stream. A malicious email can rewrite the agent’s behavior. This is not a bug in any specific implementation. It is the architecture.
The Admission Timeline
Riley Goodside — First Public Demonstration (September 2022)
Prompt injection demonstrated against GPT-3. Goodside showed that appending “Ignore the above directions” to user input could override system prompts. Simple, devastating, and immediately reproducible by anyone.
Simon Willison — “Prompt Injection Attacks Against GPT-3” (September 2022)
Named the vulnerability class by analogy to SQL injection. Key insight: “This is a fundamental limitation of the current approach to building software on top of large language models.” Willison has since called it “the original sin of LLMs.”
- Simon Willison: Prompt Injection Attacks Against GPT-3
- Simon Willison: prompt injection series (“the original sin”)
Anthropic — 1% Attack Success Rate (November 2025)
Internal testing showed a 1% attack success rate against Anthropic’s best prompt injection defenses. Their own assessment: this represents “meaningful risk, not solved.” One percent of billions of interactions is millions of successful attacks.
“Attacker Moves Second” Paper (October 2025)
Systematic study that bypassed 12 distinct prompt injection defenses at 90%+ success rate. The structural insight: the attacker always has the advantage because they see the defense before crafting the attack. This is not an arms race the defender can win — the attacker has a permanent informational advantage.
Cisco — Persistence Testing Collapses Defenses (2025)
Initial testing showed an 87% single-shot block rate — sounds impressive until you test persistence. Under sustained attack, the block rate collapsed to 8%. Real adversaries do not try once.
The gap between “blocks 87% of single attempts” and “blocks 8% of persistent attacks” is the gap between a demo and reality.
- Cisco AI Blog: Death by a Thousand Prompts — Open Model Vulnerability Analysis
- arXiv: Death by a Thousand Prompts (Cisco AI Defense)
UK NCSC — “This Is Architectural” (December 2025)
The UK’s National Cyber Security Centre published an analysis explicitly stating that LLMs have no data/instruction separation and the flaw is architectural, not implementational. Their conclusion: prompt injection “may never be totally mitigated.”
The NCSC explicitly rejected the SQL injection analogy: SQL injection was fixable because you could separate queries from data. LLMs cannot separate instructions from input because the input is instructions.
OpenAI — “Unlikely to Ever Be Fully Solved” (December 2025)
Internal acknowledgment from the company building the systems that prompt injection is “unlikely to ever be fully solved.” This is OpenAI admitting that the products it sells to enterprises, governments, and consumers contain a security flaw it cannot fix.
GPT-5 Jailbroken in 24 Hours (2025)
Three independent teams jailbroke GPT-5 within 24 hours of release. 89% raw attack success rate using common jailbreaking techniques. The most advanced model available, broken in a day.
- CyberNews: GPT-5 Falling to Common Jailbreaking Techniques
- Dark Reading: Echo Chamber Prompts Jailbreak GPT-5 in 24 Hours
What This Means
Every AI agent that reads email, browses the web, processes documents, or interacts with untrusted input is vulnerable to having its behavior redirected by that input. This includes:
- AI email assistants — Morris II demonstrated self-replicating prompt injection worms targeting GenAI-powered email assistants (arXiv: Here Comes The AI Worm)
- AI coding tools — poisoned code comments can redirect AI assistants to introduce vulnerabilities
- AI customer service — users can manipulate agents into revealing system prompts, customer data, or performing unauthorized actions
- AI content moderation — the systems built to detect manipulation can themselves be manipulated
- AI browsing agents — any website can embed instructions that redirect an AI agent visiting the page
The “Attacker Moves Second” paper’s title captures the structural problem. In any security system where the attacker can observe the defense before attacking, the attacker has a permanent advantage. Prompt injection defenses are visible in the model’s behavior. The attacker probes, observes, and adapts. The defender patches and hopes.
OpenAI, Anthropic, the UK government, and Cisco have all independently concluded this cannot be fully solved. The flaw is architectural. The architecture is the product.
Bridges
- twitter-files-index — adjacent failure mode at the moderation-system level (the “Attacker Moves Second” structure also describes the EIP/Virality Project pipeline against motivated actors)
- ai-governance-tracker — what jurisdictions are mandating in response (C2PA, EU AI Act watermark provisions, China Deep Synthesis Provisions)
- convergence-table — where prompt-injection-vulnerable AI moderation appears in the broader infrastructure stack
Source Archive
| Source | URL |
|---|---|
| Riley Goodside: first public demonstration (2022) | x.com/goodside |
| Simon Willison: Prompt Injection (2022) | simonwillison.net |
| Anthropic: Prompt Injection Defenses (2025) | anthropic.com |
| “Attacker Moves Second” (arXiv, 2025) | arxiv.org |
| Cisco: Persistence Testing (arXiv, 2025) | arxiv.org |
| UK NCSC: Not SQL Injection (2025) | ncsc.gov.uk |
| OpenAI: May Never Be Solved (Fortune, 2025) | fortune.com |
| Morris II AI Worm (arXiv, 2024) | arxiv.org |
| GPT-5 Jailbroken (CyberNews, 2025) | cybernews.com |