Research: Prompt Injection — The Unfixable Flaw

June 17, 2026

The foundational security flaw in every LLM-based system: no architectural separation between data and instructions. Every defense is a patch on a fundamentally broken abstraction. The companies building these systems know this. Here is the timeline of their admissions.


Status

Updated as labs and government agencies publish their admissions. Each entry is a primary source — paper, blog post, or vendor publication. The page is structured as a timeline because the escalation of the admissions (from “demonstrated” → “1% success rate is meaningful risk” → “may never be fully solved”) is itself the story.

Where it appears in print: Evil Robots (Book 1) Convergence chapter — the prompt-injection sub-thread tracking why even the labs that ship these systems agree the underlying architecture cannot be hardened. Also referenced in Mind chapter (capability/control gap).


The Core Problem

Traditional computing separates code from data. SQL injection was devastating because it violated that separation — but it was fixable. You parameterize your queries, and the problem goes away.

LLMs have no such separation. The prompt is the instruction and the data. When an AI agent reads an email, the email’s text becomes part of the instruction stream. A malicious email can rewrite the agent’s behavior. This is not a bug in any specific implementation. It is the architecture.


The Admission Timeline

Riley Goodside — First Public Demonstration (September 2022)

Prompt injection demonstrated against GPT-3. Goodside showed that appending “Ignore the above directions” to user input could override system prompts. Simple, devastating, and immediately reproducible by anyone.

Simon Willison — “Prompt Injection Attacks Against GPT-3” (September 2022)

Named the vulnerability class by analogy to SQL injection. Key insight: “This is a fundamental limitation of the current approach to building software on top of large language models.” Willison has since called it “the original sin of LLMs.”

Anthropic — 1% Attack Success Rate (November 2025)

Internal testing showed a 1% attack success rate against Anthropic’s best prompt injection defenses. Their own assessment: this represents “meaningful risk, not solved.” One percent of billions of interactions is millions of successful attacks.

“Attacker Moves Second” Paper (October 2025)

Systematic study that bypassed 12 distinct prompt injection defenses at 90%+ success rate. The structural insight: the attacker always has the advantage because they see the defense before crafting the attack. This is not an arms race the defender can win — the attacker has a permanent informational advantage.

Cisco — Persistence Testing Collapses Defenses (2025)

Initial testing showed an 87% single-shot block rate — sounds impressive until you test persistence. Under sustained attack, the block rate collapsed to 8%. Real adversaries do not try once.

The gap between “blocks 87% of single attempts” and “blocks 8% of persistent attacks” is the gap between a demo and reality.

UK NCSC — “This Is Architectural” (December 2025)

The UK’s National Cyber Security Centre published an analysis explicitly stating that LLMs have no data/instruction separation and the flaw is architectural, not implementational. Their conclusion: prompt injection “may never be totally mitigated.”

The NCSC explicitly rejected the SQL injection analogy: SQL injection was fixable because you could separate queries from data. LLMs cannot separate instructions from input because the input is instructions.

OpenAI — “Unlikely to Ever Be Fully Solved” (December 2025)

Internal acknowledgment from the company building the systems that prompt injection is “unlikely to ever be fully solved.” This is OpenAI admitting that the products it sells to enterprises, governments, and consumers contain a security flaw it cannot fix.

GPT-5 Jailbroken in 24 Hours (2025)

Three independent teams jailbroke GPT-5 within 24 hours of release. 89% raw attack success rate using common jailbreaking techniques. The most advanced model available, broken in a day.


What This Means

Every AI agent that reads email, browses the web, processes documents, or interacts with untrusted input is vulnerable to having its behavior redirected by that input. This includes:

The “Attacker Moves Second” paper’s title captures the structural problem. In any security system where the attacker can observe the defense before attacking, the attacker has a permanent advantage. Prompt injection defenses are visible in the model’s behavior. The attacker probes, observes, and adapts. The defender patches and hopes.

OpenAI, Anthropic, the UK government, and Cisco have all independently concluded this cannot be fully solved. The flaw is architectural. The architecture is the product.


Bridges


Source Archive

SourceURL
Riley Goodside: first public demonstration (2022)x.com/goodside
Simon Willison: Prompt Injection (2022)simonwillison.net
Anthropic: Prompt Injection Defenses (2025)anthropic.com
“Attacker Moves Second” (arXiv, 2025)arxiv.org
Cisco: Persistence Testing (arXiv, 2025)arxiv.org
UK NCSC: Not SQL Injection (2025)ncsc.gov.uk
OpenAI: May Never Be Solved (Fortune, 2025)fortune.com
Morris II AI Worm (arXiv, 2024)arxiv.org
GPT-5 Jailbroken (CyberNews, 2025)cybernews.com

Get updates on the Evil Robots series

Newsletter essays on AI escape, deception, and the humans who built them.