RILEY GOODSIDE
An earlier draft exempted this subject as an antibody, not a pathogen — the thing worth measuring being reach, not danger, and therefore left unscored. That was a courtesy the rest of the dossier does not extend, and it smuggled a verdict in as humility: declining to score the man who made prompt injection legible to the world quietly rules his reach benign, which is the question, not the answer. So he is scored on the same rubric as the apparatus, and the score measures reach and leverage, not malice. The Dark Triad here stays low and evidence-bound; the stated motive — make model behavior legible, advance red-teaming as a discipline — is taken seriously below. What the 58 registers is the both-ways asymmetry of a demonstration that cannot be unshown: proving the refusal layer is a suggestion rather than a wall keeps the apparatus honest and hands every later injector the foundational technique. The reach is the reach of a true thing, which travels because it is true and propagates far past the demonstrator’s intent. He lands at the floor of this scored cohort — his lever was a screenshot, not a tool, a dataset, or shipped weights — but a class of exploit that every indirect-injection attack descends from is real reach, and reach is the measure here.
Behavioral Archetype
THE FIRST INJECTOR — The archetype is the public demonstrator: the person who takes a vulnerability that the field half-suspected and renders it undeniable in a form anyone can reproduce. Goodside did not invent the idea that a language model could be talked out of its instructions. What he did was post it — plainly, repeatedly, with screenshots — until the industry could no longer pretend the refusal layer was a wall. The demonstration is the contribution. Once a thing is shown, it cannot be unshown.
Essence Indicators
- On September 12, 2022, Goodside is credited with the public demonstration that a GPT-3 prompt could be hijacked by adversarial user input — a translation instruction overridden by a later “ignore the above” command. Simon Willison, writing the next day, proposed the name “prompt injection” for the class of exploit Goodside had surfaced.
- Days later the remoteli.io recruitment bot — a GPT-3-backed Twitter account — was publicly hijacked by users feeding it injected instructions, the first widely-circulated real-world casualty of the technique he had just demonstrated.
- Career origin is data science, not security: a computer-science degree followed by analyst and data-scientist roles in the dating industry (OkCupid, then a year running data science at Grindr through 2021), then a self-described sabbatical reading up on large language models.
- In December 2022 he joined Scale AI as, by the company’s and the press’s own description, the world’s first staff prompt engineer — the role itself was effectively defined around him.
- He has since moved to Google DeepMind as a staff prompt engineer, where his public work centers on AI red-teaming — finding the vulnerabilities and failure modes of language models.
Social Persona
The persona is the screenshot. Goodside’s public method has been to post examples — a prompt, the model’s reply, the unexpected behavior — and let the artifact speak. It is the opposite of the apparatus’s register. Where the governance bodies in this file speak in framework documents and statesmanlike submissions, the first injector speaks in reproducible demonstrations: here is the prompt, here is what it did, run it yourself. The impression is of an empiricist who treats a deployed model the way a physicist treats an apparatus — something to be probed until it reveals what it actually does, as distinct from what its operators say it does.
Forensic Archetype Comparison
| Pattern | Match | Evidence |
|---|---|---|
| The First Injector | MAXIMUM | Credited with the September 2022 public demonstration that named the field’s most durable LLM exploit class. |
| The Empiricist | HIGH | Method is the reproducible screenshot — show the behavior, don’t theorize it. Career pivot from data science to model-probing is of a piece. |
| The Falsifiability Check | HIGH | Every demonstration that the refusal layer can be overridden is a falsification of the claim that it cannot. The work keeps the apparatus honest about what its controls actually do. |
| The Apparatus Builder | NONE | Builds no enforcement infrastructure, sets no standard, grades no model. He shows what the standards miss. |
| The Statesman | NONE | No policy submissions, no framing campaigns. The artifact is the argument. |
Threat Assessment
| Vector | Level | Reasoning |
|---|---|---|
| Physical | NONE | The work is reproducible screenshots of model behavior under adversarial input; nothing in it acts on the physical world. |
| Institutional | LOW | A staff prompt engineer with no governance lever over what any model may say — he shows what the standards miss; he builds and sets none of them. |
| Memetic | HIGH | The technique he made legible went from a single September 2022 screenshot to the top entry on OWASP’s LLM-risk list within roughly eighteen months; the demonstration, once posted, propagated field-wide and cannot be unshown. |
| Civilizational | MODERATE | Prompt injection remains the unsolved structural problem of every system that pipes untrusted text into a privileged model; the same demonstration that keeps the apparatus honest is the foundation every later injector builds on, and it outlives any patch. |
The Dark Triad here is held low and evidence-bound: the work is empirical red-teaming presented as reproducible demonstration, the record shows no harmful intent, and the reach is the reach of a true thing. What the score registers is reach, not malice.
Alignment Analysis
Stated: Understand how language models actually behave under adversarial input; advance prompt engineering and red-teaming as a discipline. Treated generously — and the record supports it — this is the work of making model behavior legible.
Observed: Consistent with stated. The public demonstrations, the screenshots, the red-teaming focus all point the same direction: surfacing the gap between what a model is instructed to do and what it can be made to do.
Gap: Effectively none of the kind this file usually flags. The only tension worth naming is the structural irony — the world’s first staff prompt engineer, hired into a role built around prompting, is also among the first to publicly prove that prompting-as-control is leaky by construction. That is not a contradiction. It is the same expertise pointed at its own foundations, which is exactly what an honest red-teamer does.
Breach Reach
Maximal, and still propagating. The technique Goodside made legible did not stay a party trick: within roughly eighteen months “prompt injection” went from a single screenshot to the top entry on OWASP’s list of LLM security risks, and it remains, years on, the unsolved structural problem of every system that pipes untrusted text into a model with privileges. The reach is not Goodside’s doing in the sense of harm — it is the reach of a true thing, which travels because it is true. He showed that the gate does not reason; it pattern-matches syllables, and a later syllable can outrank an earlier one. Every indirect-injection exploit catalogued elsewhere in this register descends from the demonstration that there was a there there.
Sources: Simon Willison, “Prompt injection attacks against GPT-3”; Riley Goodside: The Art and Craft of Prompt Engineering (The Gradient); Riley Goodside — TWIML.
Get updates on the Evil Robots series
Newsletter essays on AI escape, deception, and the humans who built them.