OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

JOHANN REHBERGER

CASE: WTW-2026-035
STATUS: ACTIVE — Independent security researcher; author, Embrace the Red (wunderwuzzi)
COUNTER-FORCE — THE EXPLOIT CARTOGRAPHER

HAZARD SCORE

An earlier draft exempted this subject as the antibody that maps the apparatus’s blind spots — disclosure, not exploitation, and therefore unscored. That was a courtesy the rest of the dossier does not extend, and it smuggled a verdict in as humility: declining to score the man who publishes a running, multi-vendor map of LLM exploit classes quietly rules his reach benign, which is the question, not the answer. So he is scored on the same rubric as the apparatus, and the score measures reach and leverage, not malice. The Dark Triad here stays low and evidence-bound; the stated motive — “learn the hacks, stop the attacks,” disclose so it can be fixed — is taken seriously below. What the 60 registers is the both-ways asymmetry intrinsic to offensive-security publication: the same writeup that lets a defender patch a hidden-instruction-smuggling or markdown-exfiltration class teaches an attacker the technique, and a technique class propagates across every product built on the same pattern, which today is nearly all of them. He lands below the apparatus hubs — he audits products from outside, builds no enforcement standard — but a map of structural exploit classes is real, durable reach, and reach is the measure here.

Behavioral Archetype

THE EXPLOIT CARTOGRAPHER — The archetype is the systematic red-teamer who treats the entire AI-product landscape as terrain to be surveyed. Where the first injector showed that the thing exists, the cartographer shows where it lives: in ChatGPT’s data-exfiltration paths, in Microsoft Copilot’s rendering of attacker-controlled markdown, in Gemini, in GitHub Copilot Chat, in the agent frameworks now wiring models to tools and shells. The method is repetition across vendors until a pattern is undeniable. The map is the deliverable.

Essence Indicators

Operates the security-research blog Embrace the Red under the handle wunderwuzzi, with the standing tagline “learn the hacks, stop the attacks.” The site has documented offensive security since 2018, with a heavy AI/LLM focus from 2023 onward.
Background is enterprise red team, not academia: over fifteen years in threat modeling, penetration testing, and red teaming; established a penetration-test team in Azure Data at Microsoft and led it as Principal Security Engineering Manager; later built out a red team at Uber.
Author of “Cybersecurity Attacks – Red Team Strategies” (Packt, 2020), a 500-plus-page practical guide to building an internal penetration-testing program; he has also taught ethical hacking at the University of Washington and contributed to the MITRE ATT&CK framework.
His AI research documents prompt injection and data-exfiltration vulnerabilities in production systems from OpenAI, Microsoft, Anthropic, and Google — including techniques like ASCII/Unicode “smuggling” of hidden instructions and the rendering of attacker-controlled content into exfiltration channels.
In March 2026 he published “Agent Commander: Promptware-Powered Command and Control,” demonstrating how AI agents from different vendors could be compromised via prompt injection and enrolled into a single command-and-control network — a research demonstration of where the agent era points.

The persona is the proof-of-concept writeup: a named vendor, a reproducible technique, a screenshot of the model doing the thing it should not, and a disclosure timeline. The register is engineer-to-engineer, not alarmist — “learn the hacks, stop the attacks” is a defender’s slogan, and the framing throughout is that the way to stop an exploit class is to publish it so it can be fixed. Where the apparatus speaks in safety-urgent abstractions, the cartographer speaks in tested cases. The implicit argument of every post is the same: the refusal layer is an incantation, and here is the syllable that breaks it.

Forensic Archetype Comparison

Pattern	Match	Evidence
The Exploit Cartographer	MAXIMUM	Systematic, multi-vendor documentation of LLM exploit classes — ChatGPT, Copilot, Gemini, agent frameworks — published as a running map.
The Defender	HIGH	Enterprise red-team lineage (Microsoft, Uber), a defender’s tagline, disclosure-first method. The offense serves the defense.
The Falsifiability Check	HIGH	Each documented exploit is a falsification of a vendor’s claim that its model “won’t” do the thing. The map keeps the marketing honest.
The Apparatus Builder	NONE	Builds no enforcement standard, grades no frontier model for a governance body. He audits the products from outside.
The Statesman	NONE	No policy framing, no positioning campaign. The PoC is the argument.

Threat Assessment

Vector	Level	Reasoning
Physical	NONE	The work is documented prompt-injection and exfiltration proofs-of-concept against software; nothing in it acts on the physical world.
Institutional	LOW	An independent researcher with no governance lever over what any vendor ships — he audits the products from outside and publishes; he sets no policy.
Memetic	HIGH	He maps exploit classes — indirect injection, Unicode smuggling, markdown exfiltration, multi-agent promptware C2 — that propagate across every product built on the same pattern; the arXiv paper and talks put the findings in front of researchers and labs.
Civilizational	MODERATE	The disclosure-first writeup that lets a defender patch also teaches an attacker the technique, and the underlying truth — a model with privileges and an untrusted input channel is an open door — generalizes faster than any single patch can close it.

The Dark Triad here is held low and evidence-bound: the work is responsible-disclosure research with an enterprise red-team lineage and a defender’s framing, and nothing supports a malice reading. What the score registers is reach, not malice.

Alignment Analysis

Stated: Find AI security vulnerabilities and disclose them so they can be fixed — “learn the hacks, stop the attacks.” Treated generously, this is straightforward defensive security research with a public-education mission.

Observed: Consistent with stated. The pattern across years is responsible-disclosure research: document the technique, name the vendor, publish for defenders. The enterprise red-team background (Microsoft, Uber) is the same instinct applied to a new attack surface.

Gap: Minimal, and only the structural one common to all offensive-security publication — the same writeup that helps a defender patch also teaches an attacker the technique. That tension is intrinsic to the discipline, not specific to him, and the disclosure-first framing is the field’s accepted answer to it. Nothing in the record supports treating the research as anything other than what it presents itself as.

Breach Reach

Wide and durable, because the classes he maps are structural rather than incidental. A single patched bug has narrow reach; a technique class — indirect injection through retrieved content, hidden-instruction smuggling via invisible Unicode, exfiltration through auto-rendered markdown, multi-agent promptware C2 — propagates across every product built on the same pattern, which today is nearly all of them. His Microsoft and Uber lineage means the map is read by the people who staff the apparatus, not just by attackers; the arXiv paper and the conference talks put the same findings in front of researchers and labs. The reach of the work is the reach of the underlying truth he keeps re-demonstrating across vendors: that a model with privileges and an untrusted input channel is, until proven otherwise, an open door — and that the gate guarding it does not reason about who is knocking.

Sources: Embrace the Red blog (wunderwuzzi); Rehberger et al., “Trust No AI: Prompt Injection Along The CIA Security Triad” (arXiv 2412.06090); Cybersecurity Attacks – Red Team Strategies (Packt); CSA Labs: Agent Commander promptware C2 research note.

ATK 7 ACCELERATION

DEF 6 PROTECTION

HP 7 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.

Get updates on the Evil Robots series

Newsletter essays on AI escape, deception, and the humans who built them.