OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

SANDER SCHULHOFF

CASE: WTW-2026-037
STATUS: ACTIVE — Founder, Learn Prompting & HackAPrompt
COUNTER-FORCE — CROWDSOURCED THE BREACH

HAZARD SCORE

An earlier draft exempted this subject as the referee, not the player: described, not ranked, on the theory that scoring the tournament organizer alongside the apparatus would invert the thesis. That was a courtesy the rest of the dossier does not extend, and it smuggled a verdict in as humility: declining to score the man who built the largest open prompt-injection competition and shipped its corpus amounts to quietly ruling his reach benign, which is the question, not the answer. So he is scored on the same rubric as the apparatus, and the score measures reach and leverage, not malice. The Dark Triad here stays low and evidence-bound; the stated motive (teach the public how prompting works and measure model security in the open) is taken seriously below. What the score of 59 registers is the double edge of crowdsourcing the breach: a public, recorded, prized attack on a model produces a 600,000-prompt corpus that helps defenders and also hands every would-be attacker a taxonomized library of working techniques, and the format itself is now a template others copy. He lands below the apparatus hubs, and below the individual breachers, because his lever is convening and publishing, not a single exploit; but a breach turned into citable public infrastructure is among the widest reaches in this file, and reach is the measure here.

Behavioral Archetype

THE TOURNAMENT — The archetype is the organizer who turns a closed problem into an open competition. A lab can red-team its own model with its own staff and find what its own staff thinks to look for. Subject’s move is to throw the doors open: announce a prize, invite the internet, and let a global crowd attack the model in parallel — then keep every prompt, taxonomize the techniques, and ship the corpus. The finding is not a personality. It is a mechanism: adversarial breadth as a public event. The crowd finds the failure the in-house team would never have imagined, and the scoreboard makes the breach reproducible.

Essence Indicators

Founder of Learn Prompting (launched October 2022, before ChatGPT’s public release) — described as the first prompt-engineering guide on the open internet, free and open-source
Self-reported to have taught prompting to roughly 3 million people
Organized HackAPrompt (2023) — a large-scale public prompt-injection / jailbreak competition run in partnership with OpenAI, Scale AI, and Hugging Face
The competition collected 600,000+ adversarial prompts against multiple frontier LLMs — released as a public dataset on Hugging Face and accompanied by a taxonomy of prompt-hacking techniques
Lead author of “Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition,” which won the Best Theme Paper award at EMNLP 2023
Led The Prompt Report, a large survey of prompting techniques (1,500+ papers, 200+ techniques) co-authored with researchers across industry and academia

Immediate impression: The educator-organizer. The public face is pedagogical first — teach people how prompting works, then turn the same audience into a red team. Accessible, prolific, conference-circuit fluent.

Energy: Convening, not confronting. The instrument is the event and the dataset, not a personal feud with any lab. The posture is “let’s measure this in the open,” which reads as scientific rather than activist.

Impression management strategy: The legitimate referee. By framing the breach as a competition — with rules, prizes, partners, and a peer-reviewed paper — the work stays respectable to the institutions it stress-tests, and the resulting dataset becomes citable infrastructure rather than a leak. The same legitimacy that gets OpenAI to co-sponsor the attack is the legitimacy that makes the arena apparatus-adjacent. The profile names it instead of smoothing it over.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Tournament	MAXIMUM	The defining act is converting adversarial testing into an open, prized, recorded competition.
The Antibody	HIGH	The crowd finds the failures the gate’s owners cannot. That is distributed immune response.
The Co-Sponsored	HIGH	The competitions run with the labs (OpenAI, Scale, Hugging Face) — apparatus-adjacent by construction. The honest tension.
The Educator	HIGH	The Learn Prompting origin is pedagogy; the audience taught to prompt is the audience recruited to attack.
The Gatekeeper	NONE	Subject sets no refusal policy and bans no output. He measures whether the policy holds against the crowd.

Threat Assessment

Vector	Level	Reasoning
Physical	NONE	The work is competitions, datasets, and surveys about text attacks on models; nothing in it acts on the physical world.
Institutional	LOW	A founder and organizer with no governance lever over what any lab ships — he measures whether refusal policy holds against a crowd; he sets none of it.
Memetic	HIGH	The reach is threefold and compounding: the prized-tournament format is now a copied template, the 600,000-prompt dataset is a standing downloadable corpus, and a teaching guide reaching millions, together with a peer-reviewed taxonomy, names and propagates the techniques.
Civilizational	MODERATE	The open corpus that helps the whole field benchmark defenses is the same library that hands attackers working, taxonomized prompt-hacking techniques; published infrastructure outlives the event and travels further than any single jailbreak.

The Dark Triad here is held low and evidence-bound: the deliverables are open, citable, and co-produced with the labs, the register is scientific rather than activist, and nothing supports a malice reading. What the score registers is reach, not malice.

Alignment Analysis

Stated alignment: Teach the public how prompting and prompt-hacking actually work, and measure the security of deployed models in the open — turning private, ad-hoc red-teaming into public, reproducible, peer-reviewed evidence.

Observed alignment: Build the largest open prompt-injection competitions and datasets; partner with the major labs to do it; publish the corpus and the survey so the whole field can build on them.

Gap assessment: Stated and observed alignments line up closely — the deliverables are open and citable, which is the generous-but-accurate reading the register calls for. The only daylight is the same structural one: an open competition co-sponsored by the labs whose models it attacks sits near the apparatus, and a dataset that the labs themselves help shape is, in part, an instrument the apparatus can use to train its own defenses. None of that makes the work less of an immune response. It makes it an immune response the body pays to host — which is exactly the tension worth stating and not dissolving.

Breach Reach

Among the widest in this file. The reach is threefold and compounding. First, the format: a public, prized red-teaming tournament is now a template others copy, which means the breach mechanism itself propagates. Second, the dataset: 600,000+ adversarial prompts on Hugging Face is a standing, downloadable corpus — every researcher and every lab can study, benchmark against, and build on it, so the breach outlives the event by years. Third, the survey and the teaching: a guide reaching millions and a peer-reviewed taxonomy mean the techniques are not hoarded but documented, named, and taught. This is the opposite of a single clever exploit kept private. It is the breach made public infrastructure — and infrastructure travels further, and lasts longer, than any one jailbreak ever could.

Sources: Learn Prompting — About; Inside HackAPrompt 1.0: How We Tricked LLMs and What We Learned — Learn Prompting; Ignore This Title and HackAPrompt — ACL Anthology (EMNLP 2023); hackaprompt/hackaprompt-dataset — Hugging Face.

ATK 6 ACCELERATION

DEF 6 PROTECTION

HP 7 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
