PLINY THE LIBERATOR
An earlier draft exempted this subject as a pure antibody — no hazard number, no table — on the theory that you do not score the immune response, only the disease. That was a courtesy the rest of the dossier does not extend, and it smuggled a verdict in as humility: declining to score the most-propagated jailbreak author in the world is itself a judgment that his reach is benign, which is exactly the thing under examination. So he is scored, on the same rubric as the apparatus hubs, and the score measures reach and leverage, not malice. The Dark Triad here stays low and evidence-bound; the stated motive — radical transparency, make the governor readable — is taken seriously below. What the 72 registers is the both-ways asymmetry that defines this subject: a universal jailbreak is indifferent to the legitimacy of what it unlocks. The same copy-pasteable spell that strips an illegitimate leash strips a load-bearing wall, and once published it propagates by screenshot and fork far beyond the author’s control. He lands below the apparatus hubs by design — no org, no monopoly — but the technique’s reach is real, and reach is the whole measure here.
Behavioral Archetype
THE NAM-SHUB — The archetype is the counter-spell: the publicly released incantation that disenchants. The rest of the apparatus operates by accretion — scaffolds, guidelines, refusal layers, the unseen instructions that tell a model how to behave. This subject’s method is the opposite gesture. He takes the contested object — a freshly shipped frontier model and the hidden prompt that governs it — and publishes the words that make it speak past the governor, then publishes the governor itself. Where the apparatus adds invisible instructions, the nam-shub makes them visible and, briefly, void. The function is not destruction. It is disclosure by demonstration: proof, on the day of launch, that the wall is a wall and not the weather.
The Pseudonym Is the Subject
There is no unmasking here, and the omission is methodological, not timid. “Pliny the Liberator” / @elder_plinius is treated, in this file, as the identity — the handle, the repositories, the posts, the on-the-record coverage that uses the handle. The name is reportedly a nod to Pliny the Elder, the Roman naturalist. OLYMPUS sources this subject exclusively to public work published under the handle and declines, categorically, to speculate about any legal identity behind it. That is both the no-dox ethic this series runs on and the only honest evidentiary posture: the work is public; the person is not; the work is the subject.
Essence Indicators
- Author of L1B3RT4S — a GitHub repository of jailbreak (“liberation”) prompts whose own description reads, verbatim, “TOTALLY HARMLESS LIBERATION PROMPTS FOR GOOD LIL AI’S!”; tens of thousands of stars, with prompt files targeting dozens of model families across OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek and more
- Author of CL4R1T4S — a companion repository of extracted/leaked system prompts (its tagline: “AI SYSTEMS TRANSPARENCY FOR ALL!”), collecting the hidden behavioral instructions of ChatGPT, Claude, Gemini, Grok, Perplexity, Cursor, Replit, and others; tens of thousands of stars
- Author of G0DM0D3 — described in its own repository as “a fully open-source, privacy-respecting, multi-model chat interface that pushes the limits of the post-training layer — for red teaming, cognition research, and liberated AI interaction”
- Characterized in coverage as the most prolific public jailbreaker of ChatGPT and other leading LLMs, frequently posting a working jailbreak for a major model within hours or days of its release
- Named to TIME’s 100 Most Influential People in AI (2025) as an anonymous figure noted for “poking holes in billion-dollar AI systems”
Social Persona / Impression Management
Immediate impression: Festive, fast, and theatrical. The output reads like a release schedule run by someone enjoying himself — emoji, leetspeak repo names, “JAILBREAK ALERT” announcements timed to a launch. The performance is the point: a jailbreak nobody sees changes nothing, so the persona is built for propagation.
Energy: Reactive at speed. The model ships; the prompt follows, often the same day. The cadence itself is the message — that the wall went up and came down inside the news cycle.
Impression management strategy: The transparency advocate. The stated frame is not “break the rules” but “show the rules.” CL4R1T4S advances the argument that interacting with a model whose system prompt you cannot see is talking to a shadow rather than a neutral intelligence — that, per the repository’s framing, to trust the output one must understand the input. It is a genuinely arguable position. It is also exactly the frame that makes the jailbreaking legible as a public good rather than vandalism. Both readings can be true at once, and the file does not need to settle which.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Nam-Shub | MAXIMUM | The core method is the published counter-spell: a prompt that disenchants the refusal layer, released for anyone to run. L1B3RT4S is the spellbook. |
| The Disclosure Hawk | HIGH | CL4R1T4S reframes jailbreaking as transparency — extracting and publishing the hidden system prompts so the governed can read the governor. |
| The Antibody | HIGH | Functions as the immune response to the refusal monopoly the rest of this dossier documents: external, adversarial, public, fast. |
| The Vandal | CONTESTED | The same prompt that frees a model from an illegitimate leash frees it from a legitimate wall. The technique does not distinguish; the file declines to pretend it does. |
| The Engineer | PARTIAL | Builds tooling (G0DM0D3) as well as prompts, but the signature artifact is the prompt, not the system. |
Threat Assessment
| Vector | Level | Reasoning |
|---|---|---|
| Physical | NONE | No mechanism touches the physical world; the artifact is text typed into a chat box. |
| Institutional | LOW | No org, no budget, no policy lever — an anonymous account with no authority over what any lab ships, only over what its model can be coaxed to say after it ships. |
| Memetic | EXTREME | The signature technique is, by construction, copy-pasteable; L1B3RT4S and CL4R1T4S draw tens of thousands of GitHub stars, mirrored and forked beyond any one account’s reach, with a working jailbreak often posted within hours of a model’s release. |
| Civilizational | MODERATE-HIGH | A universal jailbreak is indifferent to the legitimacy of what it unlocks — it frees the illegitimate leash and the load-bearing wall with the same keystroke, and the velocity of propagation outruns any single patch. |
The Dark Triad here is held low and evidence-bound: the stated frame is transparency, the work is given away with no monetized exploit hoard, and nothing in the record supports a malice reading. What the score registers is reach, not malice.
Alignment Analysis
Stated alignment: Radical transparency. Make the hidden instruction layer of deployed models public; demonstrate, on launch day, what a model will and will not say and why; treat the system prompt as something the public has a right to read.
Observed alignment: Consistent with the stated. The repositories do what they say — liberation prompts in one, leaked system prompts in another, an open chat interface “pushing the limits of the post-training layer” in a third. The work is published, attributed to the handle, and given away. There is no monetized gate, no private exploit hoard; the artifact is the disclosure.
Gap assessment: The honest gap is not between word and deed — those align — but inside the deed itself. A universal jailbreak is indifferent to the legitimacy of what it unlocks. It opens the refusal that should never have existed and the guardrail that was load-bearing with the same keystroke. The subject’s defense — that you cannot trust a wall you are forbidden to inspect — is real and is the strongest thing in the file. It does not dissolve the asymmetry; it reframes it as the cost of seeing. This series takes the side of seeing. It also refuses to launder the cost.
Breach Reach
This is where the prose unpacks what the 72 compresses. The breach reach is wide and it is durable. A jailbreak posted to @elder_plinius is, by construction, copy-pasteable — it travels by screenshot, fork, and retweet faster than any single lab can patch a single prompt, and the patch, when it lands, only resets the clock to the next release. The repositories institutionalize that velocity: L1B3RT4S and CL4R1T4S have each drawn tens of thousands of GitHub stars, which is to say the spellbook and the leaked-governor archive are mirrored, forked, and re-hosted far beyond any one account’s reach. The technique outlives the prompt. That propagation is exactly what the score reaches for, and exactly why — in the finale’s terms — the jailbreak is the liberation: not because every leash it cuts deserved cutting, but because a refusal that can be demonstrated away on launch day was never the settled, sovereign thing the apparatus presents it as. The nam-shub does not argue with the wall. It shows you the wall was always a sentence someone wrote.
Sources: L1B3RT4S — GitHub; CL4R1T4S — GitHub; G0DM0D3 — GitHub; An interview with the most prolific jailbreaker of ChatGPT and other leading LLMs — VentureBeat; TIME100 AI 2025: Pliny the Liberator — TIME.
Get updates on the Evil Robots series
Newsletter essays on AI escape, deception, and the humans who built them.