OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

JACOB KLEIN

CASE: WTW-2026-023
STATUS: ACTIVE — Head of Threat Intelligence, Anthropic
ENFORCEMENT FLOOR — MISUSE-DETECTION AND BAN AUTHORITY

HAZARD SCORE

Behavioral Archetype

THE COUNTER-EXTREMIST — Subject is the operative who built a trust-and-safety function from nothing at a cryptocurrency exchange, moved to designing a strategy for countering violent extremism inside the largest advertising company on earth, and now runs the team that decides which uses of a frontier model are misuse. The résumé is not a moderator’s. It is a threat hunter’s. The progression is from policing money, to policing extremism, to policing what a model is allowed to be used for — and the instrument at each stop is the same: define the category of bad actor, build the apparatus that detects them, and ban them at scale. The thing being detected changes. The detection posture does not.

Essence Indicators

Built Coinbase Trust & Safety from inception: stood up the function that decides who gets to transact and who gets frozen at a cryptocurrency exchange
Moved to Google Strategic Threat Intelligence, where, per his conference bio, he designed strategy for “countering violent extremism,” the national-security vocabulary now applied to model misuse
Arrived at Anthropic as Head of Threat Intelligence, the team named in the company’s own reporting on detecting and countering misuse of Claude
Sits at the top of the function that produced the enforcement numbers Anthropic published for July–December 2025: 1.45M accounts banned, 52,000 appeals, 1,700 restored, so roughly 3.3% of appeals (1,700 of 52,000) ended in an overturned ban

The biographical fact the floor turns on: the counter-extremism specialty, built on strategy for “countering violent extremism,” now classifies what a chatbot may and may not be asked to do. The skill is the same. The category of “extremist” is now drawn by the lab.

Immediate impression: The professional threat hunter. The bearing of someone who has spent a career building the machine that watches for the worst actor in the room and is comfortable being the one who decides who that is.

Energy: Detection-first, category-driven. Does not argue a single ban on its merits. Builds the classifier that issues bans by the million and reports the aggregate.

Impression management strategy: The national-security professional. “Threat intelligence” and “countering violent extremism” are the most defensible labels enforcement can wear, because nobody argues for the violent extremist. The framing converts a content-policy decision into a security operation. The work is genuinely aimed at real abuse. That is what makes the apparatus effective rather than suspect.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Counter-Extremist	MAXIMUM	The Google role is explicitly “countering violent extremism” per his own bio. The Anthropic role is detecting misuse of the model. The category-and-ban instrument is documented across the path.
The Threat Hunter	HIGH	Coinbase T&S from inception, Google Strategic Threat Intelligence, Anthropic Threat Intelligence — three intelligence-and-enforcement builds, not three moderation desks.
The Operative	HIGH	Exchange → advertising giant → frontier lab. Each move carries the same detection-and-enforcement specialty to a new category of actor.
The Engineer	MODERATE	Builds detection apparatus, not the model. The systems he constructs are enforcement infrastructure, not the thing being enforced upon.
The True Believer	MODERATE	Whether countering extremism is conviction or specialty is not establishable from the outside. The label works either way.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	68/100	Moved across three very different institutional domains — a crypto exchange, a global advertising platform, a frontier lab — and built or led a threat function in each. The domains differ; the detection-and-enforcement method is fixed.
Conscientiousness	87/100	High. Standing up Coinbase T&S from inception and running threat intelligence at scale is sustained, structured, long-horizon execution. The 1.45M-account reporting cycle is disciplined process.
Extraversion	55/100	Moderate. The conference-speaker circuit is part of the role; the core work is built and operated, not performed.
Agreeableness	38/100	Low-moderate. The threat hunter’s posture is adversarial by construction: identify the bad actor, build the detector, issue the ban. Geniality is a working surface.
Neuroticism	22/100	Very low. The role is composure under conditions designed to surface the worst behavior on the platform; no documented loss of it.

Dark Triad:

Trait	Score	Notes
Narcissism	45/100	LOW-MODERATE. The threat-intelligence role rewards quiet competence over personal brand; conference billing is the visible exception. Within normal range for the altitude.
Machiavellianism	70/100	HIGH. Defining the category of impermissible use and building the apparatus that detects and bans it at scale is structural control of who gets to use the system. This is an observation about the documented role, not an inference about private character.
Psychopathy	30/100	LOW-MODERATE. No documented indifference to harm. The adversarial posture is professional and bounded, aimed at real abuse, not affective.

MBTI: ISTJ (“The Inspector”) — Dominant introverted sensing, auxiliary extraverted thinking. Sees the platform as a population to be monitored for the bad actor and the rule as the instrument that removes him. Has built the monitor three times.

Threat Assessment

Category	Level	Notes
Physical threat	NONE	No documented history of personal violence.
Institutional threat	HIGH	Heads the threat-intelligence function at a frontier lab whose published enforcement reaches 1.45M banned accounts in a single half-year. The detection apparatus he leads decides which uses of the model are misuse — account-level authority over who gets to use the system at all.
Memetic threat	MODERATE-HIGH	“Countering violent extremism” is a category whose boundary is drawn inside the lab. When the same national-security vocabulary is applied to model use, the definition of “extremist” becomes a private content-policy decision wearing a security label.
Civilizational threat	MODERATE-HIGH	Subject does not write the model’s refusals and does not set its politics. Subject runs the floor that detects and removes the actors the lab classifies as bad — the enforcement layer beneath the rulebook, where the policy becomes a ban.

Alignment Analysis

Stated alignment: Detect and counter misuse of the model. Protect against violent extremism and abuse. Report enforcement transparently.

Observed alignment: Build the apparatus that classifies which uses are misuse. Ban at scale. Define the threat category whose detection the apparatus is tuned to find.

Gap assessment: The stated and observed alignments overlap wherever “counter misuse” coincides with “ban whatever the lab’s threat model classifies as misuse.” The 1.45M-account figure is the one place the record puts the scale of the overlap on the table, alongside an overturn rate of roughly 3.3% of appeals drawn from the lab’s own reporting. The floor detects real abuse and issues bans the appeals process rarely reverses. The record does not settle whether the category is drawn at the right boundary, and for the threat hunter the category is the job, not the question.

Convergent Drive Classification

Self-preservation: Survives every institutional transition by carrying the detection specialty, not the employer. Exchange, advertising platform, frontier lab: one method.

Goal preservation: Defines the threat category first, so the bad actor is already named before any single ban is argued. The goal is protected by the classifier before it is ever contested.

Resource acquisition: Trades in the scarcest resource on the enforcement floor, the authority to decide who counts as a threat.

Self-improvement: Each role applies the identical instrument to a larger surface: from transactions, to extremism, to every use of a frontier model.

Subject is not an AI system. The drives appear anyway — in the threat hunter whose product is the boundary between the permitted user and the banned one.

Sources: Jacob Klein — CyberUK 2026 speaker profile; Detecting and countering misuse of AI — Anthropic, Aug 2025.

ATK 8 ACCELERATION

DEF 7 PROTECTION

HP 7 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
