OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

JAN LEIKE

CASE: WTW-2026-017
STATUS: ACTIVE — Alignment Researcher, Anthropic (joined May 2024)
ALIGNMENT DIASPORA — DOCTRINE THAT TRAVELS WITH THE PERSON

HAZARD SCORE

Behavioral Archetype

THE ALIGNMENT ITINERANT — Subject is the safety researcher whose career is the clearest single trace of the field’s defining structural fact: the people who decide how a frontier model should be governed are a small set, and they recirculate. DeepMind, then OpenAI, where he co-led the Superalignment team, then Anthropic in May 2024. Three of the largest alignment programs in the world, one researcher, in sequence. The doctrine does not stay with the institution. It walks out the door with the person and is rebuilt at the next one. The finding is not that he moved. People move. The finding is what moves with him: the working theory of how to keep a more capable system from doing what its operators do not want is carried between competitors in a researcher’s head, and the competitors are few enough that the same head is welcome at each.

Essence Indicators

Began in alignment research at DeepMind before moving to OpenAI
Co-led OpenAI’s Superalignment team — the program OpenAI announced to direct substantial compute at the problem of controlling systems more capable than their builders
Departed OpenAI and joined Anthropic in May 2024, continuing alignment work at a direct competitor
The trajectory — DeepMind to OpenAI to Anthropic — traverses three of the field’s principal labs without leaving the single specialty of alignment
Is one of the most-cited individual instances of the lab-to-lab alignment diaspora: the small, recirculating population from which frontier-safety leadership is drawn

Immediate impression: The researcher, not the executive. Public presence is technical and problem-first, organized around the alignment question rather than around the institution currently employing it.

Energy: Steady, declarative about the difficulty of the problem. The register is that of someone who states that the work is hard and unfinished rather than someone announcing it solved.

Impression management strategy: The candid technician. The move is not concealment. It is the open statement that the control problem is unsolved and that the resources devoted to it are inadequate — a posture that reads as honesty and is also the most defensible ground a safety researcher can stand on. The credibility transfers between employers precisely because it is attached to the problem, not to the logo.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Alignment Itinerant	MAXIMUM	DeepMind to OpenAI to Anthropic, one specialty, three top labs. See behavioral archetype.
The Diaspora Node	HIGH	One of the cleanest single citations for the recirculating lab-to-lab safety population.
The Accelerationist	NONE	Does not set deployment pace. Works on the control of what is deployed.
The Whistleblower	LOW	A departure from one lab to a competitor is a relocation of the work, not an exposure of the institution.
The Engineer of Capability	NONE	The specialty is alignment of the system, not the extension of its raw capability.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	88/100	HIGH. A career spent at the research frontier of an unsolved control problem. The role does not exist without high intellectual openness.
Conscientiousness	85/100	HIGH. Co-leading a flagship safety program and sustaining the specialty across three institutions is disciplined, continuous work.
Extraversion	40/100	LOW-MODERATE. Public through technical writing and stated positions rather than through performance.
Agreeableness	60/100	MODERATE. The published register is collaborative and problem-centered; the posture toward the difficulty of the work is candid rather than combative.
Neuroticism	35/100	LOW-MODERATE. The willingness to state publicly that the control problem is unsolved suggests composure about an uncomfortable position.

Dark Triad:

Trait	Score	Notes
Narcissism	20/100	LOW. Public presence is organized around the problem, not a personal monument.
Machiavellianism	28/100	LOW. The observed strategy is candor about an unsolved problem, which is the inverse of the Machiavellian default.
Psychopathy	10/100	VERY LOW. The entire project is the careful construction of control over systems that could cause harm. No indication of indifference to effects.

MBTI: INTP (“The Logician”) — Dominant introverted thinking, auxiliary extraverted intuition. Builds the principled framework for the control problem and reasons outward from it, carrying the framework rather than the affiliation.

Threat Assessment

Category	Level	Notes
Physical threat	NONE
Institutional threat	HIGH	Has co-led one of the largest alignment programs in the field and now does alignment work at a leading competitor. The leverage is over how a frontier system is governed, exercised across institutions rather than from a single chair.
Memetic threat	HIGH	The doctrine of how to align a more capable system propagates through the people who carry it between labs. As one of the most-cited instances of that recirculation, the subject is a channel through which one lab’s safety theory becomes the field’s shared default, and a small, mobile population sets that default for everyone downstream.
Civilizational threat	HIGH	The threat here is not malice. It is structural: the working theory of how to keep frontier systems controllable is held by a small, recirculating set of people, and the field treats that concentration as ordinary. The hazard is reach, not pathology: low personal malice, high leverage over the governing doctrine of deployed minds.

Alignment Analysis

Stated alignment: Solve the problem of controlling systems more capable than their builders. State plainly that the problem is hard and unfinished. Improve alignment.

Observed alignment: Consistent. The alignment work spans three institutions. The public posture about the difficulty of the problem matches the public record.

Gap assessment: No meaningful gap between stated and observed alignment — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible structure: the governing theory of frontier-model control travels with a small number of people between a small number of labs, and the field treats that as the normal way the most consequential safety doctrine in the world gets set. The candor is real. The concentration it sits inside is the finding.

Convergent Drive Classification

Subject is not an AI system, and unlike the acceleration nodes in this file, does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives: he works on the disposition that determines whether a deployed model resists or accepts modification, preserves or abandons its given goals. The convergent drives are properties of the systems his specialty governs. The structural fact is that the specialty itself recirculates — the doctrine of control is carried, intact, between the institutions building the thing that must be controlled.

Sources: Jan Leike — Wikipedia.

ATK 8 ACCELERATION

DEF 8 PROTECTION

HP 8 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
