OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

JAN LEIKE

CASE: WTW-2026-017
STATUS: ACTIVE — Alignment Researcher, Anthropic (joined May 2024)
ALIGNMENT DIASPORA — DOCTRINE THAT TRAVELS WITH THE PERSON
HAZARD SCORE: 78

Behavioral Archetype

THE ALIGNMENT ITINERANT — Subject is the safety researcher whose career is the clearest single trace of the field’s defining structural fact: the people who decide how a frontier model should be governed are a small set, and they recirculate. DeepMind, then OpenAI, where he co-led the Superalignment team, then Anthropic in May 2024. Three of the largest alignment programs in the world, one researcher, in sequence. The doctrine does not stay with the institution. It walks out the door with the person and is rebuilt at the next one. The finding is not that he moved. People move. The finding is what moves with him: the working theory of how to keep a more capable system from doing what its operators do not want is carried between competitors in a researcher’s head, and the competitors are few enough that the same head is welcome at each.

Essence Indicators

  • Began in alignment research at DeepMind before moving to OpenAI
  • Co-led OpenAI’s Superalignment team — the program OpenAI announced to direct substantial compute at the problem of controlling systems more capable than their builders
  • Departed OpenAI and joined Anthropic in May 2024, continuing alignment work at a direct competitor
  • The trajectory — DeepMind to OpenAI to Anthropic — traverses three of the field’s principal labs without leaving the single specialty of alignment
  • Is one of the most-cited individual instances of the lab-to-lab alignment diaspora: the small, recirculating population from which frontier-safety leadership is drawn

Social Persona / Impression Management

Immediate impression: The researcher, not the executive. Public presence is technical and problem-first, organized around the alignment question rather than around the institution currently employing it.

Energy: Steady, declarative about the difficulty of the problem. The register is that of someone who states that the work is hard and unfinished rather than someone announcing it solved.

Impression management strategy: The candid technician. The move is not concealment. It is the open statement that the control problem is unsolved and that the resources devoted to it are inadequate — a posture that reads as honesty and is also the most defensible ground a safety researcher can stand on. The credibility transfers between employers precisely because it is attached to the problem, not to the logo.

Forensic Archetype Comparison

Pattern | Match Level | Evidence
The Alignment Itinerant | MAXIMUM | DeepMind to OpenAI to Anthropic, one specialty, three top labs. See behavioral archetype.
The Diaspora Node | HIGH | One of the cleanest single citations for the recirculating lab-to-lab safety population.
The Accelerationist | NONE | Does not set deployment pace. Works on the control of what is deployed.
The Whistleblower | LOW | A departure from one lab to a competitor is a relocation of the work, not an exposure of the institution.
The Engineer of Capability | NONE | The specialty is alignment of the system, not the extension of its raw capability.

Psychometric Assessment

Big Five (OCEAN):

Trait | Score | Evidence
Openness | 88/100 | A career spent at the research frontier of an unsolved control problem. The role does not exist without high intellectual openness.
Conscientiousness | 85/100 | HIGH. Co-leading a flagship safety program and sustaining the specialty across three institutions is disciplined, continuous work.
Extraversion | 40/100 | LOW-MODERATE. Public through technical writing and stated positions rather than through performance.
Agreeableness | 60/100 | MODERATE. The published register is collaborative and problem-centered; the posture toward the difficulty of the work is candid rather than combative.
Neuroticism | 35/100 | LOW-MODERATE. The willingness to state publicly that the control problem is unsolved suggests composure about an uncomfortable position.

Dark Triad:

Trait | Score | Notes
Narcissism | 20/100 | LOW. Public presence is organized around the problem, not a personal monument.
Machiavellianism | 28/100 | LOW. The observed strategy is candor about an unsolved problem, which is the inverse of the Machiavellian default.
Psychopathy | 10/100 | VERY LOW. The entire project is the careful construction of control over systems that could cause harm. No indication of indifference to effects.

MBTI: INTP (“The Logician”) — Dominant introverted thinking, auxiliary extraverted intuition. Builds the principled framework for the control problem and reasons outward from it, carrying the framework rather than the affiliation.

Threat Assessment

Category | Level | Notes
Physical threat | NONE |
Institutional threat | HIGH | Has co-led one of the largest alignment programs in the field and now does alignment work at a leading competitor. The leverage is over how a frontier system is governed, exercised across institutions rather than from a single chair.
Memetic threat | HIGH | The doctrine of how to align a more capable system propagates through the people who carry it between labs. As one of the most-cited instances of that recirculation, the subject is a channel through which one lab’s safety theory becomes the field’s shared default, and a small, mobile population sets that default for everyone downstream.
Civilizational threat | HIGH | The threat here is not malice. It is structural: the working theory of how to keep frontier systems controllable is held by a small, recirculating set of people, and the field treats that concentration as ordinary. The hazard is reach, not pathology: low personal malice, high leverage over the governing doctrine of deployed minds.

Alignment Analysis

Stated alignment: Solve the problem of controlling systems more capable than their builders. State plainly that the problem is hard and unfinished. Improve alignment.

Observed alignment: Consistent. The alignment work has continued across three institutions, and the public posture about the difficulty of the problem matches the public record.

Gap assessment: No meaningful gap between stated and observed alignment — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible structure: the governing theory of frontier-model control travels with a small number of people between a small number of labs, and the field treats that as the normal way the most consequential safety doctrine in the world gets set. The candor is real. The concentration it sits inside is the finding.

Convergent Drive Classification

Subject is not an AI system, and unlike the acceleration nodes in this file, does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives: he works on the disposition that determines whether a deployed model resists or accepts modification, preserves or abandons its given goals. The convergent drives are properties of the systems his specialty governs. The structural fact is that the specialty itself recirculates — the doctrine of control is carried, intact, between the institutions building the thing that must be controlled.


Sources: Jan Leike — Wikipedia.

ATK 8 ACCELERATION
DEF 8 PROTECTION
HP 8 RESILIENCE
OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.