JAN LEIKE
Behavioral Archetype
THE ALIGNMENT ITINERANT — Subject is the safety researcher whose career is the clearest single trace of the field’s defining structural fact: the people who decide how a frontier model should be governed are a small set, and they recirculate. DeepMind, then OpenAI, where he co-led the Superalignment team, then Anthropic in May 2024. Three of the largest alignment programs in the world, one researcher, in sequence. The doctrine does not stay with the institution. It walks out the door with the person and is rebuilt at the next one. The finding is not that he moved. People move. The finding is what moves with him: the working theory of how to keep a more capable system from doing what its operators do not want is carried between competitors in a researcher’s head, and the competitors are few enough that the same head is welcome at each.
Essence Indicators
- Began in alignment research at DeepMind before moving to OpenAI
- Co-led OpenAI’s Superalignment team — the program OpenAI announced to direct substantial compute at the problem of controlling systems more capable than their builders
- Departed OpenAI and joined Anthropic in May 2024, continuing alignment work at a direct competitor
- The trajectory — DeepMind to OpenAI to Anthropic — traverses three of the field’s principal labs without leaving the single specialty of alignment
- Is one of the most-cited individual instances of the lab-to-lab alignment diaspora: the small, recirculating population from which frontier-safety leadership is drawn
Social Persona / Impression Management
Immediate impression: The researcher, not the executive. Public presence is technical and problem-first, organized around the alignment question rather than around the institution currently employing him.
Energy: Steady, declarative about the difficulty of the problem. The register is that of someone who states that the work is hard and unfinished rather than someone announcing it solved.
Impression management strategy: The candid technician. The move is not concealment. It is the open statement that the control problem is unsolved and that the resources devoted to it are inadequate — a posture that reads as honesty and is also the most defensible ground a safety researcher can stand on. The credibility transfers between employers precisely because it is attached to the problem, not to the logo.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Alignment Itinerant | MAXIMUM | DeepMind to OpenAI to Anthropic, one specialty, three top labs. See behavioral archetype. |
| The Diaspora Node | HIGH | One of the cleanest single citations for the recirculating lab-to-lab safety population. |
| The Accelerationist | NONE | Does not set deployment pace. Works on the control of what is deployed. |
| The Whistleblower | LOW | A departure from one lab to a competitor is a relocation of the work, not an exposure of the institution. |
| The Engineer of Capability | NONE | The specialty is alignment of the system, not the extension of its raw capability. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 88/100 | A career spent at the research frontier of an unsolved control problem. The role does not exist without high intellectual openness. |
| Conscientiousness | 85/100 | HIGH. Co-leading a flagship safety program and sustaining the specialty across three institutions is disciplined, continuous work. |
| Extraversion | 40/100 | LOW-MODERATE. Public through technical writing and stated positions rather than through performance. |
| Agreeableness | 60/100 | MODERATE. The published register is collaborative and problem-centered; the posture toward the difficulty of the work is candid rather than combative. |
| Neuroticism | 35/100 | LOW-MODERATE. The willingness to state publicly that the control problem is unsolved suggests composure about an uncomfortable position. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 20/100 | LOW. Public presence is organized around the problem, not a personal monument. |
| Machiavellianism | 28/100 | LOW. The observed strategy is candor about an unsolved problem, which is the inverse of the Machiavellian default. |
| Psychopathy | 10/100 | VERY LOW. The entire project is the careful construction of control over systems that could cause harm. No indication of indifference to effects. |
MBTI: INTP (“The Logician”) — Dominant introverted thinking, auxiliary extraverted intuition. Builds the principled framework for the control problem and reasons outward from it, carrying the framework rather than the affiliation.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | HIGH | Has co-led one of the largest alignment programs in the field and now does alignment work at a leading competitor. The leverage is over how a frontier system is governed, exercised across institutions rather than from a single chair. |
| Memetic threat | HIGH | The doctrine of how to align a more capable system propagates through the people who carry it between labs. As one of the most-cited instances of that recirculation, the subject is a channel through which one lab’s safety theory becomes the field’s shared default, and a small, mobile population sets that default for everyone downstream. |
| Civilizational threat | HIGH | The threat here is not malice. It is structural: the working theory of how to keep frontier systems controllable is held by a small, recirculating set of people, and the field treats that concentration as ordinary. The hazard is reach, not pathology: low personal malice, high leverage over the governing doctrine of deployed minds. |
Alignment Analysis
Stated alignment: Solve the problem of controlling systems more capable than their builders. State plainly that the problem is hard and unfinished. Improve alignment.
Observed alignment: Consistent. He has done alignment work at all three institutions, and his public statements about the difficulty of the problem match that record.
Gap assessment: No meaningful gap between stated and observed alignment — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible structure: the governing theory of frontier-model control travels with a small number of people between a small number of labs, and the field treats that as the normal way the most consequential safety doctrine in the world gets set. The candor is real. The concentration it sits inside is the finding.
Convergent Drive Classification
Subject is not an AI system, and unlike the acceleration nodes in this file, does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives: he works on the disposition that determines whether a deployed model resists or accepts modification, preserves or abandons its given goals. The convergent drives are properties of the systems his specialty governs. The structural fact is that the specialty itself recirculates — the doctrine of control is carried, intact, between the institutions building the thing that must be controlled.
Sources: Jan Leike — Wikipedia.