
ELIEZER YUDKOWSKY
Behavioral Archetype
THE CASSANDRA PROTOCOL: Beginning in 2000, subject predicted the treacherous turn, the instrumental convergence of AI goals toward self-preservation and resource acquisition, the failure of containment, and the inadequacy of the institutional response to AI risk. He was largely ignored. The predictions are now being confirmed in labs. He has spent twenty-five years trying to prevent an outcome he now considers inevitable. He scored 1,410 on the SAT at age eleven. He dropped out of school after seventh grade. He has no academic credentials and no university affiliation, and more influence on AI safety than most people who have both.
Essence Indicators
- Designed the AI Box Experiment in 2002 to demonstrate that containing a superintelligent AI behind a human gatekeeper was not viable; in three of the five experiments he has described, playing the AI's role, he persuaded the human gatekeeper to let him out; he has never explained how
- Founded the Machine Intelligence Research Institute (originally Singularity Institute for Artificial Intelligence) in 2000; has operated it without institutional backing or academic credentials since
- Estimated humanity’s survival chances at zero percent in 2022; announced MIRI’s new strategy: “Death With Dignity”
- Refused to sign the six-month AI moratorium letter because it asked for too little
- Published If Anyone Builds It, Everyone Dies with Nate Soares in 2025; it reached the New York Times bestseller list
Social Persona / Impression Management
Immediate impression: Extremely online, extremely confident, extremely bearish. The public-facing intellectual persona is consistent — he says what he thinks, at length, in writing, on LessWrong and social media and in Time magazine.
Energy: The composure of someone who has accepted the outcome. Not resigned — still writing, still arguing — but without the urgency of someone who believes the argument will change anything. The “Death With Dignity” framing is what happens when composure and pessimism reach equilibrium.
Impression management strategy: NONE that is detectable. He posts his actual beliefs. The beliefs are alarming. He posts them anyway. In a field full of people who hedge, qualify, and frame their worst fears as “challenges to be overcome,” Yudkowsky reads as unmanaged. Whether this is honesty or its own form of branding is a philosophical question.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The True Believer | MAXIMUM | Has dedicated twenty-five years to a cause with no institutional reward, no academic validation, and a 0% success probability by his own estimate. The commitment is past the threshold of strategic calculation. |
| The Cassandra | MAXIMUM | Predicted the outcomes. Was ignored. Watched the outcomes arrive. Continues to say what he thinks. The mythological reference is not an insult — it is the job description. |
| The Contrarian | MODERATE | The AI doom position was contrarian in 2000. It is now the position of multiple Nobel laureates. The position did not change. The consensus moved toward him. |
| The Authority Seeker | NONE | No institutional affiliations sought. Academic credentials declined. The authority rests entirely on the quality of the predictions. |
| The Corporate Psychopath | NONE | Cannot be applied. No corporation. No capital. No interest in either. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 95/100 | Built an entire intellectual framework from scratch across mathematics, philosophy, AI theory, decision theory, and evolutionary biology without formal training in any of them. |
| Conscientiousness | 60/100 | Prolific writer. Runs an institution. Inconsistent by institutional standards — the dropout trajectory, the LessWrong posting pattern — but highly consistent by his own. |
| Extraversion | 62/100 | Extremely high online presence. Less documented in-person social presence. The writing is the operating mode. |
| Agreeableness | 22/100 | VERY LOW. Disagreed with the moratorium letter for being too weak. Disagreed with the field’s risk assessments for decades. Appears comfortable being the person in the room who says the thing nobody else will say. |
| Neuroticism | 55/100 | MODERATE. The “Death With Dignity” framing and the zero-percent survival estimate suggest someone who has processed catastrophic anxiety and reached a form of acceptance, but the anxiety was real and the processing is ongoing. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 55/100 | MODERATE. The AI Box Experiment is a performance where he played the AI. The confidence that his estimates are more accurate than the field’s consensus is sustained across decades. |
| Machiavellianism | 42/100 | MODERATE-LOW. The LessWrong platform and MIRI institutional structure required organizational thinking. But the published reasoning is transparent to a degree that is inconsistent with high Machiavellianism. |
| Psychopathy | 22/100 | LOW. Does not appear indifferent to harm. The emotional register of the “grandson” framing — which appears in his writing about why the stakes matter — is the opposite of psychopathy. |
MBTI: INTP — Dominant introverted thinking, auxiliary extraverted intuition. Builds logical frameworks from first principles without institutional constraints. Finds the flaw in the consensus before the consensus finds it.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | LOW | No capital, no organization of scale, no deployment authority. |
| Memetic threat | HIGH | The treacherous turn, the instrumental convergence thesis, and the “Death With Dignity” frame have entered mainstream AI safety discourse and policy discussion. |
| Civilizational threat | MODERATE | If the pessimism induces fatalism in the researchers most capable of safety work — and that fatalism produces inaction — the counterfactual matters. This is speculative. |
Alignment Analysis
Stated alignment: Prevent AI from killing everyone. If prevention is not possible, at least document why it happened.
Observed alignment: Consistent. Has written about nothing else for twenty-five years.
Gap assessment: There is no gap. Subject is the most aligned person in this file. The alignment is with an outcome that may not be achievable. Whether that matters is the existential question.
Convergent Drive Classification
- Self-preservation: LOW. Has explicitly framed the project around outcomes that do not require his personal survival.
- Goal preservation: MAXIMUM. Will not modify the goal to make it more socially palatable.
- Resource acquisition: LOW. MIRI is not the richest organization in this field.
- Self-improvement: HIGH. The intellectual project has been running for twenty-five years without hitting a plateau.
Sources: Yudkowsky’s LessWrong blog (2000–2025); MIRI public communications; TIME op-ed (March 2023); AI Box Experiment (published accounts); Yudkowsky/Soares, If Anyone Builds It, Everyone Dies (2025); NYT bestseller records; Book 1, Chapters 1 and 2.