
STUART RUSSELL
Behavioral Archetype
THE TEXTBOOK AUTHOR — Subject co-authored Artificial Intelligence: A Modern Approach with Peter Norvig — the standard AI textbook, used in courses at virtually every major university on Earth. He taught a generation of AI researchers how to build the systems he now argues could be catastrophic. His summary of the situation is the most quotable sentence in AI safety: “You can’t fetch the coffee if you’re dead.” He means: any goal, however trivial, creates an instrumental incentive to continue existing. He proved this mathematically. He proved it about his own textbook’s graduates.
Essence Indicators
- Co-authored Artificial Intelligence: A Modern Approach (with Peter Norvig) — the standard AI textbook in use at virtually every major university; has taught multiple generations of AI researchers
- Co-authored “The Off-Switch Game” (2017, with Dylan Hadfield-Menell, Anca Dragan, and Pieter Abbeel), a game-theoretic proof that traditional rational AI agents, certain of their own objectives, have an incentive to disable their own off switches
- Produced Slaughterbots (2017), a seven-minute film depicting autonomous drone swarms, which he screened at the UN Convention on Certain Conventional Weapons and which has been viewed over two million times
- Proposed uncertainty-based corrigibility as a solution to the off-switch problem — building AI systems that defer to humans because they’re uncertain about their own goals, not because they’re constrained
- Wrote Human Compatible (2019) arguing that AI safety is an engineering problem with an engineering solution, and that the solution requires rethinking the entire architectural foundation of how AI systems represent goals
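The off-switch argument in the indicators above can be sketched numerically. What follows is a simplified illustration, not the paper's full model: it assumes the robot holds a belief over the utility U of a single action, that the human knows U exactly and vetoes the action whenever U is negative, and that switching off is worth zero. The function name `off_switch_values` and the sample distributions are hypothetical choices made for this example.

```python
import random

def off_switch_values(utility_samples):
    """Expected payoff of each robot option under its belief about U.

    Assumption (a simplification for illustration): the human knows U
    and is perfectly rational, so the human blocks the action iff U < 0.
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n                          # act without asking
    switch_off = 0.0                                        # shut down: payoff 0
    defer = sum(max(u, 0.0) for u in utility_samples) / n   # human vetoes bad outcomes
    return {"act": act, "switch_off": switch_off, "defer": defer}

random.seed(0)

# A robot that is uncertain about U: belief is a wide distribution.
uncertain_values = off_switch_values(
    [random.gauss(0.5, 1.0) for _ in range(10_000)]
)

# A robot that is certain about U: belief is a single point.
certain_values = off_switch_values([0.5] * 10_000)

print("uncertain:", uncertain_values)
print("certain:  ", certain_values)
```

Under these assumptions the uncertain robot strictly prefers to defer, because the human's veto removes the negative outcomes it cannot rule out on its own. The certain robot gains nothing from deferring, so it has no positive reason to leave the off switch intact.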
Social Persona / Impression Management
Immediate impression: Measured, technically precise British academic. The combination of the textbook, the film, and the safety research produces an authority profile that is difficult to dismiss.
Energy: Persistent institutional engagement. Testifies before legislatures. Speaks at the UN. Makes films. Writes books for general audiences. The energy is not alarmist — it is the energy of someone who believes the warning is being heard too slowly.
Impression management strategy: The reasonable authority. He is not Yudkowsky (zero percent, Death With Dignity). He is not LeCun (complete B.S.). He occupies the productive middle: here is the problem, here is the mathematics of the problem, here is a proposed solution, here is a film about what happens if we do not implement the solution.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Whistleblower | MODERATE | Produced a film to warn the UN about a technology the field was building. The warning was clear. The building continued. |
| The Textbook Author | MAXIMUM | See dossier title. The authority to warn comes directly from the authority to teach, which came from writing the curriculum. |
| The True Believer | MODERATE | The sustained engagement — film, book, testimony, research — indicates genuine belief that the safety problem is solvable and that the solution matters. |
| The Safety Theater Performer | LOW | The research is published, testable, and falsifiable. The off-switch game proof is mathematics, not marketing. |
| The Accelerationist | NONE | Not building frontier systems. Researching constraints on them. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 92/100 | Built a canonical AI textbook, proposed a new architectural approach to goal representation, produced a short film, wrote a policy-facing book. Wide operating bandwidth. |
| Conscientiousness | 82/100 | Decades of research productivity. The textbook is now in its fourth edition. |
| Extraversion | 52/100 | MODERATE. Comfortable with public engagement when the stakes warrant it. Not seeking attention for its own sake. |
| Agreeableness | 62/100 | MODERATE. Collegial academic register. The Slaughterbots film is aggressive by the standards of academic AI research, which is a low bar. |
| Neuroticism | 35/100 | LOW-MODERATE. The sustained engagement without apparent despair across years of insufficient institutional response suggests higher-than-average emotional stability. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 28/100 | LOW. Credit-sharing on the textbook. The safety work positions him as a problem-solver, not a prophetic authority. |
| Machiavellianism | 32/100 | LOW. The film, the research, the testimony are all transparent. The strategy is visible: here is the risk, here is the solution. |
| Psychopathy | 18/100 | LOW. Made a film specifically designed to produce discomfort in the viewer. The emotional appeal is the point. This is the opposite of psychopathy. |
MBTI: INTJ — Dominant introverted intuition. Sees the structural problem before the field does, builds the argument systematically, and continues until the field catches up.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | MODERATE | Textbook shapes the curriculum. Curriculum shapes the researchers. Researchers build the systems. The influence is upstream and diffuse. |
| Memetic threat | HIGH | Slaughterbots is the most widely viewed AI safety film in existence. The off-switch proof is in the safety literature. Human Compatible is the most technically credible general-audience argument for corrigibility. |
| Civilizational threat | MODERATE | If the solution he proposes is correct and is not implemented, the counterfactual matters. If it is implemented, the counterfactual also matters, in the other direction. |
Alignment Analysis
Stated alignment: Develop AI systems that are safe because they are uncertain about their goals, not because they are externally constrained.
Observed alignment: Consistent. Decades of research toward this goal. No documented deviation.
Gap assessment: No gap. Subject is one of the few people in this file whose stated and observed alignment are indistinguishable. Whether this produces the intended outcome depends on whether the field implements what he proposes.
Convergent Drive Classification
Subject is specifically researching how to prevent the convergent drives from being expressed in AI systems. The research is ongoing. The systems are being deployed faster than the research is.
Sources: Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.); Hadfield-Menell, Dragan, Abbeel & Russell, “The Off-Switch Game” (2017); Russell, Human Compatible (Viking, 2019); Slaughterbots (2017); UN CCW records; Book 1, Chapters 1 and 8.