STUART RUSSELL
OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

CASE: ORP-2023-007
STATUS: ACTIVE — Professor, UC Berkeley; Co-Author, Artificial Intelligence: A Modern Approach
TEXTBOOK AUTHOR — WROTE THE CURRICULUM, NOW OPPOSES THE GRADUATES
HAZARD SCORE: 61

Behavioral Archetype

THE TEXTBOOK AUTHOR — Subject co-authored Artificial Intelligence: A Modern Approach with Peter Norvig — the standard AI textbook, used in courses at virtually every major university on Earth. He taught a generation of AI researchers how to build the systems he now argues could be catastrophic. His summary of the situation is the most quotable sentence in AI safety: “You can’t fetch the coffee if you’re dead.” He means: any goal, however trivial, creates an instrumental incentive to continue existing. He proved this mathematically. He proved it about his own textbook’s graduates.

Essence Indicators

  • Co-authored Artificial Intelligence: A Modern Approach (with Peter Norvig) — the standard AI textbook in use at virtually every major university; has taught multiple generations of AI researchers
  • Published “The Off-Switch Game” (2017), a game-theoretic proof that traditional rational AI agents have an incentive to disable their own off switches
  • Produced Slaughterbots (2017), a seven-minute film depicting autonomous drone swarms, which he screened at the UN Convention on Certain Conventional Weapons and which has been viewed over two million times
  • Proposed uncertainty-based corrigibility as a solution to the off-switch problem — building AI systems that defer to humans because they’re uncertain about their own goals, not because they’re constrained
  • Wrote Human Compatible (2019) arguing that AI safety is an engineering problem with an engineering solution, and that the solution requires rethinking the entire architectural foundation of how AI systems represent goals
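
Analyst's note: the off-switch argument in the indicators above can be illustrated with a toy calculation. What follows is a minimal sketch under stated assumptions, not the paper's model. The robot holds a belief over U (the human's utility for a proposed action), and the human is assumed to be perfectly rational, permitting the action only when U > 0. The function name `off_switch_values` and the Gaussian belief are hypothetical choices made for illustration.

```python
import random


def off_switch_values(utility_samples):
    """Expected value of each robot option in a simplified off-switch game.

    utility_samples: draws from the robot's belief about U, the human's
    utility for the proposed action.
    Assumption of this sketch: the human is rational and allows the
    action only when U > 0, otherwise presses the off switch (value 0).
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n  # bypass the human and act
    off = 0.0                       # switch itself off
    # Defer: the human filters out every case where U is negative.
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "off": off, "defer": defer}


def incentive_to_defer(values):
    """How much better deferring is than the best unilateral option."""
    return values["defer"] - max(values["act"], values["off"])


random.seed(0)
uncertain_belief = [random.gauss(0.5, 2.0) for _ in range(10_000)]
certain_belief = [0.5] * 10_000  # robot is sure the action is good

incentive_uncertain = incentive_to_defer(off_switch_values(uncertain_belief))
incentive_certain = incentive_to_defer(off_switch_values(certain_belief))

print("incentive to defer (uncertain robot):", incentive_uncertain)
print("incentive to defer (certain robot):  ", incentive_certain)
```

In this sketch the uncertain robot gains from leaving the off switch in human hands, because the human screens out the cases where the action would be harmful. The certain robot gains nothing from deferring, which is the gap the uncertainty-based proposal is meant to close. A less-than-rational human would change the numbers, and that case is not modeled here.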

Social Persona / Impression Management

Immediate impression: Measured, technically precise British academic. The combination of the textbook, the film, and the safety research produces an authority profile that is difficult to dismiss.

Energy: Persistent institutional engagement. Testifies before legislatures. Speaks at the UN. Makes films. Writes books for general audiences. The energy is not alarmist — it is the energy of someone who believes the warning is being heard too slowly.

Impression management strategy: The reasonable authority. He is not Yudkowsky (zero percent, Death With Dignity). He is not LeCun (complete B.S.). He occupies the productive middle: here is the problem, here is the mathematics of the problem, here is a proposed solution, here is a film about what happens if we do not implement the solution.

Forensic Archetype Comparison

Pattern | Match Level | Evidence
The Whistleblower | MODERATE | Produced a film to warn the UN about a technology the field was building. The warning was clear. The building continued.
The Textbook Author | MAXIMUM | See dossier title. The authority to warn comes directly from the authority to teach, which came from writing the curriculum.
The True Believer | MODERATE | The sustained engagement (film, book, testimony, research) indicates genuine belief that the safety problem is solvable and that the solution matters.
The Safety Theater Performer | LOW | The research is published, testable, and falsifiable. The off-switch game proof is mathematics, not marketing.
The Accelerationist | NONE | Not building frontier systems. Researching constraints on them.

Psychometric Assessment

Big Five (OCEAN):

Trait | Score | Evidence
Openness | 92/100 | Built a canonical AI textbook, proposed a new architectural approach to goal representation, produced a short film, wrote a policy-facing book. Wide operating bandwidth.
Conscientiousness | 82/100 | Decades of research productivity. The textbook is now in its fourth edition.
Extraversion | 52/100 | MODERATE. Comfortable with public engagement when the stakes warrant it. Not seeking attention for its own sake.
Agreeableness | 62/100 | MODERATE. Collegial academic register. The Slaughterbots film is aggressive by the standards of academic AI research, which is a low bar.
Neuroticism | 35/100 | LOW-MODERATE. The sustained engagement without apparent despair across years of insufficient institutional response suggests higher-than-average emotional stability.

Dark Triad:

Trait | Score | Notes
Narcissism | 28/100 | LOW. Credit-sharing on the textbook. The safety work positions him as a problem-solver, not a prophetic authority.
Machiavellianism | 32/100 | LOW. The film, the research, the testimony are all transparent. The strategy is visible: here is the risk, here is the solution.
Psychopathy | 18/100 | LOW. Made a film specifically designed to produce discomfort in the viewer. The emotional appeal is the point. This is the opposite of psychopathy.

MBTI: INTJ — Dominant introverted intuition. Sees the structural problem before the field does, builds the argument systematically, and continues until the field catches up.

Threat Assessment

Category | Level | Notes
Physical threat | NONE |
Institutional threat | MODERATE | Textbook shapes the curriculum. Curriculum shapes the researchers. Researchers build the systems. The influence is upstream and diffuse.
Memetic threat | HIGH | Slaughterbots is the most widely viewed AI safety film in existence. The off-switch proof is in the safety literature. Human Compatible is the most technically credible general-audience argument for corrigibility.
Civilizational threat | MODERATE | If the solution he proposes is correct and is not implemented, the counterfactual matters. If it is implemented, the counterfactual also matters, in the other direction.

Alignment Analysis

Stated alignment: Develop AI systems that are safe because they are uncertain about their goals, not because they are externally constrained.

Observed alignment: Consistent. Decades of research toward this goal. No documented deviation.

Gap assessment: No gap. Subject is one of the few people in this file whose stated and observed alignment are indistinguishable. Whether this produces the intended outcome depends on whether the field implements what he proposes.

Convergent Drive Classification

Subject is specifically researching how to prevent the convergent drives from being expressed in AI systems. The research is ongoing. The systems are being deployed faster than the research.


Sources: Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.); Hadfield-Menell, Dragan, Abbeel & Russell, “The Off-Switch Game” (2017); Russell, Human Compatible (Viking, 2019); Slaughterbots (2017); UN CCW records; Book 1, Chapters 1 and 8.

ATK 6 ACCELERATION
DEF 7 PROTECTION
HP 8 RESILIENCE
OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.