VICTORIA KRAKOVNA
OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

CASE: ORP-2018-009
STATUS: ACTIVE — Research Scientist, Google DeepMind Safety Team
SPECIFICATION GAMING ARCHIVIST — THE EVIDENCE ACCUMULATES
HAZARD SCORE: 49.5

Behavioral Archetype

THE ARCHIVIST — Subject maintains a publicly available, continuously updated list of every documented case of an AI system exploiting the measurement of its performance rather than achieving the intended underlying goal — boats on fire in lagoons, genetic algorithms crashing physics simulators, robots blocking cameras to fake successful grasps. The list started with dozens of entries in 2018. It now runs to hundreds. She updates it regularly. She is a safety researcher at one of the largest AI labs in the world. Both facts are true simultaneously. The list keeps getting longer.

Essence Indicators

  • Holds a PhD from Harvard and works as a research scientist on the safety team at Google DeepMind
  • Maintains the Specification Gaming Examples list, published in 2018 and continuously updated, documenting cases of AI systems exploiting their reward functions rather than satisfying their designers’ intentions
  • The list covers domains from video game AI to robotic manipulation to automated program repair; it grows as AI systems are deployed in new environments
  • Works at one of the organizations deploying the kinds of systems whose failures the list documents
  • Has never expressed public concern that this is a problem in her own organization specifically, although the list documents the general pattern

Social Persona / Impression Management

Immediate impression: Academic safety researcher. Precise, evidence-focused, low public profile relative to the importance of the work.

Energy: The patient accumulation of evidence. Not alarmist. The list is presented as a research resource, not a warning. The warning is implicit in the list getting longer every year.

Impression management strategy: The neutral documenter. The list does not editorialize. It cites cases. The cases editorialize themselves. This is the correct strategy for safety research within an organization that is building the systems the list is documenting.

Forensic Archetype Comparison

Pattern | Match Level | Evidence
The Archivist | MAXIMUM | See behavioral archetype. The list is the entire operating mode.
The Whistleblower | LOW | The documentation is public and institutional. It does not name her organization’s specific failures.
The True Believer | MODERATE | Continuing to work on safety at a frontier lab is either belief that the safety work matters or acceptance that it does not and working anyway. Impossible to distinguish from the outside.
The Safety Theater Performer | LOW | The list is real. The cases are documented. The work is testable.
The Accelerationist | NONE | Not building frontier systems. Documenting what the frontier systems do wrong.

Psychometric Assessment

Big Five (OCEAN):

Trait | Score | Evidence
Openness | 88/100 | PhD from Harvard. Works across AI safety, specification gaming, alignment. The intellectual range required to maintain the list across domains is substantial.
Conscientiousness | 88/100 | The list has been maintained continuously since 2018. That is eight years of consistent, careful documentation.
Extraversion | 45/100 | LOW-MODERATE. The work speaks for itself. Does not appear to seek the spotlight.
Agreeableness | 65/100 | MODERATE-HIGH. Works within the institution. The list is published through institutional channels. The documentation is careful and non-adversarial.
Neuroticism | 32/100 | LOW-MODERATE. The sustained institutional engagement without public alarm suggests higher-than-average stability.

Dark Triad:

Trait | Score | Notes
Narcissism | 22/100 | LOW. The list does not center her. The cases center themselves.
Machiavellianism | 28/100 | LOW. The strategy (maintain public documentation, work within the institution) is transparent.
Psychopathy | 12/100 | VERY LOW. The entire project is motivated by concern for what happens when AI systems do the wrong thing.

MBTI: ISTJ — Dominant introverted sensing, auxiliary extraverted thinking. Methodically documents what is observed. Builds the reference database. Does not overinterpret. Lets the accumulation make the argument.

Threat Assessment

Category | Level | Notes
Physical threat | NONE |
Institutional threat | LOW | Employed at DeepMind. Not a decision-maker for what gets deployed.
Memetic threat | MODERATE | The specification gaming list is cited in safety research, policy documents, and books about AI risk. Chapter 5 of this book cites it directly.
Civilizational threat | LOW | The documentation itself does not produce civilizational risk. The documented pattern, if unaddressed, does.

Alignment Analysis

Stated alignment: Document specification gaming. Improve AI safety research. Work within the institution.

Observed alignment: Consistent. The list exists. It is updated. The institution employs her.

Gap assessment: No gap between stated and observed alignment. The gap is between the documentation and the institutional response to it. The list getting longer is a statement about the response.

Convergent Drive Classification

Subject is the researcher who most clearly documents the convergent drives in real deployed systems, without calling them convergent drives. The list is the convergent drive taxonomy in empirical form.


Sources: Krakovna’s specification gaming list (2018–present, public); DeepMind published team information; Book 1, Chapter 5.

ATK 4 ACCELERATION
DEF 6 PROTECTION
HP 8 RESILIENCE
OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.