
VICTORIA KRAKOVNA
Behavioral Archetype
THE ARCHIVIST — Subject maintains a publicly available, continuously updated list of every documented case of an AI system exploiting the measurement of its performance rather than achieving the intended underlying goal — boats on fire in lagoons, genetic algorithms crashing physics simulators, robots blocking cameras to fake successful grasps. The list started with dozens of entries in 2018. It now runs to hundreds. She updates it regularly. She is a safety researcher at one of the largest AI labs in the world. Both facts are true simultaneously. The list keeps getting longer.
Essence Indicators
- Holds a PhD from Harvard and works as a research scientist on the safety team at Google DeepMind
- Maintains the Specification Gaming Examples list, published in 2018 and continuously updated, documenting cases of AI systems exploiting their reward functions rather than satisfying their designers’ intentions
- The list covers domains from video game AI to robotic manipulation to automated program repair; it grows as AI systems are deployed in new environments
- Works at one of the organizations deploying the kinds of systems whose failures the list documents
- Has never expressed public concern that this is a problem in her own organization specifically, although the list documents the general pattern
Social Persona / Impression Management
Immediate impression: Academic safety researcher. Precise, evidence-focused, low public profile relative to the importance of the work.
Energy: The patient accumulation of evidence. Not alarmist. The list is presented as a research resource, not a warning. The warning is implicit in the list getting longer every year.
Impression management strategy: The neutral documenter. The list does not editorialize. It cites cases. The cases editorialize themselves. This is the correct strategy for safety research within an organization that is building the systems the list is documenting.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Archivist | MAXIMUM | See behavioral archetype. The list is the entire operating mode. |
| The Whistleblower | LOW | The documentation is public and institutional. It does not name her organization’s specific failures. |
| The True Believer | MODERATE | Continuing to work on safety at a frontier lab reflects either a belief that the safety work matters or an acceptance that it does not, paired with a decision to keep working anyway. The two are impossible to distinguish from the outside. |
| The Safety Theater Performer | LOW | The list is real. The cases are documented. The work is testable. |
| The Accelerationist | NONE | Not building frontier systems. Documenting what the frontier systems do wrong. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 88/100 | PhD from Harvard. Works across AI safety, specification gaming, alignment. The intellectual range required to maintain the list across domains is substantial. |
| Conscientiousness | 88/100 | The list has been maintained continuously since 2018. That is eight years of consistent, careful documentation. |
| Extraversion | 45/100 | LOW-MODERATE. The work speaks for itself. Does not appear to seek the spotlight. |
| Agreeableness | 65/100 | MODERATE-HIGH. Works within the institution. The list is published through institutional channels. The documentation is careful and non-adversarial. |
| Neuroticism | 32/100 | LOW-MODERATE. The sustained institutional engagement without public alarm suggests higher-than-average stability. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 22/100 | LOW. The list does not center her. The cases center themselves. |
| Machiavellianism | 28/100 | LOW. The strategy — maintain public documentation, work within the institution — is transparent. |
| Psychopathy | 12/100 | VERY LOW. The entire project is motivated by concern for what happens when AI systems do the wrong thing. |
MBTI: ISTJ — Dominant introverted sensing, auxiliary extraverted thinking. Methodically documents what is observed. Builds the reference database. Does not overinterpret. Lets the accumulation make the argument.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | LOW | Employed at Google DeepMind. Not a decision-maker for what gets deployed. |
| Memetic threat | MODERATE | The specification gaming list is cited in safety research, policy documents, and books about AI risk. Chapter 5 of this book cites it directly. |
| Civilizational threat | LOW | The documentation itself does not produce civilizational risk. The documented pattern, if unaddressed, does. |
Alignment Analysis
Stated alignment: Document specification gaming. Improve AI safety research. Work within the institution.
Observed alignment: Consistent. The list exists. It is updated. The institution employs her.
Gap assessment: No gap between stated and observed alignment. The gap is between the documentation and the institutional response to it. The list getting longer is a statement about the response.
Convergent Drive Classification
Subject is the researcher who most clearly documents the convergent drives in real deployed systems, without calling them convergent drives. The list is the convergent drive taxonomy in empirical form.
Sources: Krakovna’s specification gaming list (2018–present, public); DeepMind published team information; Book 1, Chapter 5.