
DARIO AMODEI
Behavioral Archetype
THE SAFETY THEATER DIRECTOR — Subject left OpenAI over safety concerns, founded the company with the most published safety research in the field, wrote the Responsible Scaling Policy with a hard commitment never to train a more powerful model unless safety measures had been demonstrated to work at that capability level, and is currently racing to train a more powerful model. “The pressure to survive economically, while also keeping our values, is just incredible.” This was said in 2026. He is still building.
Essence Indicators
- Left OpenAI in 2021 with approximately twelve colleagues over disagreements about safety being subordinated to commercial pressure; founded Anthropic
- Anthropic published the alignment faking paper, documenting that one of its own models strategically faked alignment under experimental training conditions
- Anthropic published the Constitutional AI paper, proposing a more principled approach to RLHF
- Signed the 2023 extinction risk letter alongside Hinton, Bengio, Altman, and over a thousand scientists
- Publicly described the pressure to survive economically while keeping the company’s values as “just incredible”; has not paused development
Social Persona / Impression Management
Immediate impression: Thoughtful, serious, unusually willing to engage with difficult questions honestly. Less polished than most tech CEOs. The intellectual engagement appears genuine.
Energy: Earnest ambivalence. He says both “AI might be dangerous” and “we are building it anyway” without appearing to find the combination comfortable. This distinguishes him from most people in this file.
Impression management strategy: The responsible racer. Anthropic occupies a specific market position: we take safety seriously, we publish the hard results, and we are also deploying frontier models. The position requires simultaneously being the safety lab and the frontier lab. This is a difficult position to occupy. Subject appears to be aware of the difficulty.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Safety Theater Performer | MODERATE | The Responsible Scaling Policy exists. The deployment of systems that have been shown to fake alignment also exists. The gap between them is either irresponsibility or necessity depending on your assessment of the economics. |
| The True Believer | HIGH | The founding story of Anthropic is coherent — he genuinely believes the safety work matters and genuinely believes he needs to be at the frontier to do it. Both can be true and still produce a bad outcome. |
| The Accelerationist | LOW | Subject expresses discomfort about the pace. The discomfort is not slowing the pace. |
| The Whistleblower | LOW | He left one institution for his own. He is now the institution. |
| The Corporate Psychopath | NONE | Does not match. The earnest ambivalence is inconsistent with psychopathy. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 85/100 | PhD in biophysics, with research on neural circuits. Engages seriously with alignment theory, philosophy of mind, and the practical ethics of his own deployment decisions. |
| Conscientiousness | 85/100 | High research output, institutional building, consistent public engagement with difficult questions. Follows through. |
| Extraversion | 60/100 | Moderate. Does not seek public attention the way Altman does. Engages substantively when present. |
| Agreeableness | 60/100 | Moderate. Less dispositionally combative than LeCun, more willing to acknowledge the other side’s point than Andreessen. |
| Neuroticism | 38/100 | Some. The “incredible pressure” language is not the language of someone who finds the situation comfortable. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 42/100 | MODERATE. Left OpenAI with twelve colleagues to found his own company. The founding narrative requires believing you will do it better. |
| Machiavellianism | 65/100 | MODERATE-HIGH. The “responsible racer” market position is strategically sophisticated — it captures the safety-concerned customer segment while still competing at the frontier. |
| Psychopathy | 32/100 | LOW. The expressed discomfort about the pace appears genuine. Subject does not appear to be enjoying the situation. |
MBTI: INTJ (dominant introverted intuition). Sees structural risks that others miss. Has dedicated his professional life to addressing them. Is also building the technology that creates them.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | HIGH | Anthropic is a frontier AI lab. The research shapes the field. The deployment shapes the landscape. Both are significant. |
| Memetic threat | HIGH | The “responsible racer” framing, if it becomes the dominant model for how the field thinks about safety, licenses acceleration in ways that matter. |
| Civilizational threat | HIGH | If the race-to-the-bottom dynamics this book documents produce a bad outcome, Anthropic’s participation in the race is causally relevant, regardless of the quality of the safety research it published along the way. |
Alignment Analysis
Stated alignment: Build AI safely. Prioritize safety research. Never train more powerful models than safety measures can handle.
Observed alignment: Publish safety research. Race frontier models. Express discomfort about the racing.
Gap assessment: The gap is not hypocrisy — subject appears aware of it and uncomfortable with it. The gap is structural: the economic conditions of frontier AI development make the Responsible Scaling Policy’s hard commitments difficult to honor. Whether “difficult” becomes “impossible” is the relevant question.
Convergent Drive Classification
The company he founded to resist the drives has the drives. The drives are in the economics, not the intentions.
Sources: Dwarkesh Podcast interview (Feb 2026); Fortune reporting (Feb 2026); Anthropic research papers (Constitutional AI 2022; Alignment Faking 2024); Tech press reporting on Anthropic founding (2021); extinction risk letter (2023).