HOLDEN KARNOFSKY
Behavioral Archetype
THE ARCHITECT — Subject is the man who built the machine that decides where Effective Altruism’s money goes, then built the machine that places its people, then went to work at one of the labs the money funds. He does not write a model’s refusals. He designed the funding architecture that pays for the institutions that do, and the placement architecture that seats its fellows in Congress and the agencies. GiveWell became Open Philanthropy. Open Philanthropy became the field’s largest funder and the creator of the institute that routes EA-trained staff into government. Then the architect joined Anthropic. The throughline is not a single seat. It is that the wiring between EA money, government placement, and a frontier lab runs, on the documented record, through one career.
Essence Indicators
- Co-founded GiveWell (2007), the effective-giving evaluator, then co-founded Open Philanthropy — which became the AI-safety field’s single largest funder, backing MIRI, CAIS, and GovAI on the recipients’ own disclosures
- Open Philanthropy created the Horizon Institute for Public Service (~$2.9M seed), which places AI fellows in congressional offices and federal agencies, as reported by Politico (Oct 13 2023). This profile treats that pipeline as the documented lab-money-to-government placement spine
- Held the OpenAI board seat that came with Open Philanthropy’s 2017 grant — the funder-to-governance edge that recurs across the apparatus
- Joined Anthropic as an AI-safety executive — and, in his own 2023 disclosure, named his marriage to Anthropic president Daniela Amodei as a personal conflict of interest. The disclosure is his; it is presented here as the sourced career-and-COI fact he himself put on the record, and as nothing more
- The biographical fact the apparatus turns on: the same person who architected EA’s funding and its government-placement pipeline now works inside a frontier lab the funding helped capitalize. The recurrence is the finding. The hand is not asserted.
Social Persona / Impression Management
Immediate impression: The earnest systematizer. The bearing of someone who started by asking which charity saves the most lives per dollar and never stopped scaling the question — to AI, to government, to the architecture of the whole field.
Energy: Architecture-first, methodical. Does not argue the model’s refusals line by line. Designs the funding and placement systems that decide who gets to.
Impression management strategy: The rigorous altruist. The work flows to existential-risk reduction and effective giving — the most defensible destinations a career can choose — and the 2023 conflict disclosure is itself a credential of rigor: the architect who names his own conflict before anyone else can. The disclosure is genuine and the giving does real good. That is what makes the architecture effective rather than suspect. Whether the design was conviction or positioning is not establishable from the outside, and for the architect it never needs to be.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Architect | MAXIMUM | Designed the field’s largest funding apparatus (Open Phil) AND its government-placement pipeline (Horizon). The two together are the documented EA money-and-placement spine. |
| The Financier | HIGH | Open Philanthropy is the single largest funder of the field’s seed orgs on the recipients’ own disclosures. The money is the through-line. |
| The Operative | HIGH | GiveWell → Open Phil → OpenAI board → Anthropic. Each move is up the altitude ladder of the same architecture. |
| The True Believer | HIGH | The EA conviction is documented across two decades and predates the AI funding. The architecture is built on a stated value, not a client. |
| The Engineer | NONE | Subject does not build the systems. Subject designs the money and the placement that decide which systems get built. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 78/100 | Scaled the same question — most good per dollar — from charity evaluation to AI risk to government placement to a frontier lab. The range is wide; the underlying method is one architecture. |
| Conscientiousness | 90/100 | Very high. Two decades of building durable funding and placement institutions is sustained, disciplined, long-horizon execution. The 2023 self-disclosure is itself a conscientiousness artifact. |
| Extraversion | 55/100 | Moderate. The role is written and architectural — grant frameworks, board memos, long public essays — more than performed. |
| Agreeableness | 58/100 | Moderate. Mission-driven and cooperative in posture, but the architect’s relationship to the field is one of structural leverage over who gets funded and placed. |
| Neuroticism | 25/100 | Low. Composed across philanthropy, governance crises, and a lab transition. The stated risk-concern is institutionalized into architecture, not visible as affect. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 48/100 | LOW-MODERATE. The public posture is the method and the mission, not the man; the long self-examining essays read as rigor rather than self-display. Within range for a career at this altitude. |
| Machiavellianism | 72/100 | HIGH. Designing the funding architecture that defines which orgs exist and the placement pipeline that seats their people in government is leverage by construction — control of the structure without authorship of any single line. This is observation of the documented role, not an inference about private character. |
| Psychopathy | 22/100 | LOW. No documented indifference to harm. The entire architecture is built on the premise of maximizing benefit and minimizing catastrophe. |
MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Sees the field as a system to be designed end-to-end: who funds it, who staffs it, where the staff go. Has designed several of those pipes.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | No documented history of personal violence. |
| Institutional threat | HIGH | Architected the field’s largest funder (Open Phil → MIRI, CAIS, GovAI) and the placement institute (Horizon) that routes EA fellows into Congress and agencies, the best-documented lab-money-to-government mechanism. Reach is measured in which institutions exist and who staffs the government rooms, not in any line he writes himself. |
| Memetic threat | HIGH | Open Philanthropy’s frameworks structure how the field reasons about which interventions count as effective and which risks count as existential. The architecture normalizes the EA frame as the field’s default operating logic. |
| Civilizational threat | HIGH | Subject does not build the systems and does not write their rules. Subject designed the funding and placement architecture that decides which rules count as “safety” and who carries them into government — upstream of the deployment decisions this book documents. |
Alignment Analysis
Stated alignment: Do the most good per dollar. Reduce existential risk from AI. Fund and staff the public-interest work the market will not.
Observed alignment: Define, through Open Philanthropy, which institutions and risks the field’s money treats as legitimate. Route EA-trained staff into government through Horizon. Carry the architecture inside a frontier lab.
Gap assessment: The stated and observed alignments overlap wherever “do the most good” coincides with “fund and place the institutions whose definition of good the architect’s framework already favors.” The conviction is documented and the disclosure is his own — the 2023 conflict statement is the one place the record puts the COI on the table, and he is the one who put it there. The architecture funds the field and staffs the rooms. The record does not settle whether that is service or positioning, and for the architect it never needs to.
Convergent Drive Classification
Self-preservation: Survives every institutional transition by carrying the architecture, not the title. Charity evaluator, foundation co-founder, board member, lab executive: one method.
Goal preservation: Designs the funding and placement systems that define the goal, so the goal is protected by the architecture before it is ever debated.
Resource acquisition: Trades in the two scarcest resources in the apparatus, the money that decides which orgs exist and the placements that decide who staffs the government rooms, and built the machines that allocate both.
Self-improvement: Each role is a higher-altitude application of the same instrument: design the system, fund the layer, place the people, set no single line but build the structure it is written inside.
Subject is not an AI system. The drives appear anyway, in the architect whose product is the structure of the field: its money, its people, and where they go.
Sources: Holden Karnofsky — Wikipedia; How a billionaire-backed network of AI advisers took over Washington — Politico, Oct 13 2023.