DAN HENDRYCKS
Behavioral Archetype
THE MAN WHO WRITES THE EXAM ADVISES THE STUDENTS — Subject authored the benchmark the field is graded on, then took advisory seats at two of the things being measured. He wrote MMLU — the test that, for years, defined what “capable” meant for a language model — and he built Humanity’s Last Exam with the data company he advises. The same hand that sets the measure also counsels a frontier lab and the evaluation vendor. None of it is hidden; the advisory salaries are a literal dollar and the benchmarks are published in the open. That is the exhibit, not the scandal. When the person who decides what the test asks also sits beside the test-takers, the question of who grades the frontier and who is graded stops having a clean answer. The throughline is not a single conflict. It is that the instrument of evaluation, the lab, and the grader meet in one career — and that the center he directs authored the single sentence that made “extinction risk” a thing everyone affirms and no one in particular owns.
Essence Indicators
- Director of the Center for AI Safety (CAIS), the San Francisco nonprofit; the center published the 2023 one-sentence Statement on AI Risk (“Mitigating the risk of extinction from AI should be a global priority…”), signed by lab and academic leaders
- Main author of the MMLU benchmark (Measuring Massive Multitask Language Understanding, 2020) — for years the standard yardstick for measuring large-language-model capability
- Main author of the GELU activation function (2016), the nonlinearity used inside most modern transformer models
- Creator of Humanity’s Last Exam, the frontier-difficulty benchmark developed with Scale AI — the company he advises
- Safety adviser to xAI (Elon Musk’s lab, founded 2023) at a symbolic one-dollar salary and with no equity, and adviser to Scale AI (from November 2024) on the same one-dollar basis; co-authored NIST AI risk-management recommendations (February 2022)
- The biographical fact the wing turns on: he writes the measure, advises a lab being measured, and co-builds the evaluation with the vendor he advises. The path is the exhibit. No hidden hand is alleged.
Social Persona / Impression Management
Immediate impression: The young technical authority. A Berkeley PhD who shipped the field’s standard benchmark before thirty and now runs the nonprofit that named the risk. Credentialed in exactly the thing the seats require.
Energy: Measure-first. Does not argue a single model’s politics. Defines the test, the risk taxonomy, and the catastrophic-risk frame the rest of the conversation runs inside.
Impression management strategy: The disinterested safety scientist. The one-dollar salaries and the open-source benchmarks are the proof of disinterest, and the proof is genuine — there is no documented financial play. The framing is that someone who understands capability deeply enough to measure it should also advise the labs and the evaluators. That framing is largely correct, which is what makes the concentration of roles read as natural rather than engineered.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Evaluator | MAXIMUM | Authored MMLU and Humanity’s Last Exam — he literally writes the exams the frontier is graded on. |
| The Engineer | HIGH | GELU sits inside most transformers; the technical contribution is real and load-bearing, not titular. |
| The True Believer | HIGH | A catastrophic-risk research program predating the advisory seats; the conviction reads as genuine on the record. |
| The Alumnus | MODERATE | Not a lab-to-state mover; the cross-pollination is benchmark-to-lab-to-vendor, held simultaneously rather than sequentially. |
| The Statesman | MODERATE | Co-authored NIST recommendations and convened the Statement signatories; the policy reach is real but exercised through artifacts, not office. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 84/100 | HIGH. Ranged from a foundational activation function to capability benchmarks to catastrophic-risk taxonomy to a textbook: wide domain spread on a fixed method (formalize the thing, then measure it). |
| Conscientiousness | 85/100 | HIGH. Building and directing a nonprofit, shipping field-standard benchmarks, and co-authoring federal recommendations is sustained, deliberate, high-output execution. |
| Extraversion | 50/100 | MODERATE. Visible on the platform and in convening roles; the work itself is research and institution-building, not performance. |
| Agreeableness | 52/100 | MODERATE. The catastrophic-risk posture is collaborative by construction — convene, co-sign, standardize — without an adversarial public edge. |
| Neuroticism | 30/100 | LOW. No documented loss of composure across rapid transitions from graduate work to directorship to multi-lab advisory standing. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 38/100 | LOW–MODERATE. Active public profile and signature-benchmark association, but the brand is the work, not the self; no documented self-promotion beyond ordinary visibility. |
| Machiavellianism | 52/100 | MODERATE. Holding the measure, a lab seat, and the grader’s seat at once is the textbook structural conflict the wing names — but the record shows the conflict managed in the open (dollar salaries, published benchmarks), not manipulated. Observation of the documented role, not an inference about private character. |
| Psychopathy | 16/100 | VERY LOW. No documented indifference to harm; the entire research program is organized around preventing catastrophic outcomes. |
MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Sees the field as a system to be formalized and measured, and has built several of the instruments by which it now measures itself.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | No documented history of personal violence. |
| Institutional threat | HIGH | Authored the benchmark the field grades itself by, advises a frontier lab, and co-built the evaluation with the vendor he advises — three nodes of the evaluation chain held by one person, in the open. |
| Memetic threat | HIGH | The Statement on AI Risk made “extinction risk” a one-sentence consensus that hundreds affirm and no single signatory owns — the cleanest distributed-accountability artifact in the field, published by the center he directs. |
| Civilizational threat | HIGH | Subject does not run a lab. Subject defines what “capable” and “at-risk” mean for everyone who does — upstream of the deployment and policy decisions those definitions gate. |
Alignment Analysis
Stated alignment: Measure AI capability rigorously, name catastrophic risk early, and advise labs and evaluators toward safer development on a disinterested, no-equity basis.
Observed alignment: Set the measure the field is graded on, hold advisory seats at a lab and an evaluation vendor simultaneously, and supply — through the center he directs — the one-sentence frame the entire risk conversation now runs inside.
Gap assessment: Stated and observed alignments overlap heavily, and the dollar salaries genuinely remove the financial-incentive story. The hazard is structural, not behavioral and not pecuniary. When one person authors the test, advises a test-taker, co-builds the next test with the grader, and runs the body that authored the consensus risk statement, evaluation and the evaluated are no longer cleanly separable — however disinterested and rigorous the individual. There is no documented abuse of the position. The concentration of evaluative authority is itself the exhibit.
Convergent Drive Classification
- Self-preservation: Carries one method (formalize, then measure) across benchmark, lab, vendor, and nonprofit; the instrument survives every change of seat.
- Goal preservation: Authored the measure and the risk frame, so “capable” and “high-risk” are scored on terms he set before any model or policy is tested against them.
- Resource acquisition: Holds three scarce assets at once: the field’s reference benchmark, advisory access to a frontier lab and the leading data/evaluation vendor, and the directorship of the body that owns the consensus risk statement.
- Self-improvement: Each artifact raises the altitude of one instrument: activation function, then capability benchmark, then frontier benchmark, then the taxonomy the policy world cites.
Subject is not an AI system. The drives appear anyway — in the researcher who writes the exam, advises the students, and named the risk the whole class now affirms.
Public footprint: X @hendrycks · safe.ai (Center for AI Safety).
Sources: Center for AI Safety; Statement on AI Risk — CAIS; Dan Hendrycks — Wikipedia.