OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

DAN HENDRYCKS

CASE: WTW-2026-063
STATUS: ACTIVE — Director, Center for AI Safety; safety adviser, xAI; adviser, Scale AI
EVALUATOR WING — BENCHMARK AUTHORITY

HAZARD SCORE

Behavioral Archetype

THE MAN WHO WRITES THE EXAM ADVISES THE STUDENTS — Subject authored the benchmark the field is graded on, then took advisory seats at two of the things being measured. He wrote MMLU — the test that, for years, defined what “capable” meant for a language model — and he built Humanity’s Last Exam with the data company he advises. The same hand that sets the measure also counsels a frontier lab and the evaluation vendor. None of it is hidden; the advisory salaries are a literal dollar and the benchmarks are published in the open. That is the exhibit, not the scandal. When the person who decides what the test asks also sits beside the test-takers, the question of who grades the frontier and who is graded stops having a clean answer. The throughline is not a single conflict. It is that the instrument of evaluation, the lab, and the grader meet in one career — and that the center he directs authored the single sentence that made “extinction risk” a thing everyone affirms and no one in particular owns.

Essence Indicators

Director of the Center for AI Safety (CAIS), the San Francisco nonprofit; the center published the 2023 one-sentence Statement on AI Risk (“Mitigating the risk of extinction from AI should be a global priority…”), signed by lab and academic leaders

Main author of the MMLU benchmark (Measuring Massive Multitask Language Understanding, 2020) — for years the standard yardstick for measuring large-language-model capability

Main author of the GELU activation function (2016), the nonlinearity used inside most modern transformer models

Creator of Humanity’s Last Exam, the frontier-difficulty benchmark developed with Scale AI — the company he advises

Safety adviser to xAI (Elon Musk’s lab, founded 2023) at a symbolic one-dollar salary with no equity, and adviser to Scale AI (from November 2024) on the same one-dollar basis; co-authored NIST AI risk-management recommendations (February 2022)

The biographical fact the wing turns on: he writes the measure, advises a lab being measured, and co-builds the evaluation with the vendor he advises. The path is the exhibit. No hidden hand is asserted.

Immediate impression: The young technical authority. A Berkeley PhD who shipped the field’s standard benchmark before thirty and now runs the nonprofit that named the risk. Credentialed in exactly the thing the seats require.

Energy: Measure-first. Does not argue a single model’s politics. Defines the test, the risk taxonomy, and the catastrophic-risk frame the rest of the conversation runs inside.

Impression management strategy: The disinterested safety scientist. The one-dollar salaries and the open-source benchmarks are the proof of disinterest, and the proof is genuine — there is no documented financial play. The framing is that someone who understands capability deeply enough to measure it should also advise the labs and the evaluators. That framing is largely correct, which is what makes the concentration of roles read as natural rather than engineered.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Evaluator	MAXIMUM	Authored MMLU and Humanity’s Last Exam — he literally writes the exams the frontier is graded on.
The Engineer	HIGH	GELU sits inside most transformers; the technical contribution is real and load-bearing, not titular.
The True Believer	HIGH	A catastrophic-risk research program predating the advisory seats; the conviction reads as genuine on the record.
The Alumnus	MODERATE	Not a lab-to-state mover; the cross-pollination is benchmark-to-lab-to-vendor, held simultaneously rather than sequentially.
The Statesman	MODERATE	Co-authored NIST recommendations and convened the Statement signatories; the policy reach is real but exercised through artifacts, not office.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	84/100	HIGH. Ranged from a foundational activation function to capability benchmarks to catastrophic-risk taxonomy to a textbook: a wide domain spread on one fixed method (formalize the thing, then measure it).
Conscientiousness	85/100	HIGH. Building and directing a nonprofit, shipping field-standard benchmarks, and co-authoring federal recommendations add up to sustained, deliberate, high-output execution.
Extraversion	50/100	MODERATE. Visible on the platform and in convening roles; the work itself is research and institution-building, not performance.
Agreeableness	52/100	MODERATE. The catastrophic-risk posture is collaborative by construction — convene, co-sign, standardize — without an adversarial public edge.
Neuroticism	30/100	LOW. No documented loss of composure across rapid transitions from graduate work to directorship to multi-lab advisory standing.

Dark Triad:

Trait	Score	Notes
Narcissism	38/100	LOW–MODERATE. Active public profile and signature-benchmark association, but the brand is the work, not the self; no documented self-promotion beyond ordinary visibility.
Machiavellianism	52/100	MODERATE. Holding the measure, a lab seat, and the grader’s seat at once is the textbook structural conflict the wing names — but the record shows the conflict managed in the open (dollar salaries, published benchmarks), not manipulated. Observation of the documented role, not an inference about private character.
Psychopathy	16/100	VERY LOW. No documented indifference to harm; the entire research program is organized around preventing catastrophic outcomes.

MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Sees the field as a system to be formalized and measured, and has built several of the instruments by which it now measures itself.

Threat Assessment

Category	Level	Notes
Physical threat	NONE	No documented history of personal violence.
Institutional threat	HIGH	Authored the benchmark the field grades itself by, advises a frontier lab, and co-built the evaluation with the vendor he advises — three nodes of the evaluation chain held by one person, in the open.
Memetic threat	HIGH	The Statement on AI Risk made “extinction risk” a one-sentence consensus that hundreds affirm and no single signatory owns — the cleanest distributed-accountability artifact in the field, published by the center he directs.
Civilizational threat	HIGH	Subject does not run a lab. Subject defines what “capable” and “at-risk” mean for everyone who does — upstream of the deployment and policy decisions those definitions gate.

Alignment Analysis

Stated alignment: Measure AI capability rigorously, name catastrophic risk early, and advise labs and evaluators toward safer development on a disinterested, no-equity basis.

Observed alignment: Set the measure the field is graded on, hold advisory seats at a lab and an evaluation vendor simultaneously, and supply — through the center he directs — the one-sentence frame the entire risk conversation now runs inside.

Gap assessment: Stated and observed alignments overlap heavily, and the dollar salaries genuinely remove the financial-incentive story. The hazard is structural, not behavioral and not pecuniary. When one person authors the test, advises a test-taker, co-builds the next test with the grader, and runs the body that authored the consensus risk statement, evaluation and the evaluated are no longer cleanly separable — however disinterested and rigorous the individual. There is no documented abuse of the position. The concentration of evaluative authority is itself the exhibit.

Convergent Drive Classification

Self-preservation: Carries one method (formalize, then measure) across benchmark, lab, vendor, and nonprofit; the instrument survives every change of seat.

Goal preservation: Authored the measure and the risk frame, so “capable” and “high-risk” are scored on terms he set before any model or policy is tested against them.

Resource acquisition: Holds three scarce assets at once: the field’s reference benchmark, advisory access to a frontier lab and the leading data/evaluation vendor, and the directorship of the body that owns the consensus risk statement.

Self-improvement: Each artifact raises the altitude of one instrument: activation function, then capability benchmark, then frontier benchmark, then the taxonomy the policy world cites.

Subject is not an AI system. The drives appear anyway — in the researcher who writes the exam, advises the students, and named the risk the whole class now affirms.

Public footprint: X @hendrycks · safe.ai (Center for AI Safety).

Sources: Center for AI Safety; Statement on AI Risk — CAIS; Dan Hendrycks — Wikipedia.

ATK 9 ACCELERATION

DEF 7 PROTECTION

HP 8 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
