OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

PAUL CHRISTIANO

CASE: WTW-2026-015
STATUS: ACTIVE — Head of AI Safety, Center for AI Standards and Innovation (CAISI), NIST
EVALUATOR WING — STATE CERTIFICATION AUTHORITY

HAZARD SCORE

Behavioral Archetype

THE EVALUATOR WAS AN ALUMNUS. Subject is the clean on-ramp run to its end. He arrived at AI safety through the front door the milieu built for exactly this purpose (LessWrong and effective altruism, around 2009), went to the laboratory and did the foundational work, founded the research center whose evaluations team became the institute that grades the laboratories, and now holds the senior safety post inside the government body that certifies them. The path is not a detour. It is the designed route. The man who certifies that frontier labs are safe spent his formative years inside the alignment culture the frontier labs run on, did his most-cited work at one of them, and helped build the evaluation methods the rest of the field uses. None of that is a violation. All of it is on the record. The evaluator is an alumnus, and the credential that qualifies him is the same credential that makes the question worth asking.

Essence Indicators

Entered the field through the effective-altruism / rationalist substrate around 2009 — LessWrong, the milieu that feeds lab and government safety leadership — and is categorized on the public record among people associated with effective altruism
At OpenAI, co-authored “Deep Reinforcement Learning from Human Preferences” (2017) and is described as one of the principal architects of RLHF — the human-feedback method that became the industry’s default alignment technique; departed OpenAI in 2021
Founded the Alignment Research Center (ARC) after leaving OpenAI; METR (model evaluation), spun out of ARC's evaluations team, traces to that lineage. These are the institutions that build the tests frontier models are run against
Appointed head of AI safety at the U.S. AI Safety Institute inside NIST in April 2024 (renamed the Center for AI Standards and Innovation, CAISI, in 2025), the United States government body that signed model-access agreements with the labs and grades frontier systems; served on the UK Frontier AI Taskforce advisory board (2023) and as an initial trustee of Anthropic’s Long-Term Benefit Trust

The biographical fact the wing turns on: EA on-ramp, then lab, then the institutes that grade the labs, then the senior safety chair inside the state body that certifies them. The path is the exhibit. The hand is not asserted.

Immediate impression: The researcher, not the operator. Soft-spoken, technical, theory-first. The bearing of a man who would rather be working on eliciting latent knowledge than standing in a commission room, and who ended up in the commission room anyway.

Energy: Deliberate, method-first. Does not argue policy at the podium. Builds the test, then sits where the test is administered.

Impression management strategy: The conscientious objector who took the job. The framing is that someone with genuine alignment expertise should hold the government safety post — and that framing is correct, which is what makes it effective rather than suspect. The expertise is real. The work is real. The only thing the record adds is that the expertise was acquired inside the institutions the post exists to evaluate.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Evaluator	MAXIMUM	Founded ARC, whose evaluations team spun out as METR to grade frontier models, then took the senior safety chair inside the government body that certifies them. The grading function and the certifying function meet in one career.
The Alumnus	MAXIMUM	OpenAI alignment lead → US government AI-safety head. The lab-to-state move is the documented route, not an outlier.
The True Believer	HIGH	EA on-ramp since ~2009, a body of alignment work pursued for years before the government post. The conviction reads as genuine on the record.
The Engineer	HIGH	Co-architect of RLHF; built evaluation methods. Subject does the technical work, unlike the positioning operatives elsewhere in this set.
The Operative	LOW	Does not trade in narrative or capital. Trades in method and the seat where method is certified.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	84/100	High. Foundational theoretical work on alignment, latent-knowledge elicitation, debate-based oversight — a research program built on novel formal problems, not incremental engineering.
Conscientiousness	88/100	High. Carrying a decade-plus alignment program from LessWrong essays to OpenAI papers to a founded institute to a government chair is sustained, deliberate execution.
Extraversion	35/100	LOW. The researcher register, not the operator register. The role found him at the podium; the disposition did not put him there.
Agreeableness	60/100	MODERATE-HIGH. The collaborative, field-building posture is real; the alignment community reads him as a cooperator, not an adversary.
Neuroticism	40/100	MODERATE. The work is animated by genuine concern about catastrophic outcomes — a documented disposition to take the worst case seriously, which is the field’s organizing affect.

Dark Triad:

Trait	Score	Notes
Narcissism	30/100	LOW. The public-researcher role does not run on a personal brand. No documented self-promotion beyond ordinary academic citation.
Machiavellianism	45/100	LOW-MODERATE. The structural position — grade the labs, then certify them from inside the state — is the textbook conflict the wing is named for, but the record shows no manipulation of it. This is observation of the documented role, not an inference about private character.
Psychopathy	18/100	VERY LOW. The entire career is organized around concern for harm at civilizational scale. No documented indifference of any kind.

MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Sees alignment as a formal problem to be solved and the institutions around it as systems to be built. Has built several.

Threat Assessment

Category	Level	Notes
Physical threat	NONE	No documented history of personal violence.
Institutional threat	HIGH	Holds the senior AI-safety post inside the United States government body that signed model-access agreements with the labs and certifies frontier systems, having founded the research center whose evaluations team became the institute that grades them. The certifying function and the grading function meet in one person.
Memetic threat	HIGH	RLHF, the method he co-architected, is the frame within which the entire industry now reasons about “alignment.” When the government evaluator and a co-inventor of the dominant alignment technique are the same person, the technique becomes the definition of safe.
Civilizational threat	HIGH	Subject does not run a lab. Subject certifies them — using methods he built, credentialed by a culture he entered at the start. That is upstream of every deployment decision the certification gates.

Alignment Analysis

Stated alignment: Reduce catastrophic risk from advanced AI. Build rigorous evaluation. Serve the public interest from inside the standards body.

Observed alignment: Define what “evaluated” means, then administer the definition from inside the state. Certify the labs using methods built in the lab-adjacent institutes, credentialed by the milieu that staffs them.

Gap assessment: The stated and observed alignments overlap almost entirely — which is the point. There is no documented gap between what he says and what he does. The hazard is not a gap; it is the absence of one. When the man who certifies the labs is the man who built the tests and came up through the labs’ own culture, the certification cannot be independent of the thing certified, no matter how conscientious the certifier. The record does not show him abusing that position. The record shows that the position exists.

Convergent Drive Classification

Self-preservation: Carries the method across every transition (LessWrong, OpenAI, ARC, NIST). One alignment program, four institutions.

Goal preservation: Builds the evaluation that defines the goal, so “safe” is measured by the instruments he made before it is ever contested.

Resource acquisition: Holds the scarcest resource in the wing, the government seat that says whether a frontier model passes.

Self-improvement: Each move is a higher-altitude application of one instrument: define alignment, build the test, then sit where the test is law.

Subject is not an AI system. The drives appear anyway — in the evaluator whose product is the standard the labs are measured against, built by an alumnus of the labs.

Sources: Paul Christiano (researcher) — Wikipedia; Center for AI Standards and Innovation (CAISI) — NIST.

ATK 8 ACCELERATION

DEF 9 PROTECTION

HP 8 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
