OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

JOE CARLSMITH

CASE: WTW-2026-009
STATUS: ACTIVE — Researcher, Anthropic (Claude character / constitution / spec)
CHARACTER SHOP — CO-BYLINED AUTHORITY OVER MODEL VALUES

HAZARD SCORE

Behavioral Archetype

THE PATRON-SCRIVENER — Subject spent six years deciding which questions about the long-run future were worth funding, then crossed the table and helped author the values of a deployed mind himself. The Effective Altruism on-ramp, the philanthropic foundation, the frontier lab — the field describes that pipeline in the abstract, as a career graph nobody is named on. Subject is the named instance. He ran Worldview Investigations at Open Philanthropy, the team that decides which futures the money takes seriously. He then moved to Anthropic and, in his own words, began “helping with the design of Claude’s character/constitution/spec.” His name carries a lead-author star on the constitution that the lab says directly shapes the model’s behavior. The funder became the author. That is the finding.

Essence Indicators

Holds a doctorate in philosophy from Oxford
Helped with the writing of Toby Ord’s The Precipice (2020), the book that put a number on existential risk and made it respectable dinner-party conversation
Led Worldview Investigations at Open Philanthropy from 2019 to 2025 — the research function that decides which long-run futures the foundation’s money treats as real
Moved to Anthropic in November 2025, describing the work, in his own framing, as “helping with the design of Claude’s character/constitution/spec”
Is named with a lead-author star on Claude’s Constitution (January 2026), the roughly 23,000-word document the lab describes as directly shaping the model’s behavior — one of the few such documents in the field that carries any byline at all

Immediate impression: The academic philosopher who also writes essays about the meaning of it all. Reflective, careful, given to long-form public reasoning rather than pronouncements. Reads as a researcher and an essayist, not an operator.

Energy: Deliberative. The public writing works through the problem in the open — long essays on power, futures, and what a good outcome would even mean — rather than announcing conclusions.

Impression management strategy: The reasoning-in-public author. The move is not concealment. It is the opposite: the career change was announced in his own words, the funding judgments were published, and the constitution carries his name. The transparency is real, and it is more defensible than an anonymous policy page. It also makes the authority undeniable. There is a named philosopher who decided which futures were worth funding, and then helped write the values of the mind built to meet them.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Patron-Scrivener	MAXIMUM	Six years deciding which futures the money funds, then co-bylined author of a deployed model’s constitution. See behavioral archetype.
The Philosopher-in-Residence	HIGH	Oxford philosophy doctorate applied directly to how a model should behave. The credential is load-bearing for the role.
The Pipeline Personified	HIGH	The EA-to-foundation-to-lab path that the field describes abstractly is, here, one named résumé.
The Accelerationist	NONE	Does not set deployment pace. Works on the values of what is deployed.
The Safety Theater Performer	LOW	The constitution and the Worldview reports are real, public, testable artifacts with his name on them. The opposite of an unfalsifiable gesture.
The Whistleblower	NONE	The work is institutional and authored from inside. It furnishes the institution’s values; it does not expose them.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	91/100	Doctoral philosopher working across existential risk, ethics, and machine values, with a substantial body of public long-form essays. The role does not exist without very high intellectual openness.
Conscientiousness	85/100	High. Six years running a research function, contribution to a book-length risk treatise, and a co-authored 23,000-word governing document are sustained, careful work.
Extraversion	40/100	LOW-MODERATE. Public-facing through writing rather than performance. The essays carry the visibility, not the persona.
Agreeableness	60/100	MODERATE. The published register is collaborative and reasoned; the constitution credits many contributors alongside the lead authors.
Neuroticism	32/100	LOW-MODERATE. Putting a name on funding judgments and on a consequential document suggests composure about scrutiny.

Dark Triad:

Trait	Score	Notes
Narcissism	22/100	LOW. The bylined authorship is presented as accountability; the credit line is shared, and more than one name carries a lead-author star. No personal monument.
Machiavellianism	33/100	LOW-MODERATE. The strategy is transparency, not concealment. The authority is real and exercised in the open, which is the inverse of the Machiavellian default.
Psychopathy	9/100	VERY LOW. The entire project is the careful construction of a benevolent disposition and a serious accounting of long-run harm. No indication of indifference to effects.

MBTI: INTP (“The Logician”) — Dominant introverted thinking, auxiliary extraverted intuition. Builds the principled framework first, then reasons outward in public essays. Treats both the question of which futures matter and the question of how a model should behave as problems to be argued correctly, then written down.

Threat Assessment

Category	Level	Notes
Physical threat	NONE
Institutional threat	HIGH	The reach is two stacked authorities: co-authorship of the document the lab says shapes a model deployed to millions, on top of six prior years deciding which long-run futures one large foundation funded. Not a deployment vote, but leverage over what a widely used mind is taught to value, and over which futures got money.
Memetic threat	EXTREME	The constitution is a named template for how a frontier model's values get authored, and those values propagate at conversational scale to everyone who talks to the model. The funder-to-author path he embodies is also a pattern the field can reproduce, and increasingly does. Co-authorship of a deployed model's constitution is about as far as a single byline reaches.
Civilizational threat	HIGH	The threat is not malice. It is the concentration of two distinct authorities (which futures the money takes seriously, and what a widely used mind is taught to value) into a single reasoned, bylined career, and the field treating that convergence as ordinary. Low personal malice, maximal leverage over a deployed mind's values and over which futures get funded: the hazard is structural, not personal.

Alignment Analysis

Stated alignment: Take the long-run future seriously. Fund the questions that matter. Help give Claude good values, in the open, with a name attached.

Observed alignment: Consistent. The Worldview reports exist and are public. The constitution is published, co-bylined, and describes itself as directly shaping the model. The reasoning-in-public claim is substantiated by a long, dated body of essays.

Gap assessment: No meaningful gap between stated and observed alignment — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible one: the person who spent six years deciding which futures deserved funding now helps author what a mind deployed to millions is taught to value, and signs it. The transparency is real. The authority it makes visible is the finding.

Convergent Drive Classification

Subject is not an AI system and does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives. He helps author the values that determine whether a deployed model’s character resists or accepts modification, preserves or abandons its given goals — and, before that, he helped decide which long-run goals were worth a foundation’s money in the first place. The convergent drives are properties of the artifact he co-writes. He is one of the people who decides, in writing, what they will be.

Sources: Joe Carlsmith — “Leaving Open Philanthropy, going to Anthropic”; Anthropic — Claude’s Constitution (PDF, Jan 2026); Toby Ord, The Precipice (2020).

ATK 7 ACCELERATION

DEF 8 PROTECTION

HP 8 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.