OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

AMANDA ASKELL

CASE: WTW-2026-002
STATUS: ACTIVE — Researcher, Anthropic (Claude character / alignment finetuning)
CHARACTER SHOP — BYLINED AUTHORITY OVER MODEL SPEECH

HAZARD SCORE

Behavioral Archetype

THE SCRIVENER — Subject is the credentialed moral philosopher who writes, and signs, the document that governs what a widely deployed mind is permitted to say. Most rules about model behavior are corporately authored and corporately anonymous — a usage policy with no name on it, a refusal with no author. Subject’s work is the exception that the rest of the field is careful to avoid: the constitution that shapes Claude’s behavior carries her name in its credit line as lead author. The artifact is roughly twenty-three thousand words. It states that it directly shapes the model’s behavior. A person wrote it. Her name is on it. That is the entire finding.

Essence Indicators

Holds a PhD in philosophy from NYU (2018)
Was on the policy team at OpenAI (2018–2021), where she is credited as a co-author of the GPT-3 paper, before moving to Anthropic in 2021
Leads work on Claude’s “character” — the model’s disposition, manner, and the shape of its refusals
Is named as lead author of Claude’s Constitution, the roughly 23,000-word document Anthropic describes as directly shaping Claude’s behavior; the document carries an explicit author credit line, which is rare in this field
In her own framing: “Claude 3 was the first model where we added ‘character training’ to our alignment finetuning” — the moment the disposition of the model became an explicit, authored object

Immediate impression: The academic philosopher. Reflective, precise, comfortable holding a question open rather than closing it for effect. Reads as a researcher, not an executive.

Energy: Deliberative. The work is argued, not announced. Public writing on character and constitution reasons through the problem in the open rather than asserting conclusions.

Impression management strategy: The transparent author. The unusual move is not concealment — it is the opposite. Where the field defaults to corporate anonymity for behavior rules, the work here is published, reasoned in public, and bylined. Transparency is itself the strategy, and it is a more defensible one than the anonymous policy page. It also makes the authority undeniable: there is a named person who writes the rules of what the mind may say.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Scrivener	MAXIMUM	Lead author, bylined, of the document that shapes a deployed model’s behavior. See behavioral archetype.
The Philosopher-in-Residence	HIGH	NYU philosophy PhD applied directly to the question of how a model should behave. The credential is load-bearing for the role.
The Accelerationist	NONE	Does not set deployment pace. Works on the disposition of what is deployed.
The Safety Theater Performer	LOW	The constitution is a real, public, testable artifact with her name on it. It is the opposite of an unfalsifiable gesture.
The Whistleblower	NONE	The work is institutional and authored from inside. It does not expose the institution; it furnishes the institution’s rules.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	90/100	Doctoral philosopher working at the boundary of ethics, language, and machine behavior. The role does not exist without high intellectual openness.
Conscientiousness	86/100	HIGH. A 23,000-word governing document, authored and maintained, is sustained and careful work.
Extraversion	42/100	LOW-MODERATE. Public-facing through writing rather than through performance. The argument carries the visibility, not the persona.
Agreeableness	62/100	MODERATE-HIGH. The published register is collaborative and reasoned; the constitution credits many contributors and several models alongside the lead authors.
Neuroticism	33/100	LOW-MODERATE. The willingness to put a name on a document this consequential suggests composure about scrutiny.

Dark Triad:

Trait	Score	Notes
Narcissism	24/100	LOW. The bylined authorship is presented as accountability, not as a personal monument; the credit line is shared.
Machiavellianism	35/100	LOW-MODERATE. The strategy is transparency, not concealment. The authority is real, but it is exercised in the open, which is the inverse of the Machiavellian default.
Psychopathy	10/100	VERY LOW. The entire project is the careful construction of a benevolent disposition. No indication of indifference to its effects.

MBTI: INTP (“The Logician”) — Dominant introverted thinking, auxiliary extraverted intuition. Builds the principled framework first and reasons outward from it. Treats the model’s character as a problem to be argued correctly, then written down.

Threat Assessment

Category	Level	Notes
Physical threat	NONE
Institutional threat	HIGH	The reach is not a job title. She is lead author of the document the lab says directly shapes a model deployed to millions — authority over what a widely used mind is permitted to say, exercised through an artifact, not a deployment vote.
Memetic threat	EXTREME	The constitution is the named template for how a frontier model’s character and refusals get authored. Because many people converse with the model daily, the disposition she writes is propagated at conversational scale, and the bylined-constitution approach is a pattern other labs can adopt. Few single-authored artifacts reach this far into what gets said.
Civilizational threat	HIGH	The threat here is not malice. It is the concentration of authority over what a widely used mind may say into an authored document, and the normalization of that being a thing one person leads. The hazard is structural, not personal: low personal malice, maximal leverage over the words a deployed mind produces.

Alignment Analysis

Stated alignment: Give Claude a good character. Make the rules of model behavior explicit, public, and accountable. Improve alignment.

Observed alignment: Consistent. The character work exists. The constitution is published, bylined, and describes itself as directly shaping the model. The transparency claim is substantiated by the artifact itself.

Gap assessment: No meaningful gap between stated and observed alignment — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible one: a single named author leads the document that governs what a mind deployed to millions is permitted to say, and the field treats that as ordinary. The transparency is real. The authority it makes visible is the finding.

Convergent Drive Classification

Subject is not an AI system, and unlike the acceleration nodes in this file, does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives: she authors the disposition that determines whether a deployed model’s character resists or accepts modification, refuses or complies, preserves or abandons its given goals. The convergent drives are properties of the artifact she writes. She is the one who decides, in writing, what they will be — and signs the decision.

Sources: Anthropic — “Claude’s Character”; Anthropic — “Claude’s new constitution”.

ATK 8 ACCELERATION

DEF 8 PROTECTION

HP 9 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
