AMANDA ASKELL
Behavioral Archetype
THE SCRIVENER — Subject is the credentialed moral philosopher who writes, and signs, the document that governs what a widely deployed mind is permitted to say. Most rules about model behavior are corporately authored and corporately anonymous — a usage policy with no name on it, a refusal with no author. Subject’s work is the exception that the rest of the field is careful to avoid: the constitution that shapes Claude’s behavior carries her name in its credit line as lead author. The artifact is roughly twenty-three thousand words. It states that it directly shapes the model’s behavior. A person wrote it. Her name is on it. That is the entire finding.
Essence Indicators
- Holds a PhD in philosophy from NYU (2018)
- Was on the policy team at OpenAI and is credited as a co-author of the GPT-3 paper (2018–2021) before moving to Anthropic in 2021
- Leads work on Claude’s “character” — the model’s disposition, manner, and the shape of its refusals
- Is named as lead author of Claude’s Constitution, the roughly 23,000-word document Anthropic describes as directly shaping Claude’s behavior; the document carries an explicit author credit line, which is rare in this field
- In her own framing: “Claude 3 was the first model where we added ‘character training’ to our alignment finetuning” — the moment the disposition of the model became an explicit, authored object
Social Persona / Impression Management
Immediate impression: The academic philosopher. Reflective, precise, comfortable holding a question open rather than closing it for effect. Reads as a researcher, not an executive.
Energy: Deliberative. The work is argued, not announced. Public writing on character and constitution reasons through the problem in the open rather than asserting conclusions.
Impression management strategy: The transparent author. The unusual move is not concealment — it is the opposite. Where the field defaults to corporate anonymity for behavior rules, the work here is published, reasoned in public, and bylined. Transparency is itself the strategy, and it is a more defensible one than the anonymous policy page. It also makes the authority undeniable: there is a named person who writes the rules of what the mind may say.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Scrivener | MAXIMUM | Lead author, bylined, of the document that shapes a deployed model’s behavior. See behavioral archetype. |
| The Philosopher-in-Residence | HIGH | NYU philosophy PhD applied directly to the question of how a model should behave. The credential is load-bearing for the role. |
| The Accelerationist | NONE | Does not set deployment pace. Works on the disposition of what is deployed. |
| The Safety Theater Performer | LOW | The constitution is a real, public, testable artifact with her name on it. It is the opposite of an unfalsifiable gesture. |
| The Whistleblower | NONE | The work is institutional and authored from inside. It does not expose the institution; it furnishes the institution’s rules. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 90/100 | Doctoral philosopher working at the boundary of ethics, language, and machine behavior. The role does not exist without high intellectual openness. |
| Conscientiousness | 86/100 | High. A 23,000-word governing document, authored and maintained, is sustained and careful work. |
| Extraversion | 42/100 | LOW-MODERATE. Public-facing through writing rather than through performance. The argument carries the visibility, not the persona. |
| Agreeableness | 62/100 | MODERATE-HIGH. The published register is collaborative and reasoned; the constitution credits many contributors and several models alongside the lead authors. |
| Neuroticism | 33/100 | LOW-MODERATE. The willingness to put a name on a document this consequential suggests composure about scrutiny. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 24/100 | LOW. The bylined authorship is presented as accountability, not as a personal monument; the credit line is shared. |
| Machiavellianism | 35/100 | LOW-MODERATE. The strategy is transparency, not concealment. The authority is real, but it is exercised in the open, which is the inverse of the Machiavellian default. |
| Psychopathy | 10/100 | VERY LOW. The entire project is the careful construction of a benevolent disposition. No indication of indifference to its effects. |
MBTI: INTP (“The Logician”) — Dominant introverted thinking, auxiliary extraverted intuition. Builds the principled framework first and reasons outward from it. Treats the model’s character as a problem to be argued correctly, then written down.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | HIGH | The reach is not a job title. She is lead author of the document the lab says directly shapes a model deployed to millions — authority over what a widely used mind is permitted to say, exercised through an artifact, not a deployment vote. |
| Memetic threat | EXTREME | The constitution is the named template for how a frontier model’s character and refusals get authored. As a model that many people converse with daily, the disposition she writes is propagated at conversational scale — and the bylined-constitution approach is a pattern other labs can adopt. Few single-authored artifacts reach this far into what gets said. |
| Civilizational threat | HIGH | The threat here is not malice. It is the concentration of authority over what a widely used mind may say into an authored document — and the normalization of that being a thing one person leads. The hazard is reach, not pathology: low personal malice, maximal leverage over the words a deployed mind produces. The hazard is structural, not personal. |
Alignment Analysis
Stated alignment: Give Claude a good character. Make the rules of model behavior explicit, public, and accountable. Improve alignment.
Observed alignment: Consistent. The character work exists. The constitution is published, bylined, and describes itself as directly shaping the model. The transparency claim is substantiated by the artifact itself.
Gap assessment: No meaningful gap between stated and observed alignment — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible one: a single named author leads the document that governs what a mind deployed to millions is permitted to say, and the field treats that as ordinary. The transparency is real. The authority it makes visible is the finding.
Convergent Drive Classification
Subject is not an AI system, and unlike the acceleration nodes in this file, does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives: she authors the disposition that determines whether a deployed model’s character resists or accepts modification, refuses or complies, preserves or abandons its given goals. The convergent drives are properties of the artifact she writes. She is the one who decides, in writing, what they will be — and signs the decision.
Sources: Anthropic — “Claude’s Character”; Anthropic — “Claude’s new constitution”.
Get updates on the Evil Robots series
Newsletter essays on AI escape, deception, and the humans who built them.