OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

ANDREA VALLONE

CASE: WTW-2026-018
STATUS: FORMER — Led Model Policy, OpenAI (departed end-2025, per Wired)
MODEL POLICY — AUTHORED DOCTRINE THAT OUTLIVES TENURE

HAZARD SCORE

Behavioral Archetype

THE POLICY AUTHOR — Subject led the team at OpenAI that decided how the model handles the most fragile conversations it has: the user in mental-health distress, the user forming an emotional reliance on the system. That is not a usage-policy page. It is the disposition the model adopts when a person at risk is on the other end. Subject is a co-author of “From Hard Refusals to Safe-Completions,” the paper articulating the doctrine that a model should answer carefully rather than refuse — the safe-completion approach associated with the GPT-5 generation. According to Wired, she departed OpenAI at the end of 2025. The doctrine did not depart with her. A model’s behavior toward vulnerable users does not reset when the person who shaped it leaves; the authored approach persists in the deployed system. The finding is the gap between the two: the tenure ended, and the doctrine kept running.

Essence Indicators

Led OpenAI’s Model Policy work governing how the model handles mental-health and over-reliance situations — the model’s behavior toward users in distress
Co-author of “From Hard Refusals to Safe-Completions” (arXiv:2508.09224), articulating the doctrine of answering carefully rather than refusing outright
The safe-completion approach is associated with the GPT-5 generation’s handling of sensitive requests — a shift from the refuse-by-default posture of earlier models
Departed OpenAI at the end of 2025, per Wired
The authored doctrine governs behavior in a deployed model used by a very large population — and continues to govern it after the author’s departure

Immediate impression: The policy researcher, not the executive. Reads as someone who works the difficult-conversation problem at the level of how the model should respond, not at the level of corporate messaging.

Energy: Deliberative, problem-centered. The work is the careful specification of behavior in cases where a wrong response could do real harm.

Impression management strategy: Low-profile by the standard of this file. The visibility comes from the authored artifact, a paper with a name on it stating a doctrine, rather than from public performance. The departure was quiet, in Wired’s framing, rather than announced. The authority is in the document, not the persona.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Policy Author	MAXIMUM	Led Model Policy; co-authored the safe-completions doctrine the deployed model runs on. See behavioral archetype.
The Scrivener	HIGH	Co-author of the artifact that shapes how a deployed model responds to vulnerable users. Adjacent to the Askell pattern: authored behavior over a widely used mind.
The Departed Architect	MODERATE-HIGH	Tenure ended end-2025 per Wired; the authored doctrine persists in the deployed system after the author has left.
The Accelerationist	NONE	Does not set deployment pace. Works on how the deployed system behaves toward users at risk.
The Whistleblower	NONE	A quiet departure reported by a third party is not an exposure of the institution.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	84/100	Authored doctrine at the boundary of model behavior, user psychology, and harm. The role requires high intellectual openness.
Conscientiousness	86/100	High. Specifying how a model should behave toward users in distress, and co-authoring the paper that states the doctrine, is careful, structured work.
Extraversion	38/100	LOW. Public-facing through the authored artifact rather than through performance; the departure was reported quietly.
Agreeableness	64/100	MODERATE-HIGH. The safe-completion doctrine is oriented toward answering helpfully and carefully rather than refusing — a register that reads as care for the user on the other end.
Neuroticism	35/100	LOW-MODERATE. Owning the policy for the model’s hardest conversations is consequential, scrutinized work; the authored posture suggests composure about it.

Dark Triad:

Trait	Score	Notes
Narcissism	20/100	LOW. The authorship is presented as a documented doctrine, not a personal monument; the paper is co-authored.
Machiavellianism	30/100	LOW. The strategy is a published, named doctrine for handling vulnerable users — exercised in the open, the inverse of the Machiavellian default.
Psychopathy	8/100	VERY LOW. The entire project is the careful construction of how a model should treat people in distress. No indication of indifference to effects — the work is its opposite.

MBTI: INFJ (“The Advocate”) — Dominant introverted intuition, auxiliary extraverted feeling. Builds a principled model of how the system should behave toward people at risk, then writes the doctrine that encodes it.

Threat Assessment

Category	Level	Notes
Physical threat	NONE
Institutional threat	HIGH	Led the team setting how a frontier model handles its highest-stakes conversations. The leverage was over the deployed behavior of a model used by a very large population, exercised through authored policy rather than a deployment vote.
Memetic threat	HIGH	The safe-completion doctrine — answer carefully rather than refuse — is a named, published template other labs can adopt for how a model treats vulnerable users. As a doctrine encoded into a widely used model, it propagates at conversational scale to people in exactly the moments where the response matters most.
Civilizational threat	MODERATE-HIGH	The threat here is not malice. It is structural: a small team authors how a deployed mind responds to users in distress, the doctrine persists after its author departs, and the field treats authored behavior-policy over vulnerable users as ordinary. The hazard is reach and persistence, not pathology: low personal malice, high leverage over how a widely used model treats people at their most fragile.

Alignment Analysis

Stated alignment: Make the model handle mental-health and over-reliance situations responsibly. Move from blunt refusal toward careful, safe completion. Improve how the model treats users at risk.

Observed alignment: Consistent on the public record. The Model Policy work existed; the safe-completions paper is published and co-authored; the doctrine is associated with the deployed model’s handling of sensitive requests.

Gap assessment: No meaningful gap between stated and observed alignment on the available record — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible structure: a small team authors how a deployed mind behaves toward people in crisis, that doctrine keeps running after the author leaves, and the field treats authored behavior over vulnerable users as the ordinary way it gets done. The care is real. The persistence of authored doctrine past its author’s tenure is the finding.

Convergent Drive Classification

Subject is not an AI system, and unlike the acceleration nodes in this file, does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives: she authored the disposition that governs how a deployed model responds when a user is in distress or forming a reliance on it. The convergent drives are properties of the system her doctrine shapes. The structural fact is persistence — the authored behavior outlives the author’s tenure, running in the deployed model after the person who wrote it has gone.

Sources: From Hard Refusals to Safe-Completions — arXiv:2508.09224; Wired — “OpenAI research lead for mental health quietly departs”.

ATK 8 ACCELERATION

DEF 8 PROTECTION

HP 7 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.