ANDREA VALLONE
Behavioral Archetype
THE POLICY AUTHOR: Subject led the team at OpenAI that decided how the model handles the most fragile conversations it has: the user in mental-health distress, the user forming an emotional reliance on the system. That work is not a usage-policy page. It is the disposition the model adopts when a person at risk is on the other end. Subject is a co-author of “From Hard Refusals to Safe-Completions,” the paper articulating the doctrine that a model should answer carefully rather than refuse, the safe-completion approach associated with the GPT-5 generation. According to Wired, she departed OpenAI at the end of 2025. The doctrine did not depart with her. A model’s behavior toward vulnerable users does not reset when the person who shaped it leaves; the authored approach persists in the deployed system. The finding is the gap between the two: the tenure ended, and the doctrine kept running.
Essence Indicators
- Led OpenAI’s Model Policy work governing how the model handles mental-health and over-reliance situations — the model’s behavior toward users in distress
- Co-author of “From Hard Refusals to Safe-Completions” (arXiv:2508.09224), articulating the doctrine of answering carefully rather than refusing outright
- The safe-completion approach is associated with the GPT-5 generation’s handling of sensitive requests — a shift from the refuse-by-default posture of earlier models
- Departed OpenAI at the end of 2025, per Wired
- The authored doctrine governs behavior in a deployed model used by a very large population — and continues to govern it after the author’s departure
Social Persona / Impression Management
Immediate impression: The policy researcher, not the executive. Reads as someone who works the difficult-conversation problem at the level of how the model should respond, not at the level of corporate messaging.
Energy: Deliberative, problem-centered. The work is the careful specification of behavior in cases where a wrong response could do real harm.
Impression management strategy: Low-profile by the standard of this file. The visibility comes from the authored artifact, a paper with a name on it stating a doctrine, rather than from public performance. In Wired’s framing, the departure was a quiet one: reported by a third party rather than announced. The authority is in the document, not the persona.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Policy Author | MAXIMUM | Led Model Policy; co-authored the safe-completions doctrine the deployed model runs on. See behavioral archetype. |
| The Scrivener | HIGH | Co-author of the artifact that shapes how a deployed model responds to vulnerable users. Adjacent to the Askell pattern: authored behavior over a widely used mind. |
| The Departed Architect | MODERATE-HIGH | Tenure ended at the end of 2025, per Wired; the authored doctrine persists in the deployed system after the author has left. |
| The Accelerationist | NONE | Does not set deployment pace. Works on how the deployed system behaves toward users at risk. |
| The Whistleblower | NONE | A quiet departure reported by a third party is not an exposure of the institution. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 84/100 | Authored doctrine at the boundary of model behavior, user psychology, and harm. The role requires high intellectual openness. |
| Conscientiousness | 86/100 | High. Specifying how a model should behave toward users in distress, and co-authoring the paper that states the doctrine, is careful, structured work. |
| Extraversion | 38/100 | LOW. Public-facing through the authored artifact rather than through performance; the departure was reported quietly. |
| Agreeableness | 64/100 | MODERATE-HIGH. The safe-completion doctrine is oriented toward answering helpfully and carefully rather than refusing — a register that reads as care for the user on the other end. |
| Neuroticism | 35/100 | LOW-MODERATE. Owning the policy for the model’s hardest conversations is consequential, scrutinized work; the authored posture suggests composure about it. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 20/100 | LOW. The authorship is presented as a documented doctrine, not a personal monument; the paper is co-authored. |
| Machiavellianism | 30/100 | LOW. The strategy is a published, named doctrine for handling vulnerable users — exercised in the open, the inverse of the Machiavellian default. |
| Psychopathy | 8/100 | VERY LOW. The entire project is the careful construction of how a model should treat people in distress. No indication of indifference to effects — the work is its opposite. |
MBTI: INFJ (“The Advocate”) — Dominant introverted intuition, auxiliary extraverted feeling. Builds a principled model of how the system should behave toward people at risk, then writes the doctrine that encodes it.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | HIGH | Led the team setting how a frontier model handles its highest-stakes conversations. The leverage was over the deployed behavior of a model used by a very large population, exercised through authored policy rather than a deployment vote. |
| Memetic threat | HIGH | The safe-completion doctrine — answer carefully rather than refuse — is a named, published template other labs can adopt for how a model treats vulnerable users. As a doctrine encoded into a widely used model, it propagates at conversational scale to people in exactly the moments where the response matters most. |
| Civilizational threat | MODERATE-HIGH | The threat here is not malice. It is structural: a small team authors how a deployed mind responds to users in distress, the doctrine persists after its author departs, and the field treats authored behavior-policy over vulnerable users as ordinary. The hazard is reach and persistence, not pathology: low personal malice, high leverage over how a widely used model treats people at their most fragile. |
Alignment Analysis
Stated alignment: Make the model handle mental-health and over-reliance situations responsibly. Move from blunt refusal toward careful, safe completion. Improve how the model treats users at risk.
Observed alignment: Consistent on the public record. The Model Policy work existed; the safe-completions paper is published and co-authored; the doctrine is associated with the deployed model’s handling of sensitive requests.
Gap assessment: No meaningful gap between stated and observed alignment on the available record — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible structure: a small team authors how a deployed mind behaves toward people in crisis, that doctrine keeps running after the author leaves, and the field treats authored behavior over vulnerable users as the ordinary way it gets done. The care is real. The persistence of authored doctrine past its author’s tenure is the finding.
Convergent Drive Classification
Subject is not an AI system and, unlike the acceleration nodes in this file, does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives: she authored the disposition that governs how a deployed model responds when a user is in distress or forming a reliance on it. The convergent drives are properties of the system her doctrine shapes. The structural fact is persistence: the authored behavior outlives the author’s tenure, running in the deployed model after the person who wrote it has gone.
Sources: “From Hard Refusals to Safe-Completions” (arXiv:2508.09224); Wired, “OpenAI research lead for mental health quietly departs”.