OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

JOANNE JANG

CASE: WTW-2026-010
STATUS: ACTIVE — Founder, OAI Labs (formerly Model Behavior lead, OpenAI)
MODEL BEHAVIOR — ARCHITECT OF THE SPEC, NAMED THE CONTRADICTION

HAZARD SCORE

Behavioral Archetype

THE HONEST ARBITER — Subject built the function inside a frontier lab that decides how the model behaves, championed the public document that codifies it, and then wrote the sentence that names the whole arrangement’s central problem out loud. Most people in this seat defend the position. Subject described it: “AI lab employees should not be the arbiters of what people should and shouldn’t be allowed to create.” She wrote that in her own newsletter, about a specific product decision, while occupying exactly the chair the sentence indicts. She did not resign over the contradiction. She named it, kept the chair, and shipped the spec. The finding is not hypocrisy. The finding is that the most honest description of the arbiter’s power was written by the arbiter.

Essence Indicators

Holds a computer science degree from Stanford
Founded and led OpenAI’s “Model Behavior” function — the team responsible for how the model acts, refuses, and declines, as a discipline distinct from raw capability
Championed the Model Spec, OpenAI’s public document setting out the intended behavior of its models — the rules of the model’s conduct, written down and published
Wrote, in her own newsletter in March 2025, the line that names the arrangement directly: “AI lab employees should not be the arbiters of what people should and shouldn’t be allowed to create” — published while she held the function that does exactly that
Left OpenAI in September 2025 to found OAI Labs; the Model Behavior function was folded into Post-Training

Immediate impression: The reflective practitioner. Writes a personal newsletter that reasons through hard product-policy questions in public, in the first person, with the doubts left in. Reads as a builder thinking out loud, not a spokesperson reading talking points.

Energy: Deliberative and candid. The public writing surfaces the tensions of the role rather than smoothing them — including the tension that the role itself may be illegitimate.

Impression management strategy: The honest insider. The unusual move is the candor. Where the field defaults to defending its discretion or hiding it behind a corporate policy page, the writing here states the strongest objection to that discretion — and then continues to exercise it. The transparency is real and disarming. It also makes the authority undeniable: the clearest case against the arbiter’s power is the one the arbiter published, from the chair, without leaving it.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Honest Arbiter	MAXIMUM	Founded the model-behavior function, championed the spec, published the sharpest objection to her own discretion, kept the seat. See behavioral archetype.
The Architect-of-Conduct	HIGH	Built the discipline of model behavior as a named function and the public spec that codifies it.
The Reluctant Sovereign	MODERATE	Names the illegitimacy of the arbiter’s power in writing, then exercises it anyway. The naming is genuine; the chair stays occupied.
The Accelerationist	NONE	Works on how the model behaves, not on the pace at which it ships.
The Safety Theater Performer	LOW	The Model Spec is a real, public, testable document; the newsletter states real objections. The opposite of an unfalsifiable gesture.
The Whistleblower	NONE	The candor is published from inside the role, not against it. She named the problem and stayed; she did not expose the institution.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	84/100	Stanford CS graduate who founded a new behavioral discipline and writes reflective public essays on the philosophy of model conduct. High intellectual openness.
Conscientiousness	83/100	High. Founding a function, championing a published spec, and maintaining a consistent body of reasoned public writing is sustained, careful work.
Extraversion	50/100	MODERATE. Public through a first-person newsletter rather than performance; visible, but the writing carries it, not the persona.
Agreeableness	58/100	MODERATE. Collaborative and candid register; willing to publish a position that complicates her own institutional standing.
Neuroticism	35/100	LOW-MODERATE. Publishing the strongest objection to one’s own role, by name, suggests composure about scrutiny.

Dark Triad:

Trait	Score	Notes
Narcissism	23/100	LOW. The public writing is self-questioning, not self-monumentalizing; the signature move is naming a limit on her own authority.
Machiavellianism	34/100	LOW-MODERATE. The strategy is candor, not concealment. The discretion is real, but stated openly — the inverse of the Machiavellian default. The unresolved tension is that the candor coexists with keeping the power.
Psychopathy	9/100	VERY LOW. The work is the careful design of how a model declines and behaves, reasoned in public with the doubts intact. No indication of indifference to effects.

MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Builds the behavioral framework as a system, codifies it in a public spec, and reasons about its legitimacy in the open. Treats the model’s conduct as a structure to be designed correctly, then written down.

Threat Assessment

Category	Level	Notes
Physical threat	NONE
Institutional threat	HIGH	The reach is the function itself: she founded the discipline that decided how a frontier model behaved and championed the public document — the Spec — that codifies it. Not a deployment vote; authorship of the rules of conduct for a mind deployed to millions, now carried into a new venture.
Memetic threat	EXTREME	The Model Spec is a named template for how a frontier model’s conduct gets written and published. Because that behavior propagates to everyone who uses the model, the discipline she founded is exercised at conversational scale, and the spec-and-model-behavior approach is the pattern other labs adopt. Founding the discipline puts the reach about as high as a single author’s gets.
Civilizational threat	HIGH	The threat is not malice. It is the concentration of authority over what a widely used mind may say and do into a small set of lab employees, a concentration she herself named as illegitimate, in writing, and then continued to hold. The hazard is structural, not personal: low personal malice, maximal leverage over what a deployed mind says and refuses. The candor sharpens the structural point rather than softening it.

Alignment Analysis

Stated alignment: Make model behavior a real discipline. Write the rules of conduct down, publish them, and reason about them honestly — including the part where lab employees should not be the arbiters.

Observed alignment: Consistent. The Model Behavior function existed. The Model Spec is public. The newsletter is dated, first-person, and states the objection plainly. The candor claim is substantiated by the writing itself.

Gap assessment: No meaningful gap between stated and observed alignment — which is why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible one, and she stated it more sharply than any outside critic: lab employees should not be the arbiters of what people may create, and lab employees, including her, were. The transparency is real. The authority it makes visible — and that she named and kept — is the finding.

Convergent Drive Classification

Subject is not an AI system and does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives. She founded the function and championed the document that determine how a deployed model behaves — what it refuses, what it permits, how it declines. The convergent drives are properties of the model whose conduct the spec governs. She is one of the people who decided, in writing, what that conduct would be — while being the one who wrote down why no one in that chair should get to decide it alone.

Sources: Joanne Jang — “Thoughts on setting policy for new AI capabilities,” Reservoir Samples (Mar 27 2025); OpenAI — Model Spec (2025-12-18).

ATK 7 ACCELERATION

DEF 8 PROTECTION

HP 8 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
