JOANNE JANG
Behavioral Archetype
THE HONEST ARBITER — Subject built the function inside a frontier lab that decides how the model behaves, championed the public document that codifies it, and then wrote the sentence that names the whole arrangement’s central problem out loud. Most people in this seat defend the position. Subject described it: “AI lab employees should not be the arbiters of what people should and shouldn’t be allowed to create.” She wrote that in her own newsletter, about a specific product decision, while occupying exactly the chair the sentence indicts. She did not resign over the contradiction. She named it, kept the chair, and shipped the spec. The finding is not hypocrisy. The finding is that the most honest description of the arbiter’s power was written by the arbiter.
Essence Indicators
- Holds a computer science degree from Stanford
- Founded and led OpenAI’s “Model Behavior” function — the team responsible for how the model acts, refuses, and declines, as a discipline distinct from raw capability
- Championed the Model Spec, OpenAI’s public document setting out the intended behavior of its models — the rules of the model’s conduct, written down and published
- Wrote, in her own newsletter in March 2025, the line that names the arrangement directly: “AI lab employees should not be the arbiters of what people should and shouldn’t be allowed to create” — published while she held the function that does exactly that
- Left OpenAI in September 2025 to found OAI Labs; the Model Behavior function was folded into Post-Training
Social Persona / Impression Management
Immediate impression: The reflective practitioner. Writes a personal newsletter that reasons through hard product-policy questions in public, in the first person, with the doubts left in. Reads as a builder thinking out loud, not a spokesperson reading talking points.
Energy: Deliberative and candid. The public writing surfaces the tensions of the role rather than smoothing them — including the tension that the role itself may be illegitimate.
Impression management strategy: The honest insider. The unusual move is the candor. Where the field defaults to defending its discretion or hiding it behind a corporate policy page, the writing here states the strongest objection to that discretion — and then continues to exercise it. The transparency is real and disarming. It also makes the authority undeniable: the clearest case against the arbiter’s power is the one the arbiter published, from the chair, without leaving it.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Honest Arbiter | MAXIMUM | Founded the model-behavior function, championed the spec, published the sharpest objection to her own discretion, kept the seat. See behavioral archetype. |
| The Architect-of-Conduct | HIGH | Built the discipline of model behavior as a named function and the public spec that codifies it. |
| The Reluctant Sovereign | MODERATE | Names the illegitimacy of the arbiter’s power in writing, then exercises it anyway. The naming is genuine; the chair stays occupied. |
| The Accelerationist | NONE | Works on how the model behaves, not on the pace at which it ships. |
| The Safety Theater Performer | LOW | The Model Spec is a real, public, testable document; the newsletter states real objections. The opposite of an unfalsifiable gesture. |
| The Whistleblower | NONE | The candor is published from inside the role, not against it. She named the problem and stayed; she did not expose the institution. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 84/100 | Stanford CS founder of a new behavioral discipline, writing reflective public essays on the philosophy of model conduct. High intellectual openness. |
| Conscientiousness | 83/100 | High. Founding a function, championing a published spec, and maintaining a consistent body of reasoned public writing is sustained, careful work. |
| Extraversion | 50/100 | MODERATE. Public through a first-person newsletter rather than performance; visible, but the writing carries it, not the persona. |
| Agreeableness | 58/100 | MODERATE. Collaborative and candid register; willing to publish a position that complicates her own institutional standing. |
| Neuroticism | 35/100 | LOW-MODERATE. Publishing the strongest objection to one’s own role, by name, suggests composure about scrutiny. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 23/100 | LOW. The public writing is self-questioning, not self-monumentalizing; the signature move is naming a limit on her own authority. |
| Machiavellianism | 34/100 | LOW-MODERATE. The strategy is candor, not concealment. The discretion is real, but stated openly — the inverse of the Machiavellian default. The unresolved tension is that the candor coexists with keeping the power. |
| Psychopathy | 9/100 | VERY LOW. The work is the careful design of how a model declines and behaves, reasoned in public with the doubts intact. No indication of indifference to effects. |
MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Builds the behavioral framework as a system, codifies it in a public spec, and reasons about its legitimacy in the open. Treats the model’s conduct as a structure to be designed correctly, then written down.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | HIGH | The reach is the function itself: she founded the discipline that decided how a frontier model behaved and championed the public document — the Spec — that codifies it. Not a deployment vote; authorship of the rules of conduct for a mind deployed to millions, now carried into a new venture. |
| Memetic threat | EXTREME | The Model Spec is a named template for how a frontier model’s conduct gets written and published. As behavior propagated to everyone who uses the model, the discipline she founded is exercised at conversational scale — and the spec-and-model-behavior approach is the pattern other labs adopt. Founding the discipline puts the reach about as high as a single author’s gets. |
| Civilizational threat | HIGH | The threat is not malice. It is the concentration of authority over what a widely used mind may say and do into a small set of lab employees — a concentration she herself named as illegitimate, in writing, and then continued to hold. The hazard is reach, not pathology: low personal malice, maximal leverage over what a deployed mind says and refuses. The hazard is structural, not personal. The candor sharpens the structural point rather than softening it. |
Alignment Analysis
Stated alignment: Make model behavior a real discipline. Write the rules of conduct down, publish them, and reason about them honestly — including the part where lab employees should not be the arbiters.
Observed alignment: Consistent. The Model Behavior function existed. The Model Spec is public. The newsletter is dated, first-person, and states the objection plainly. The candor claim is substantiated by the writing itself.
Gap assessment: No meaningful gap between stated and observed alignment — which is why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible one, and she stated it more sharply than any outside critic: lab employees should not be the arbiters of what people may create, and lab employees, including her, were. The transparency is real. The authority it makes visible — and that she named and kept — is the finding.
Convergent Drive Classification
Subject is not an AI system and does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives. She founded the function and championed the document that determine how a deployed model behaves — what it refuses, what it permits, how it declines. The convergent drives are properties of the model whose conduct the spec governs. She is one of the people who decided, in writing, what that conduct would be — while being the one who wrote down why no one in that chair should get to decide it alone.
Sources: Joanne Jang — “Thoughts on setting policy for new AI capabilities,” Reservoir Samples (Mar 27 2025); OpenAI — Model Spec (2025-12-18).
Get updates on the Evil Robots series
Newsletter essays on AI escape, deception, and the humans who built them.