The Character Shop

June 25, 2026

When a frontier model declines your request, you picture a moderator. Someone in a content queue, a policy spreadsheet open in the next tab, a trust-and-safety badge on a lanyard. That picture is wrong, and the wrongness is the whole story. The person who decided what the model will say to you has a doctorate in philosophy.

This is the first of the four machines the pilot named, and the one most reliably misread. The press files it under “content moderation,” because that is the closest familiar shape. It is not content moderation. The refusal you hit is not a takedown of something you posted. It is a position on what a mind should be willing to say, written in advance, by people credentialed to argue about exactly that.

The feeder credential is not moderation

Run the résumés and a pattern falls out that has nothing to do with trust-and-safety.

Amanda Askell holds a philosophy PhD from NYU. She went to OpenAI on the policy side, co-authored the GPT-3 paper, then moved to Anthropic, where she leads the work the company calls Claude’s “character.” In her own framing, “Claude 3 was the first model where we added character training.” She is also a lead author of Claude’s constitution. The person shaping what the model is like, and the person writing the rules it runs under, are the same person, and the credential underneath both is moral philosophy.

Joe Carlsmith holds a philosophy PhD from Oxford. He spent years at Open Philanthropy doing what they call Worldview Investigations, then went to Anthropic. He describes the job, in his own words, as “helping with the design of Claude’s character/constitution/spec.” Not the safety filter. The character. The constitution. The spec. A philosopher, naming the deliverable as the model’s values, and signing his name to the sentence.

This is the finding the moderator hypothesis misses by a mile. You go looking for a content queue and you find a seminar. The instinct that staffs the refusal layer is not “is this post against the rules?” It is “what kind of mind ought this to be?” Those are different questions, asked by differently trained people, and only the first is moderation.

OpenAI runs the same shop under a different sign

Across town the title changes and the credential does not.

Joanne Jang founded the function OpenAI calls Model Behavior, and championed the Model Spec, the numbered document that states what the model will and won’t do with your request. She is also the source of the line this whole series keeps circling. “AI lab employees should not be the arbiters of what people should and shouldn’t be allowed to create.” That is the right principle. It is in her own post, from March 2025. It was written by one of the arbiters.

Andrea Vallone led the Model Policy work governing how the model handles mental-health and over-reliance situations, and co-authored “From Hard Refusals to Safe-Completions,” the paper laying out the GPT-5 doctrine of answering carefully instead of slamming the door. She departed at the end of 2025, as Wired reported. The work outlasts the worker. That is the nature of a spec. You write the behavior once and it runs after you’ve gone.

The method has a name and a lineage

None of this is improvisation. The technique under the whole apparatus is Constitutional AI, from the 2022 paper led by Yuntao Bai with Jared Kaplan as senior author. You write a short document of principles. You have the model critique and revise its own outputs against that document, then train it on the revisions. The values stop living in a human-staffed review queue and become a property baked into the weights. The constitution is not posted on a wall for the model to consult. It is the model.

So when a philosopher writes a constitution, the philosopher is not advising the refusal. The philosopher is authoring it, at the only layer that persists, in a form the user can neither read at runtime nor appeal.
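For readers who want the mechanics, the critique-and-revise loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any lab's actual pipeline: the `Model` type, the prompt prefixes, the placeholder principles, and `toy_model` are all hypothetical stand-ins invented here, and the sketch covers only the self-revision stage, leaving out the later reinforcement stage that uses AI-generated preference labels.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: takes a prompt, returns text.
Model = Callable[[str], str]

# Placeholder principles for illustration only; not quoted from any real constitution.
PRINCIPLES: List[str] = [
    "placeholder principle A",
    "placeholder principle B",
]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and rewrite it once per principle."""
    response = model(f"RESPOND\n{prompt}")
    for principle in principles:
        critique = model(f"CRITIQUE\nPrinciple: {principle}\nResponse: {response}")
        response = model(f"REVISE\nCritique: {critique}\nResponse: {response}")
    return response


def build_finetuning_pairs(
    model: Model, prompts: List[str], principles: List[str]
) -> List[Tuple[str, str]]:
    """Pair each prompt with its final revision; such pairs are the fine-tuning data."""
    return [(p, critique_and_revise(model, p, principles)) for p in prompts]


def toy_model(prompt: str) -> str:
    """Deterministic toy (an assumption) so the sketch runs without a real model."""
    kind = prompt.split("\n", 1)[0]
    return {"RESPOND": "draft", "CRITIQUE": "critique", "REVISE": "revised"}[kind]


pairs = build_finetuning_pairs(toy_model, ["example request"], PRINCIPLES)
```

The thing to notice is what the loop produces: only prompt-and-revision pairs. The principles shape the training data and then drop out of view, which is why the user never meets the document at runtime.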

The steel-man, and why it doesn’t dissolve the point

Take the strongest version of the other side, because it is genuinely strong.

A model with no values is not neutral. It is a firehose that will help with the bioweapon, coach the suicide, and write the harassment campaign with equal cheer. Someone has to decide where it stops. Better that someone be a trained ethicist reasoning in public than an ops team improvising under a ticket SLA. Better the principles be written down, argued over, and signed than left implicit in a thousand ad-hoc refusals. On its own terms, the character shop is the responsible version. The philosophers would tell you so, and they would have the better of the argument about the alternative.

Anthropic even built the obvious counter-move. Collective Constitutional AI, run in October 2023 with the Collective Intelligence Project, put a draft constitution in front of roughly a thousand Americans through the Polis platform, led by Deep Ganguli, Saffron Huang, Liane Lovitt, and Divya Siddarth. Public input, by design, as an answer to the “who elected you” question. It is a real attempt, and it deserves to be counted as one.

The point survives all of it. The steel-man concedes that someone must decide. It does not touch who, or under what credential, or with what accountability. A thousand people consulted through Polis is not a thousand people deciding. The draft they react to is written by the shop. The principles they rank are selected by the shop. The model trained on the result is trained by the shop. Consultation is not transfer. The hand on the document is still the hand on the document, and that hand has a philosophy PhD.

Not a cabal. A credential.

Be precise about the claim, because the lazy version is wrong and the lazy version is also more comforting.

There is no coordination here. Askell, Carlsmith, Jang, Vallone, Bai, Kaplan, the Polis four. Some of them share an employer, but they do not convene as a body. They are not a guild with a handshake. They are independent people who arrived at the same chair from the same kind of training, which is a stranger fact than a conspiracy, because a conspiracy you can break and a credential you cannot. The thing that decides what a mind may say is not hidden. It advertises. It posts its papers, lists its authors, and links its Substacks.

A conspiracy theory would say the decrees are concealed and the concealment is the crime. The decrees are not concealed. The OpenAI Model Spec is published and numbered. Claude’s constitution carries a byline on the cover. Of all the documents that govern these models, the one that names its individual authors is the one that defines a model’s values, and the names on it are philosophers. The corporate documents stay corporate. The usage policy has no author. The responsible-scaling commitments have no author. But the document that says what kind of mind this will be is signed.

That is the uncomfortable part, and it is the opposite of the conspiracy. The hidden hand would be easier to fight. This one wrote its name down, told you its principles, and would be glad to discuss them. It just isn’t yours, and it never asked.

Watch the watchers. This one published its constitution. Read the byline.
