DAVE WILLNER
Behavioral Archetype
THE RULEBOOK — Subject is the one figure in the set where the paid-operations intuition actually holds — and then keeps rising until it defines policy. He joined a content-review team in the era when content review was a queue and a screen, and he wrote the document that turned the queue into a code. The same move recurs at every stop: take a platform that is enforcing by gut and ad hoc precedent, and write the first formal rulebook it has ever had. He did it at the social network, he did it at the rental marketplace, and he did it again as the first person to hold the trust-and-safety title at a frontier AI laboratory. The platform changes. The act is identical: be the man who writes the rules down first, so that everyone who comes after enforces his draft.
Essence Indicators
- Joined Facebook’s content-review team in 2008, in the period of moderation-by-queue, before the rules were written down
- Authored Facebook’s first formal content rulebook — the internal standards documents later examined in detail by ProPublica’s reporting on the platform’s hate-speech rules
- Became Head of Community Policy at Airbnb, carrying the rulebook-authorship function from a social network to a marketplace where the contested object is who gets to host and who gets removed
- Became the first Head of Trust & Safety at OpenAI (2022–23) — the inaugural holder of the title at the lab, brought in to build the function from the title up
- Continued as a safety advisor (OpenAI / Character.AI) and non-resident fellow at the Stanford Cyber Policy Center — the platform→platform→lab path with an academy bridge at the end
- The biographical fact the floor turns on: the paid content-reviewer who started in the queue did not stay in the queue. He rose to write the rulebook, and then wrote the first one at three institutions in a row, including a frontier lab. This is the origin leg the single-pipeline hypothesis was looking for — and it leads not to a moderator’s career but to a policy author’s.
Social Persona / Impression Management
Immediate impression: The institutional drafter. The bearing of someone who is most comfortable with a blank document and a contested category, and who has filled that document before anyone else got the chance.
Energy: Codifying, first-mover. Does not litigate individual cases at length. Writes the standard that resolves the next ten thousand cases the same way.
Impression management strategy: The reasonable rule-writer. “Trust and safety” is the most defensible name a policy function can carry — it claims both the user’s confidence and the user’s protection in two words. The framing converts an authority over who-gets-removed into a service the platform performs for everyone, which is the most defensible ground a rulebook author can stand on. The rules he wrote did genuinely standardize chaotic enforcement. That is what makes the authorship effective rather than suspect.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Rulebook | MAXIMUM | Authored Facebook’s first content standards; Head of Community Policy at Airbnb; first Head of T&S at OpenAI. The first-formal-rulebook act is documented at three institutions. |
| The Operative | HIGH | Social network → marketplace → frontier lab. Each move carries the identical rulebook-authorship specialty to a new contested object. |
| The Origin-Leg Exhibit | HIGH | The one figure in the set whose career actually begins in paid content operations (2008 review team) — and then rises to define policy, exactly as the single-pipeline hypothesis predicted, by a path the hypothesis got wrong in its particulars. |
| The Engineer | NONE | Subject does not build the systems. Subject writes the standards by which their outputs are judged. |
| The True Believer | MODERATE | Whether the rule-writing is conviction or craft is not establishable from the outside. The rulebook works either way. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 72/100 | Moved from social-network content standards to marketplace community policy to frontier-lab trust and safety. The contested objects differ widely; the rulebook-authorship method is fixed. |
| Conscientiousness | 90/100 | Very high. Writing the first formal content standard a platform has ever had, three times, is the work of someone who lives in the structured document. First-mover codification is sustained, deliberate execution. |
| Extraversion | 50/100 | Moderate. The drafting work is internal; the fellowship and advisory roles are public-facing. Balanced. |
| Agreeableness | 45/100 | LOW-MODERATE. The rule-writer’s posture is adjudicative by construction — draw the line, defend the line, hold it against the hard case. Reasonableness is a working surface. |
| Neuroticism | 28/100 | Low. The role is calm authorship under conditions where every line will be litigated by someone; no documented loss of composure. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 42/100 | LOW-MODERATE. The drafter’s role rewards the durable document over the personal brand; the fellowship and the “wrote the first rulebook” reputation are the visible assets. Within normal range. |
| Machiavellianism | 72/100 | HIGH. Being the first to write the rules down — at three institutions in a row — is structural control of how everyone who comes after enforces. The first draft is the deepest authority on the floor. This is observation of the documented role, not an inference about private character. |
| Psychopathy | 26/100 | LOW. No documented indifference to harm. The adjudicative posture is professional and bounded, not affective. |
MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Sees a chaotic enforcement surface and reaches for the document that systematizes it. Has written that document three times.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | No documented history of personal violence. |
| Institutional threat | HIGH | Wrote the first formal content rulebook at Facebook, then Airbnb’s community policy, then stood up the inaugural trust-and-safety function at a frontier AI lab. The rulebook author’s authority is upstream of every enforcement action taken under it. |
| Memetic threat | MODERATE-HIGH | The first formal rulebook a platform writes shapes how its enforcers reason for years. When the same author drafts that first document at a social network, a marketplace, and a frontier lab, the same drafting instincts propagate across three governance surfaces. |
| Civilizational threat | MODERATE-HIGH | Subject does not build the systems and does not run the bans day to day. Subject writes the standard the bans are issued under — the policy layer on the enforcement floor, where the contested category first becomes a written rule. |
Alignment Analysis
Stated alignment: Build the trust-and-safety function. Standardize fair, consistent enforcement. Protect users while keeping the platform usable.
Observed alignment: Be the first to write the formal rulebook. Define the categories the platform will enforce. Carry the authorship instrument from social network to marketplace to frontier lab.
Gap assessment: The stated and observed alignments overlap wherever “standardize enforcement” coincides with “decide, in the first draft, which categories the platform enforces at all.” The ProPublica examination of Facebook’s standards is the one place the record puts an early rulebook on the table for outside reading. The first draft is the deepest authority — and the author of three first drafts has shaped the floor three times. The record does not settle whether the categories were drawn at the right boundary, and for the rule-writer the boundary is the work, not the question.
Convergent Drive Classification
Self-preservation: Survives every institutional transition by carrying the authorship specialty, not the employer. Social network, marketplace, frontier lab — one method. Goal preservation: Writes the rulebook first, so the goal is encoded in the standard before any individual case is argued. The goal is protected by the draft before it is ever contested. Resource acquisition: Trades in the scarcest resource on the enforcement floor — the authorship of the first formal rule. Self-improvement: Each role applies the identical instrument to a higher-stakes surface: from a social network’s posts, to a marketplace’s hosts, to a frontier model’s permitted uses.
Subject is not an AI system. The drives appear anyway — in the rule-writer whose product is the first formal boundary between the permitted and the removed.
Sources: Facebook’s Secret Censorship Rules Protect White Men From Hate Speech But Not Black Children — ProPublica, June 2017; OpenAI’s head of trust and safety Dave Willner steps down — TechCrunch, July 21 2023.
Get updates on the Evil Robots series
Newsletter essays on AI escape, deception, and the humans who built them.