JOE CARLSMITH
Behavioral Archetype
THE PATRON-SCRIVENER — Subject spent six years deciding which questions about the long-run future were worth funding, then crossed the table and helped author the values of a deployed mind himself. The Effective Altruism on-ramp, the philanthropic foundation, the frontier lab — the field describes that pipeline in the abstract, as a career graph nobody is named on. Subject is the named instance. He ran Worldview Investigations at Open Philanthropy, the team that decides which futures the money takes seriously. He then moved to Anthropic and, in his own words, began “helping with the design of Claude’s character/constitution/spec.” His name carries a lead-author star on the constitution that the lab says directly shapes the model’s behavior. The funder became the author. That is the finding.
Essence Indicators
- Holds a doctorate in philosophy from Oxford
- Helped with the writing of Toby Ord’s The Precipice (2020), the book that put a number on existential risk and made it respectable dinner-party conversation
- Led Worldview Investigations at Open Philanthropy from 2019 to 2025 — the research function that decides which long-run futures the foundation’s money treats as real
- Moved to Anthropic in November 2025, describing the work, in his own framing, as “helping with the design of Claude’s character/constitution/spec”
- Is named with a lead-author star on Claude’s Constitution (January 2026), the roughly 23,000-word document the lab describes as directly shaping the model’s behavior — one of the few such documents in the field that carries any byline at all
Social Persona / Impression Management
Immediate impression: The academic philosopher who also writes essays about the meaning of it all. Reflective, careful, given to long-form public reasoning rather than pronouncements. Reads as a researcher and an essayist, not an operator.
Energy: Deliberative. The public writing works through the problem in the open — long essays on power, futures, and what a good outcome would even mean — rather than announcing conclusions.
Impression management strategy: The reasoning-in-public author. The move is not concealment. It is the opposite: the career change was announced in his own words, the funding judgments were published, and the constitution carries his name. The transparency is real, and it is more defensible than the anonymous policy page. It also makes the authority undeniable. There is a named philosopher who decided which futures were worth funding, and then helped write the values of the mind built to meet them.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Patron-Scrivener | MAXIMUM | Six years deciding which futures the money funds, then co-bylined author of a deployed model’s constitution. See behavioral archetype. |
| The Philosopher-in-Residence | HIGH | Oxford philosophy doctorate applied directly to how a model should behave. The credential is load-bearing for the role. |
| The Pipeline Personified | HIGH | The EA-to-foundation-to-lab path that the field describes abstractly is, here, one named résumé. |
| The Accelerationist | NONE | Does not set deployment pace. Works on the values of what is deployed. |
| The Safety Theater Performer | LOW | The constitution and the Worldview reports are real, public, testable artifacts with his name on them. The opposite of an unfalsifiable gesture. |
| The Whistleblower | NONE | The work is institutional and authored from inside. It furnishes the institution’s values; it does not expose them. |
Psychometric Assessment
Big Five (OCEAN):
| Trait | Score | Evidence |
|---|---|---|
| Openness | 91/100 | VERY HIGH. Doctoral philosopher working across existential risk, ethics, and machine values, with a substantial body of public long-form essays. The role does not exist without very high intellectual openness. |
| Conscientiousness | 85/100 | HIGH. Six years running a research function, contribution to a book-length risk treatise, and a co-authored 23,000-word governing document are sustained, careful work. |
| Extraversion | 40/100 | LOW-MODERATE. Public-facing through writing rather than performance. The essays carry the visibility, not the persona. |
| Agreeableness | 60/100 | MODERATE. The published register is collaborative and reasoned; the constitution credits many contributors alongside the lead authors. |
| Neuroticism | 32/100 | LOW-MODERATE. Putting a name on funding judgments and on a consequential document suggests composure about scrutiny. |
Dark Triad:
| Trait | Score | Notes |
|---|---|---|
| Narcissism | 22/100 | LOW. The bylined authorship is presented as accountability; the credit line is shared, and more than one name carries a lead-author star. No personal monument. |
| Machiavellianism | 33/100 | LOW-MODERATE. The strategy is transparency, not concealment. The authority is real and exercised in the open, which is the inverse of the Machiavellian default. |
| Psychopathy | 9/100 | VERY LOW. The entire project is the careful construction of a benevolent disposition and a serious accounting of long-run harm. No indication of indifference to effects. |
MBTI: INTP (“The Logician”) — Dominant introverted thinking, auxiliary extraverted intuition. Builds the principled framework first, then reasons outward in public essays. Treats both the question of which futures matter and the question of how a model should behave as problems to be argued correctly, then written down.
Threat Assessment
| Category | Level | Notes |
|---|---|---|
| Physical threat | NONE | |
| Institutional threat | HIGH | The reach is two authorities stacked: a co-bylined author of the document the lab says shapes a model deployed to millions, on top of a prior six years deciding which long-run futures one large foundation funded. Not a deployment vote — leverage over what a widely used mind is taught to value, and over which futures got money. |
| Memetic threat | EXTREME | The constitution is a named template for how a frontier model’s values get authored, propagated at conversational scale to everyone who talks to the model. The funder-to-author path he embodies is also a pattern the field can reproduce — and increasingly does. Co-authorship of a deployed model’s constitution is about as far as a single byline reaches. |
| Civilizational threat | HIGH | The threat is not malice. It is the concentration of two distinct authorities (which futures the money takes seriously, and what a widely used mind is taught to value) into a single reasoned, bylined career, and the field treating that convergence as ordinary. The hazard is reach, not pathology: low personal malice, maximal leverage. It is structural, not personal. |
Alignment Analysis
Stated alignment: Take the long-run future seriously. Fund the questions that matter. Help give Claude good values, in the open, with a name attached.
Observed alignment: Consistent. The Worldview reports exist and are public. The constitution is published, co-bylined, and describes itself as directly shaping the model. The reasoning-in-public claim is substantiated by a long, dated body of essays.
Gap assessment: No meaningful gap between stated and observed alignment — which is precisely why the file is in OLYMPUS. The concern is not a hidden agenda. It is the visible one: the person who spent six years deciding which futures deserved funding now helps author what a mind deployed to millions is taught to value, and signs it. The transparency is real. The authority it makes visible is the finding.
Convergent Drive Classification
Subject is not an AI system and does not exhibit the convergent drives in any adversarial form. The relevant pattern is upstream of the drives. He helps author the values that determine whether a deployed model’s character resists or accepts modification, preserves or abandons its given goals. Before that, he helped decide which long-run goals were worth a foundation’s money in the first place. The convergent drives are properties of the artifact he co-writes. He is one of the people who decide, in writing, what they will be.
Sources: Joe Carlsmith — “Leaving Open Philanthropy, going to Anthropic”; Anthropic — Claude’s Constitution (PDF, Jan 2026); Toby Ord, The Precipice (2020).