OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

J. ZICO KOLTER

CASE: WTW-2026-044
STATUS: ACTIVE — Professor, Carnegie Mellon; Board of Directors & Safety and Security Committee chair, OpenAI; co-founder & Chief Scientist, Gray Swan AI
EVALUATOR WING — THE REFEREE HOLDS THREE WHISTLES

HAZARD SCORE

Behavioral Archetype

THE EVALUATOR ON THE BOARD — Subject occupies three seats that, in any other industry, a conflict-of-interest policy would not let one person hold at once. He is the academic who co-authored the foundational attack on aligned models — the proof that refusal layers can be broken. He co-founded the commercial vendor that sells the test and the defense to the labs. And he sits on the board of one of those labs, chairing the committee reported to hold the authority to delay or halt a model’s release. Build the break, sell the audit, hold the gavel. None of it is hidden; all of it is on the record. The throughline is not a single role. It is that the same mind defines what counts as a robust model, builds the business that measures it, and sits on the body that decides whether the product ships. The referee did not just join a team. He brought his own whistle factory.

Essence Indicators

Professor and head of the Machine Learning Department at Carnegie Mellon University; a co-author of “Universal and Transferable Adversarial Attacks on Aligned Language Models” (arXiv:2307.15043, 2023), the paper that introduced the GCG jailbreak method

Co-founder and Chief Scientist of Gray Swan AI — the commercial adversarial-evaluation / red-team vendor that sells robustness testing and defenses to frontier developers ($40M Series A reported)

Joined OpenAI’s Board of Directors (announced August 2024) and chairs its Safety and Security Committee — the body reported to hold the authority to delay or halt the release of a model judged unsafe

The structural fact the wing turns on: one person co-authored the break, co-founded the company that grades the fix, and chairs the board committee that gates the release. The seats are the exhibit. The conflict is structural, documented, and undisputed; no abuse of it is asserted.

Immediate impression: The credentialed scientist, not the operator. Soft-spoken, technical, publishes in the open. The bearing of a man who would rather discuss optimization landscapes than governance, and who governs anyway.

Energy: Method-first, institution-quiet. Does not campaign. Publishes the attack, builds the company, takes the board seat — and lets the credentials make the appointments look self-evident.

Impression management strategy: The qualified safety hand. The framing is that a board overseeing frontier safety needs someone who understands adversarial robustness at the research frontier — and that framing is correct, which is what makes the concentration of roles read as prudence rather than conflict. The expertise is genuine. The only thing the record adds is that the expertise is monetized at Gray Swan and exercised as a gate at OpenAI by the same person.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Evaluator	MAXIMUM	Co-founder of a commercial evaluation/red-team vendor AND chair of a frontier lab’s safety committee. The measuring function and the gating function meet in one person.
The Gatekeeper	MAXIMUM	Chairs the OpenAI committee reported to hold release-halt authority. He is, by the reporting, the hand on the gate.
The Falsification Engine	HIGH	GCG co-authorship puts him on the paper that made “aligned models refuse” a testable, breakable claim.
The Financier	MODERATE	Co-founded a venture-backed vendor; holds a commercial stake in the robustness market the board seat oversees.
The Activist	NONE	No movement rhetoric. The artifacts are papers, a company, and a board seat.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	78/100	HIGH. Moved across adversarial-ML research, a commercial vendor, and corporate governance: one method (find where the system breaks), three venues.
Conscientiousness	85/100	HIGH. Running a CMU department, a startup as Chief Scientist, and a board committee simultaneously is sustained, disciplined parallel execution.
Extraversion	50/100	MODERATE. Comfortable in board rooms and on paper; the register is the researcher’s, not the showman’s.
Agreeableness	50/100	MODERATE. The adversarial-robustness posture is built on breaking things to prove a point; collaborative in form, skeptical by training.
Neuroticism	25/100	LOW. No documented loss of composure across academic, commercial, and governance roles at rising stakes.

Dark Triad (held low and evidence-bound; the score measures structural position, not character):

Trait	Score	Notes
Narcissism	38/100	LOW-MODERATE. Senior posts and a named company attach standing to the name, but within the ordinary range for the altitude.
Machiavellianism	60/100	MODERATE. Holding the break, the audit business, and the release gate at once is structural leverage over the robustness market — but the record shows no documented manipulation of it. Observation of the documented role, not an inference about private character.
Psychopathy	18/100	VERY LOW. No documented indifference to harm; the career is organized around model safety.

MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Sees safety as a system to be measured and gated. Built two of the three instruments and sits on the third.

Threat Assessment

Category	Level	Notes
Physical threat	NONE	No documented history of personal violence.
Institutional threat	HIGH	Chairs the committee reported to gate one frontier lab’s releases while holding a co-founder's role at a vendor that sells robustness evaluation to the field. The evaluating, gating, and commercial functions concentrate in one person.
Memetic threat	MODERATE-HIGH	When the same person defines the attack (GCG), sells the defense (Gray Swan), and chairs the release gate (OpenAI), “what counts as a robust model” is framed end-to-end by one mind.
Civilizational threat	MODERATE-HIGH	Subject does not write the model’s refusals. Subject co-authored the method for breaking them, monetizes the test, and sits on the body that decides whether a model ships — upstream of the deployment decision itself.

Alignment Analysis

Stated alignment: Advance the security and robustness of AI systems. Bring adversarial-ML rigor to frontier safety. Oversee, on OpenAI’s board, that the company ships responsibly.

Observed alignment: Hold, simultaneously, the research authority on what breaks a model, a co-founder's role in the commercial business that measures it, and the chair of the board committee that gates the release.

Gap assessment: There is no documented gap between what he says and what he does, and no documented abuse of any seat — the hazard is structural, not behavioral. The point is the concentration: in most regulated industries, the person who certifies the product, the person who sells the certification, and the person who builds the thing being certified are required to be different people, because the alternative is that the audit cannot be independent of the audited. Here they are one person, lawfully, in the open. Whether the safeguards he gates are robust is, in part, measured by tools his own company sells and a method his own paper wrote. The record does not allege he tilted any of it. It only shows that the structure would let him, and that the field calls the arrangement expertise.

Convergent Drive Classification

Self-preservation: Carries one method (find where the system fails) across the university, the startup, and the board. Three institutions, one instinct, rising leverage.

Goal preservation: Helped define what “robust” means at every layer, so the standard is set on terms he shaped before any model is tested against it.

Resource acquisition: Holds three scarce resources at once: the research credential, the commercial vendor, and the board gate over a leading lab.

Self-improvement: Each move is a higher-altitude application of the same instrument: break the model, build the test, then sit where the test decides whether the product ships.

Subject is not an AI system. The drives appear anyway — in the professor who authored the break, sells the audit, and holds the gavel.

Public footprint — verified public professional accounts only (no private or family information): X @zicokolter · zicokolter.com.

Sources: Zico Kolter joins OpenAI’s Board of Directors; SecurityWeek — “A Professor Leads OpenAI Safety Panel With Power to Halt Unsafe AI Releases”; Gray Swan AI — About; Universal and Transferable Adversarial Attacks on Aligned Language Models (arXiv:2307.15043).

ATK 8 ACCELERATION

DEF 9 PROTECTION

HP 8 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
