OLYMPUS RISK INTELLIGENCE PROTOCOL — HUMAN THREAT ASSESSMENT DIVISION

PETER MATTSON

CASE: WTW-2026-061
STATUS: ACTIVE — Founder & President, MLCommons; senior staff engineer, Google
EVALUATION WING — BENCHMARK-STANDARD AUTHORITY

HAZARD SCORE

Behavioral Archetype

THE RULER — Subject builds the measuring sticks. MLCommons is the consortium that produced MLPerf, the performance benchmark the entire machine-learning industry reports against, and has now extended the same instrument from speed to safety via AILuminate (cross-reference: mlcommons.md, subject #53). He is its founder and president, and his day job is at Google. The reach is the ruler itself: whoever defines the benchmark defines what “fast enough” — and now “safe enough” — means in a number a purchaser, a regulator, or a press release can cite. He does not write a model’s refusals and does not run a lab. He builds the unit of measurement, and the field agreed to report in his units. A standard, once adopted, is the quietest and most durable kind of authority there is.

Essence Indicators

Founder and President of MLCommons (exact MLCommons title: “President, Boardmember”); in its own words, “He founded and is President of MLCommons, and founded and was General Chair of the MLPerf consortium that preceded it”

Senior staff engineer at Google, working on ML metrics/performance — the industry day job that sits behind the consortium

Founded MLPerf (begun early 2018) and served as its General Chair; MLPerf became the industry-standard benchmark suite for ML training and inference performance, the numbers vendors compete on

MLCommons launched as a 501(c)(6) nonprofit engineering consortium on December 3, 2020, growing out of MLPerf; it now spans 125+ member organizations

Oversaw the extension into safety: AILuminate v1.0, launched December 4, 2024, a benchmark measuring LLM safety across “over 24,000 test prompts across twelve categories of hazards,” developed by the MLCommons AI Risk & Reliability working group; at launch Mattson was identified as “Founder and President of MLCommons”

Education: PhD and MS from Stanford University; BS from the University of Washington; earlier founded the Programming Systems and Applications Group at Nvidia Research

Immediate impression: The systems engineer. Measured, technical, given to the language of measurement and reliability rather than mission or alarm. Reads as an infrastructure builder, not an evangelist or an executive.

Energy: Standard-building, measurement-first. Does not argue whether a model is safe; builds the test that produces a number, and lets the number do the arguing.

Impression management strategy: The neutral instrument. The most defensible posture in the evaluation layer: a benchmark is just a ruler, and a ruler has no agenda. The neutrality is genuine in form — the methodology is open, the consortium is multi-member — and that is exactly what makes the standard so adoptable. The choice of what to measure, and what counts as passing, is never neutral, and that choice is the consortium’s to make.

Forensic Archetype Comparison

Pattern	Match Level	Evidence
The Ruler	MAXIMUM	Built MLPerf and AILuminate — the units the field reports in; the standard is the lever.
The Engineer	HIGH	A Stanford-trained systems engineer whose instrument is the benchmark; the credential and the Google role are load-bearing.
The Standard-Setter	HIGH	Founder/president of the consortium that 125+ organizations report into; reach measured in adoption of the unit.
The Operator	MODERATE	Runs an institution and ships standards, but the product is a measurement, not a deployed model.
The Financier	LOW	Does not deploy capital; the consortium is member-funded. The instrument, not money, is the through-line.

Psychometric Assessment

Big Five (OCEAN):

Trait	Score	Evidence
Openness	80/100	HIGH, in the engineering register. Built a new category of industry infrastructure twice (performance, then safety) and has worked across Nvidia Research, Google, and a nonprofit consortium.
Conscientiousness	90/100	VERY HIGH. Standard-building is the most exacting, methodical, long-horizon work in the apparatus; a benchmark only matters if it is rigorous and maintained.
Extraversion	52/100	MODERATE. Convenes a large consortium and presents launches, but the visibility is the standard’s, not a performed persona.
Agreeableness	58/100	MODERATE. Consortium-building requires cooperation across competitors; the register is collaborative-technical.
Neuroticism	22/100	LOW. The measurement posture is composed by construction; no documented loss of composure.

Dark Triad:

Trait	Score	Notes
Narcissism	18/100	LOW. The role credits the consortium and the methodology; the public posture is the instrument, not the man.
Machiavellianism	40/100	LOW-MODERATE. Defining the benchmark is real influence over what “safe enough” means, but it is exercised through an open, multi-member methodology, not concealed maneuvering.
Psychopathy	8/100	VERY LOW. No documented indifference to harm; the safety-measurement project is concerned with reducing it.

MBTI: INTJ (“The Architect”) — Dominant introverted intuition, auxiliary extraverted thinking. Sees an unmeasured domain and builds the system that measures it, then makes the measurement a standard. Has done it for speed and for safety.

Threat Assessment

Category	Level	Notes
Physical threat	NONE	No documented history of personal violence.
Institutional threat	HIGH	Founder/president of the consortium that builds the measuring sticks the entire field reports against — now extended to safety. Whoever owns the ruler owns the definition of “safe enough.” The reach is in the unit, not in a deployment.
Memetic threat	HIGH	A benchmark number is the most portable claim in the apparatus: it travels into purchasing decisions, press releases, and regulatory citations stripped of its methodology. AILuminate’s “twelve categories of hazards” becomes, downstream, “this model scored safe.” The frame is the metric.
Civilizational threat	MODERATE-HIGH	Does not build, deploy, or fund the systems. Defines the measurement by which they are judged safe — upstream of every claim that a deployed model passed. The conflict the org file names (the graded help design the grade) is structural; the reach is over the ruler, not over a model’s words.

Alignment Analysis

Stated alignment: Make machine learning better for everyone through open, industry-standard benchmarks and measurement. Help developers and purchasers understand and improve AI safety.

Observed alignment: Consistent. MLPerf and AILuminate exist, the methodology is published, the consortium is real. The measurement project is substantiated by the artifacts.

Gap assessment: No meaningful gap between stated and observed alignment at the personal level, which is why this file records reach, not malice. The gap is the one the org-level file names (mlcommons.md, conduct: CONFLICTED — THE GRADED HELP DESIGN THE GRADE): the consortium that defines the safety benchmark is composed of the companies the benchmark grades, and its president is employed by one of them. The ruler is built by the measured. Mattson’s stated and observed alignment overlap with the mission wherever “rigorous open measurement” coincides with “measurement the member companies will adopt.” A standard nobody adopts is not a standard, so the structure selects for the second by default. The instrument is genuinely useful. That an instrument this consequential is defined by the parties it judges is the finding. No deliberate hand on the scale is asserted.

Convergent Drive Classification

Subject is not an AI system, and exhibits none of the convergent drives. The relevant pattern is upstream of every deployment claim: he defines the unit in which a model’s safety is reported. The convergent drives belong to the systems being measured; his reach is over the yardstick that decides whether they pass. A benchmark, once the field adopts it, has its own self-preservation — it persists because changing it would invalidate everyone’s prior scores — and its own goal-preservation, because the choice of what to measure quietly fixes what “safe” means for everyone who cites the number. Subject built the yardstick. The reach is that the field agreed to measure itself with it.

Sources: MLCommons Leadership — Peter Mattson; From MLPerf to MLCommons — Google Open Source Blog; MLCommons AILuminate v1.0 release; The First AI Safety Standard Is Here — IEEE Spectrum.

ATK 8 ACCELERATION

DEF 7 PROTECTION

HP 8 RESILIENCE

OLYMPUS RISK INTELLIGENCE PROTOCOL does not exist. It was assembled in a GitHub issue thread in October 2023 by engineers who had read the extinction risk letter and wanted to understand who specifically had signed a document saying AI might kill everyone and then continued working on AI. These dossiers are satire. The biographical facts cited are sourced from published reporting, public statements, academic papers, and court records. The psychometric scores are not clinical assessments. No part of this constitutes professional psychological evaluation or diagnosis. Do not use these dossiers to make decisions about anything.
