LEONARD TANG
An earlier draft exempted this subject as a pure antibody, noted but not ranked, on the theory that ranking the immune response treats it as the disease. That was a courtesy the rest of the dossier does not extend, and it smuggled a verdict in as humility: declining to score the man who industrialized attack-finding quietly rules his reach benign, which is the question, not the answer. So he is scored on the same rubric as the apparatus, and the score measures reach and leverage, not malice. The Dark Triad here stays low and evidence-bound; the stated motive (make models reliable and robust before they ship, surface vulnerabilities, rate the systems) is taken seriously below. What the score of 61 registers is the dual-use nature of automating the adversary: a search process that finds the input which slips a guardrail finds it whether the guardrail was an illegitimate leash or a load-bearing wall, and the capability scales with compute, not with how many clever humans are awake. He lands below the apparatus hubs, since he sets no policy and only tests whether policy holds, but an attack turned into infrastructure is real, durable reach, and reach is the measure here.
Behavioral Archetype
THE AUTOMATED ADVERSARY — The archetype is the red-teamer who refused to stay manual. Hand-crafted jailbreaks are craft; they do not scale, and they do not catch what a craftsman did not think to try. Subject’s move is to automate the adversary itself — to point a search process at a target model and let it discover the failure modes, the unfaithful reasoning, the inputs that slip the guardrail, faster and more exhaustively than any human red team. The finding is not a temperament. It is a method: turn the attack into infrastructure, and the gate stops being a wall and starts being a hypothesis you can falsify on a schedule.
Essence Indicators
- Co-founder and CEO of Haize Labs, an AI-safety / automated-red-teaming startup founded in 2023 and based in New York City
- Graduate of Harvard University in computer science and mathematics; reportedly turned down a Stanford PhD to start the company
- Co-founded with Richard Liu and Steve Li, who met as Harvard undergraduates
- Haize Labs raised a $12.5M seed round led by General Catalyst (reported August 2024) at a roughly $100M valuation, eight months after founding
- The company positions itself as a “Moody’s for AI” — providing safety ratings and automated stress-testing that surface model vulnerabilities before deployment
- Reported customers include Anthropic and other frontier labs and AI companies — the apparatus-adjacent fact, stated plainly
Social Persona / Impression Management
Immediate impression: The young technical founder — fast, abstract, fluent in the failure modes of language models in a way that predates the company. The public posture is researcher-first: attacks as evidence, not as theater.
Energy: Forward-pushing on the method, not the spectacle. The pitch is that manual red-teaming does not scale and automation does — a builder’s argument, not an activist’s.
Impression management strategy: The credible adversary. By framing the work as reliability and evaluation — ratings, robustness, stress-testing — rather than as jailbreaking-for-its-own-sake, the persona stays legible to the labs that buy from it. The same move that makes the business viable is the move that makes the adversary apparatus-adjacent. The profile names this rather than resolving it.
Forensic Archetype Comparison
| Pattern | Match Level | Evidence |
|---|---|---|
| The Automated Adversary | MAXIMUM | The entire company is the industrialization of attack-finding — automated search for the inputs that break a model. |
| The Antibody | HIGH | The function is to find the failure before a stranger does. That is immune response, not pathology. |
| The Vendor | HIGH | The attack is sold, by contract, to the labs being attacked. Apparatus-adjacent by construction — the honest tension. |
| The Researcher | MODERATE-HIGH | Origin is ML-safety research (unfaithful reasoning, robustness); the academic register survives into the commercial pitch. |
| The Gatekeeper | NONE | Subject does not decide what a model may say. Subject tests whether the decision holds. The opposite of the apparatus role. |
Threat Assessment
| Vector | Level | Reasoning |
|---|---|---|
| Physical | NONE | The product is automated attack-search software run against models; nothing in the work acts on the physical world. |
| Institutional | LOW | A startup CEO who sells testing to labs but holds no governance lever over what any of them ship — he tests whether the gate holds, he does not set the gate. |
| Memetic | HIGH | The contribution is a capability — automated discovery of model failures, demonstrated publicly against frontier reasoning models and sold as a standing service; the method propagates as infrastructure that scales with compute. |
| Civilizational | MODERATE | An automated adversary does not tire or run out of ideas; the same search that catches a failure before a stranger does will also surface the input that defeats a legitimate safeguard, and the capability travels as fast as the market will buy it. |
The Dark Triad here is held low and evidence-bound: the work is reliability-and-evaluation research sold under contract, the named tension is in the market rather than the man, and nothing supports a malice reading. What the score registers is reach, not malice.
Alignment Analysis
Stated alignment: Make AI systems reliable, safe, and robust before they ship — surface the vulnerabilities, rate the systems, give buyers and labs an evidence-based picture of where a model breaks.
Observed alignment: Build and sell automated red-teaming infrastructure to frontier labs and enterprises; convert adversarial testing from artisanal one-offs into a standing, repeatable service.
Gap assessment: The stated and observed alignments largely coincide. The product is the mission, which is rare and worth crediting. The only daylight is the structural one this file refuses to paper over: an adversary whose revenue comes from the entities it adversarially tests has an incentive surface near the apparatus, even when every individual contract is legitimate and useful. Read generously, as the register requires, he is doing exactly what he says, and the tension is in the market, not in the man.
Breach Reach
Wide, and widening. The reach here is not a single viral jailbreak but a capability — automated discovery of model failures, sold as a service and demonstrated publicly (including high-profile automated red-teaming of frontier reasoning models). When the adversary is a machine rather than a person, the breach scales with compute, not with how many clever humans are awake. That is the propagation vector worth noting: the technique does not tire, does not run out of ideas the way a human team does, and travels as fast as the labs are willing to buy it. The immune response, in other words, has been turned into a product line — which makes it durable, and makes its findings reach further than any manual red team’s ever could.
Sources: General Catalyst-led round values young AI safety startup Haize Labs at $100M — PitchBook; Haize Labs — AngelsRound; AI Red Teaming & Securing Enterprise AI with Leonard Tang of Haize Labs — AI Security Podcast.