The AI Safety Lab Landscape: Who's Hiring and What They're Building

AI Safety Has Become a Real Career Path

Five years ago, AI safety was a niche concern discussed mostly in academic philosophy departments and a handful of small nonprofits. In 2026, it’s a field with thousands of full-time positions, billions in funding, and direct influence on how the most powerful AI systems get built and deployed.

This guide maps the landscape — who’s doing what, how big they are, what they’re hiring for, and whether the work is actually impactful or just safety-washing.

Tier 1: Industry Labs with Dedicated Safety Teams

These organizations have the largest safety teams, the most compute for safety research, and the most direct influence on deployed systems.

Anthropic — Alignment as Core Mission

Safety team size: ~250 (out of ~1,800 total)
Annual safety research budget: Estimated $300-400M (including compute)
Key focus areas:

Constitutional AI and scalable oversight
Mechanistic interpretability (the “circuits” research program)
Responsible Scaling Policy implementation and evaluation
Red teaming and model evaluation frameworks

Anthropic stands out because safety is part of its core mission rather than a separate team bolted onto a products company. Roughly 14% of all employees (about 250 of 1,800) work primarily on safety, the highest ratio of any frontier lab. Their interpretability work, led by Chris Olah’s team, has produced some of the most cited safety research of the past three years.

Hiring: Actively hiring research scientists, research engineers, and policy researchers. They value mechanistic interpretability experience, but also hire people with backgrounds in formal verification, programming languages, and cognitive science. Typical requirement: PhD or equivalent research experience, with publication record in ML or adjacent fields.

OpenAI — Superalignment and Preparedness

Safety team size: ~200 (out of ~3,200 total)
Key focus areas:

Scalable alignment (successors to the Superalignment initiative)
Model evaluations and risk assessment (Preparedness team)
Safety systems (content policy, red teaming, abuse prevention)
Governance and frontier model responsibility

After the 2024 departures of Jan Leike (to Anthropic) and Ilya Sutskever (who went on to found Safe Superintelligence, SSI), OpenAI rebuilt its alignment team with a broader focus on scalable oversight. The Preparedness team, which publishes risk scorecards before major releases, has become OpenAI’s most visible safety function.

Hiring: Strong demand for ML engineers who can build evaluation frameworks, researchers with backgrounds in adversarial robustness, and policy experts who can translate technical findings into governance recommendations.

Google DeepMind — Safety at Scale

Safety team size: ~120 (out of ~2,500 total, including Google’s Responsible AI overlap)
Key focus areas:

Scalable oversight and debate
Robustness and adversarial evaluation
Fairness, bias, and social impact
Technical AI governance frameworks

DeepMind’s safety work benefits from Google’s scale — their evaluations run against the Gemini model family, which serves billions of users. The practical impact is significant, even if the theoretical ambition is sometimes less radical than Anthropic’s or MIRI’s.

Hiring: Research scientists and engineers with backgrounds in robustness, fairness, or formal methods. The bar is extremely high (see our DeepMind hiring guide). Safety roles are slightly less competitive than core research roles, making this a viable entry point for strong candidates.
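To compare the three Tier 1 labs on a like-for-like basis, the headcount estimates quoted above can be turned into safety-staff ratios. The sketch below is illustrative only: the figures are this guide's approximations, and the names `LAB_HEADCOUNTS`, `safety_ratio`, and `ranked_ratios` are made up for the example.

```python
# Approximate (safety staff, total staff) figures as quoted in this guide.
# These are estimates from the text above, not audited numbers.
LAB_HEADCOUNTS = {
    "Anthropic": (250, 1800),
    "OpenAI": (200, 3200),
    "Google DeepMind": (120, 2500),
}


def safety_ratio(safety_staff: int, total_staff: int) -> float:
    """Return safety staff as a percentage of total staff."""
    if total_staff <= 0:
        raise ValueError("total_staff must be positive")
    return 100.0 * safety_staff / total_staff


def ranked_ratios(headcounts: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Return (lab, percentage) pairs sorted from highest to lowest ratio."""
    ratios = [
        (lab, safety_ratio(safety, total))
        for lab, (safety, total) in headcounts.items()
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for lab, pct in ranked_ratios(LAB_HEADCOUNTS):
        print(f"{lab}: {pct:.1f}%")
```

On these estimates Anthropic comes out at roughly 14%, OpenAI at about 6%, and Google DeepMind at about 5%. The ratios are only as good as the headcounts behind them, and the DeepMind figure in particular depends on how the Responsible AI overlap is counted.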

Tier 2: Dedicated Safety Organizations

These groups focus exclusively on AI safety, without the distraction of commercial product development.

ARC (Alignment Research Center)

Team size: 25
Founded: 2021 by Paul Christiano (former OpenAI alignment researcher)
Focus: Eliciting latent knowledge (ELK), model evaluations, and theoretical alignment research
Funding: Primarily from Open Philanthropy ($15M to date)

ARC’s ELK program — training AI to report actual beliefs rather than what humans want to hear — addresses a challenge commercial labs haven’t solved. They also conduct third-party evaluations of frontier models for labs like Anthropic and OpenAI.

Hiring: Very selective, with 3-5 hires per year via their fellowship program. A strong math background is valued. Compensation is $120-200K, below industry levels, but the intellectual environment for pure alignment theory is hard to match.

MIRI (Machine Intelligence Research Institute)

Team size: ~20 (reduced from ~35 in 2023)
Founded: 2000 (originally as the Singularity Institute)
Focus: Agent foundations, decision theory, deceptive alignment
Funding: Donor-funded, primarily from the effective altruism community

MIRI shifted focus in 2023 toward “the preparedness problem” — what to do if alignment turns out harder than expected. Views are polarized: some see their theoretical work as foundational, others as too disconnected from practical systems.

Hiring: Very few positions. Hires through their research associate program and personal networks. Workshops are the best entry point.

Redwood Research

Team size: ~30
Founded: 2021 by Buck Shlegeris and Nate Thomas
Focus: Adversarial robustness, interpretability, and practical alignment techniques
Funding: Open Philanthropy, private donors (~$20M to date)

Redwood bridges theory and practice. Their adversarial training techniques and “causal scrubbing” framework have been directly adopted by commercial labs.

Hiring: 5-10 people per year. The MATS program, one of the most prestigious alignment research pathways, is a common feeder; it is run independently rather than by Redwood. Compensation is $130-220K.

Tier 3: Academic and Government Efforts

CHAI (Center for Human-Compatible AI) — UC Berkeley

Led by Stuart Russell. ~15 researchers focused on cooperative inverse reinforcement learning and human-AI interaction. Strong academic pedigree, good pipeline into industry safety roles.

CAIS (Center for AI Safety)

Primarily a field-building organization that provides compute grants, runs conferences, and publishes consensus statements. Less direct research, more ecosystem support.

UK AI Safety Institute (AISI)

The most significant government effort, renamed the AI Security Institute in 2025. Its ~100 staff conduct model evaluations and develop safety testing frameworks. They’ve evaluated frontier models from Anthropic, OpenAI, Google DeepMind, and Meta before UK deployment. Actively hiring, particularly people with ML evaluation experience.

US AI Safety Institute (NIST)

Smaller than the UK counterpart (~40 staff) and renamed the Center for AI Standards and Innovation (CAISI) in 2025. Focused on developing evaluation standards and benchmarks; more standards-oriented, less research-oriented.

Where the Field Is Heading

Several trends are reshaping AI safety careers in 2026:

1. Interpretability is the hottest subfield. Mechanistic interpretability has moved from niche to central priority at Anthropic, DeepMind, and independent labs. Highest-demand specialty for new entrants.

2. Evaluation and red teaming are growing fastest. These roles require ML engineering skills but not necessarily a PhD — the most accessible entry point for career changers.

3. Governance roles are emerging. The EU AI Act is creating demand for people bridging technical safety and policy. Law/policy backgrounds combined with ML literacy are increasingly valuable.

4. Compensation is converging. Senior nonprofit researchers now earn $180-250K, up from $100-150K three years ago, as they compete for talent against industry labs.

Getting Started

The most common entry points into AI safety careers:

MATS Program: 3-month mentored research program (run independently, not by Redwood Research)
ARC Fellowship: 6-month research fellowship
BlueDot Impact: AI safety fundamentals course
Directly applying to industry safety teams with relevant ML experience

For the technical interview preparation that labs require — ML fundamentals, system design, and research presentation skills — structured study resources are available here.