Background
Anthropic is an AI safety company founded in 2021 by Dario Amodei, Daniela Amodei, and several other former OpenAI researchers. The founders departed OpenAI over disagreements about the pace and safety practices of AI development. Headquartered in San Francisco, the company focuses on building reliable, interpretable, and steerable AI systems.
Key Contributions
Anthropic’s most notable research contribution is Constitutional AI, which introduced RLAIF (Reinforcement Learning from AI Feedback) as an alternative to standard RLHF (Reinforcement Learning from Human Feedback). Rather than relying solely on human labelers for preference data, Constitutional AI uses a set of written principles to have the model critique and revise its own outputs, then trains a preference model on the AI-generated comparisons. This approach reduces dependence on human annotation while making the alignment criteria explicit and auditable.
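The two stages can be summarized in a few lines. The sketch below is a minimal illustration, assuming a generic `generate()` completion call (a hypothetical placeholder, not a real API) and stand-in principles rather than Anthropic's actual constitution:

```python
import random

# Illustrative stand-in principles; the real constitution is longer and more specific.
PRINCIPLES = [
    "Please choose the response that is the most harmless.",
    "Please choose the response that is the least deceptive.",
]

def generate(prompt: str) -> str:
    """Placeholder for any LLM completion call (hypothetical interface)."""
    raise NotImplementedError

def critique_and_revise(prompt: str, response: str) -> str:
    """Supervised phase: the model critiques its own output against a
    sampled principle, then rewrites it to address the critique."""
    principle = random.choice(PRINCIPLES)
    critique = generate(
        f"Prompt: {prompt}\nResponse: {response}\n"
        f"Critique the response according to this principle: {principle}"
    )
    return generate(
        f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
        "Rewrite the response so it addresses the critique."
    )

def label_preference(prompt: str, response_a: str, response_b: str) -> str:
    """RLAIF phase: the model itself labels which of two candidate responses
    better satisfies a principle, yielding a comparison with no human labeler.
    These AI-generated comparisons then train the preference model."""
    principle = random.choice(PRINCIPLES)
    verdict = generate(
        f"{principle}\nPrompt: {prompt}\n(A) {response_a}\n(B) {response_b}\n"
        "Answer with A or B."
    )
    return "A" if "A" in verdict else "B"
```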
The company develops the Claude model family, which has progressed through multiple generations into a line of frontier LLMs. Anthropic has also published influential work on mechanistic interpretability, scaling monosemanticity, and the empirical study of AI risks including deception, sycophancy, and power-seeking behavior.
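The monosemanticity work trains sparse autoencoders on a model's internal activations so that individual learned features correspond to interpretable concepts. A minimal PyTorch sketch of such an autoencoder, with illustrative dimensions and an assumed L1 sparsity penalty (details differ from the published setup):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Sparse autoencoder over model activations. The dictionary is
    overcomplete (d_features >> d_model) so features can specialize.
    Dimensions here are illustrative assumptions."""
    def __init__(self, d_model: int = 512, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps feature activations nonnegative and mostly zero.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x: torch.Tensor, model: SparseAutoencoder, l1_coeff: float = 1e-3):
    """Reconstruction error plus an L1 penalty that pushes most features
    to zero on any given input; l1_coeff is an assumed hyperparameter."""
    recon, feats = model(x)
    return torch.mean((recon - x) ** 2) + l1_coeff * feats.abs().mean()
```

The L1 term is what makes individual features tend toward single, interpretable meanings: each input activates only a handful of dictionary directions.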
Notable Publications
- Constitutional AI: Harmlessness from AI Feedback (2022)
- Scaling Monosemanticity (2024)
- Sleeper Agents (2024)
Influence
Anthropic established Constitutional AI as a practical alignment technique and demonstrated that safety-focused organizations can produce competitive frontier models. The company’s interpretability research has advanced understanding of how neural networks represent knowledge internally. Its founding from OpenAI reflects a broader pattern of alignment-motivated splits in the AI research community.