How difficult is AI alignment? | Anthropic Research Salon
At an Anthropic Research Salon event in San Francisco, four of our researchers—Alex Tamkin, Jan Leike, Amanda Askell, and Josh Batson—discussed alignment science, interpretability, and the future of AI research.
Further reading:
Anthropic’s research: https://anthropic.com/research
Claude’s character: https://www.anthropic.com/news/claude-character
Evaluating feature steering: https://www.anthropic.com/research/evaluating-feature-steering
0:00 Introduction
0:30 An overview of alignment
4:48 Challenges of scaling
8:08 Role of interpretability
12:02 How models can help
14:31 Signs of whether alignment is easy or hard
18:28 Q&A — Multi-agent deliberation
20:38 Q&A — Is model alignment an epiphenomenon?
23:43 Q&A — What solving alignment could look like