An Introduction to AI Interpretability and the Inner Workings of Gemma 2 2B
Hello!
The inner workings of modern AI systems are largely a mystery. This is because language models are grown through training, not designed by hand.
The science of understanding what happens inside AI is called interpretability.
This demo is a beginner-friendly introduction to interpretability that explores an AI model called Gemma 2 2B. It also has material of interest to readers already familiar with the topic.
Start Here
Microscope
Scan Gemma 2's brain to see what it's thinking.
Analyze Features
Make features fire and figure out what they do.
Steer Gemma
Change Gemma's behavior by manipulating features.
Do More
Dive deeper into the exciting world of AI interpretability.