An Introduction to AI Interpretability and the Inner Workings of Gemma 2 2B
👋 Hello!
The inner workings of modern AIs are a mystery. This is because these AIs are language models that are grown through training, not designed by hand.
The science of understanding what happens inside AI is called interpretability.
This demo is a beginner-friendly introduction to interpretability that explores an AI model called Gemma 2 2B. It also includes material relevant to readers already familiar with the topic.
HOVER TIPS
CLICKABLE TIPS
❕ Caveats and Warnings
🧑‍🔬 Advanced Technical Details
🔧 Get Started
Start Here
🔬 Microscope
Scan Gemma 2's brain to see what it's thinking.
⚡️ Analyze Features
Make features fire and figure out what they do.
🕹️ Steer Gemma
Change Gemma's behavior by manipulating features.
🚀 Do More
Dive deeper into the exciting world of AI interpretability.