gemma-2-2b · 12-gemmascope-res-16k
SAE from gemma-scope · Residual Stream - 16k · Layer 12
Features
16,384
Data Type
float32
Hook Name
blocks.12.hook_resid_post
Hook Layer
12
Architecture
jumprelu
Context Size
1,024
Dataset
monology/pile-uncopyrighted
Activation Function
relu
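The `jumprelu` architecture listed above refers to an activation that passes a pre-activation through unchanged only when it exceeds a per-feature threshold, and outputs zero otherwise. A minimal sketch in plain Python, where the function names and all numeric values are hypothetical and chosen only for illustration (this is not the Gemma Scope implementation):

```python
def jump_relu(pre_acts, thresholds):
    """JumpReLU, elementwise: keep z only where z > theta, else output 0."""
    return [z if z > theta else 0.0 for z, theta in zip(pre_acts, thresholds)]


def relu(pre_acts):
    """Plain ReLU for comparison: keep z only where z > 0."""
    return [max(z, 0.0) for z in pre_acts]


# Hypothetical encoder pre-activations and per-feature thresholds.
pre_acts = [-0.5, 0.2, 0.8, 1.5]
thresholds = [0.1, 0.5, 0.5, 2.0]

jump_out = jump_relu(pre_acts, thresholds)
relu_out = relu(pre_acts)
print(jump_out)
print(relu_out)
```

With a positive threshold, small positive pre-activations that ReLU would keep are zeroed, which is one way such an activation can yield sparser codes.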
SAE Evaluations
SAE Bench
Model Behavior Preservation
KL Divergence Score
0.99
KL Divergence with SAE
0.10
KL Divergence with Ablation
10.06
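The three KL figures above appear to be related: the score is consistent with the fraction of the ablation-induced KL divergence that is avoided when the SAE reconstruction is spliced in. A small sketch under that assumption (the function name is hypothetical, and the formula is inferred from common practice, not stated on this page):

```python
def kl_divergence_score(kl_with_sae, kl_with_ablation):
    """Assumed definition: share of the ablation KL that the SAE avoids."""
    return (kl_with_ablation - kl_with_sae) / kl_with_ablation


# Values copied from the listing above.
kl_score = kl_divergence_score(kl_with_sae=0.10, kl_with_ablation=10.06)
print(round(kl_score, 2))
```

A score near 1 means that splicing in the SAE changes the output distribution far less than ablating the activation does.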
Model Performance Preservation
Cross Entropy Loss Score
0.99
CE Loss with SAE
3.05
CE Loss without SAE
2.94
CE Loss with Ablation
12.44
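The cross-entropy figures above can be tied together in a similar way. Assuming the score is the share of the ablation-induced loss increase that the SAE recovers (an assumption based on common practice; the function name is hypothetical), the listed values reproduce the reported score:

```python
def ce_loss_score(ce_with_sae, ce_without_sae, ce_with_ablation):
    """Assumed definition: fraction of the ablation loss gap recovered by the SAE."""
    return (ce_with_ablation - ce_with_sae) / (ce_with_ablation - ce_without_sae)


# Values copied from the listing above.
ce_score = ce_loss_score(ce_with_sae=3.05, ce_without_sae=2.94, ce_with_ablation=12.44)
print(round(ce_score, 2))
```

Here the SAE adds about 0.11 to the clean loss of 2.94, against an increase of 9.50 under ablation.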
Reconstruction Quality
Mean Squared Error
1.54
Cosine Similarity
0.92
Explained Variance
0.75
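To make the three reconstruction metrics concrete, here is a minimal sketch on a toy batch of two 2-dimensional activations. The definitions used are common textbook ones and are an assumption; the benchmark may aggregate over tokens and dimensions differently, so the toy numbers are not comparable to the values above.

```python
import math


def mse(x, x_hat):
    """Mean squared error over the elements of one activation vector."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cosine_similarity(x, x_hat):
    """Cosine of the angle between an activation and its reconstruction."""
    dot = sum(a * b for a, b in zip(x, x_hat))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_hat = math.sqrt(sum(b * b for b in x_hat))
    return dot / (norm_x * norm_hat)


def explained_variance(batch, batch_hat):
    """1 - (squared residual) / (squared deviation from the per-dimension batch mean)."""
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / len(batch) for d in range(dims)]
    total = sum((row[d] - means[d]) ** 2 for row in batch for d in range(dims))
    resid = sum((x[d] - y[d]) ** 2 for x, y in zip(batch, batch_hat) for d in range(dims))
    return 1.0 - resid / total


# Hypothetical inputs and reconstructions.
inputs = [[1.0, 2.0], [3.0, 4.0]]
recons = [[1.0, 2.0], [3.0, 3.5]]

toy_mse = mse(inputs[1], recons[1])
toy_cos = cosine_similarity(inputs[1], recons[1])
toy_ev = explained_variance(inputs, recons)
print(toy_mse, toy_cos, toy_ev)
```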
Shrinkage
L2 Ratio
0.92
Input L2 Norm
149.00
Output L2 Norm
138.00
Relative Reconstruction Bias
1.00
Sparsity
L0 Sparsity
80.47
L1 Sparsity
532.00
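L0 and L1 sparsity are usually read as the average number of active features per token and the average sum of absolute feature activations per token. A minimal sketch under that reading, using a hypothetical 2-token, 4-feature activation matrix:

```python
def mean_l0(feature_acts):
    """Average count of nonzero features per token."""
    return sum(sum(1 for a in row if a != 0.0) for row in feature_acts) / len(feature_acts)


def mean_l1(feature_acts):
    """Average sum of absolute feature activations per token."""
    return sum(sum(abs(a) for a in row) for row in feature_acts) / len(feature_acts)


# Hypothetical SAE feature activations: one row per token.
feature_acts = [
    [0.0, 1.5, 0.0, 2.0],
    [0.5, 0.0, 0.0, 0.0],
]

toy_l0 = mean_l0(feature_acts)
toy_l1 = mean_l1(feature_acts)
print(toy_l0, toy_l1)
```

Under this reading, the L0 of 80.47 reported above would mean that roughly 80 of the 16,384 features fire on an average token.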
Token Statistics
Total Tokens (Reconstruction)
409,600
Total Tokens (Sparsity/Variance)
4,096,000
Feature Density
[Plot: Feature Density (x) vs Consistent Activation Heuristic (y), log10 scale]
[Plot: Feature Scatter Matrix]
[Plot: Encoder-Decoder Cosine Similarity]
Mean Number of Split Features
1.16
Mean Full Absorption Score
0.08
[Plot: Absorption Rate by First Letter]
Activation Threshold Fraction
0.01
Buffer Size
10
Dataset Name
monology/pile-uncopyrighted
Dead Latent Threshold
15
LLM Batch Size
64
LLM Context Size
128
LLM Data Type
bfloat16
Max Tokens in Explanation
30
Model Name
gemma-2-2b
Number of IW Sampled Examples for Generation
5
Number of IW Sampled Examples for Scoring
2
Number of Latents
1000
Number of Random Examples for Scoring
10
Number of Top Examples for Generation
10
Number of Top Examples for Scoring
2
No Overlap
true
Random Seed
42
Scoring
true
Total Tokens
2000000
Use Demos in Explanation
true
AutoInterp Score
0.83
LLM Context Length
128
Dataset Names
LabHC/bias_in_bios_class_set1
canrager/amazon_reviews_mcauley_1and5
Early Stopping Patience
20
LLM Batch Size
32
LLM Data Type
bfloat16
Lower Memory Usage
false
Model Name
gemma-2-2b
N Values
[2,5,10,20,50,100,500]
Perform Spurious Correlation Removal
true
Probe Epochs
20
Probe L1 Penalty
0.001
Probe LR
0.001
Probe Test Batch Size
500
Probe Train Batch Size
16
Random Seed
42
SAE Batch Size
125
Test Set Size
1000
Train Set Size
4000
SCR Dir 1, Top 2 SAE latents
0.30
SCR Dir 1, Top 5 SAE latents
0.33
SCR Dir 1, Top 10 SAE latents
0.32
SCR Dir 1, Top 20 SAE latents
0.31
SCR Dir 1, Top 50 SAE latents
0.25
SCR Dir 1, Top 100 SAE latents
0.21
SCR Dir 1, Top 500 SAE latents
-0.02
SCR Dir 2, Top 2 SAE latents
0.12
SCR Dir 2, Top 5 SAE latents
0.20
SCR Dir 2, Top 10 SAE latents
0.27
SCR Dir 2, Top 20 SAE latents
0.36
SCR Dir 2, Top 50 SAE latents
0.39
SCR Dir 2, Top 100 SAE latents
0.32
SCR Dir 2, Top 500 SAE latents
0.33
SCR Metric, Top 2 SAE latents
0.13
SCR Metric, Top 5 SAE latents
0.21
SCR Metric, Top 10 SAE latents
0.28
SCR Metric, Top 20 SAE latents
0.37
SCR Metric, Top 50 SAE latents
0.41
SCR Metric, Top 100 SAE latents
0.33
SCR Metric, Top 500 SAE latents
0.34
LLM Context Length
128
Dataset Names
LabHC/bias_in_bios_class_set1
canrager/amazon_reviews_mcauley_1and5
Early Stopping Patience
20
LLM Batch Size
32
LLM Data Type
bfloat16
Lower Memory Usage
false
Model Name
gemma-2-2b
N Values
[2,5,10,20,50,100,500]
Perform Spurious Correlation Removal
false
Probe Epochs
20
Probe L1 Penalty
0.001
Probe LR
0.001
Probe Test Batch Size
500
Probe Train Batch Size
16
Random Seed
42
SAE Batch Size
125
Test Set Size
1000
Train Set Size
4000
TPP Metric, Top 2 SAE latents
0.01
TPP Metric, Top 5 SAE latents
0.02
TPP Metric, Top 10 SAE latents
0.03
TPP Metric, Top 20 SAE latents
0.06
TPP Metric, Top 50 SAE latents
0.14
TPP Metric, Top 100 SAE latents
0.23
TPP Metric, Top 500 SAE latents
0.39
TPP Intended Class, Top 2 SAE latents
0.01
TPP Intended Class, Top 5 SAE latents
0.02
TPP Intended Class, Top 10 SAE latents
0.03
TPP Intended Class, Top 20 SAE latents
0.07
TPP Intended Class, Top 50 SAE latents
0.15
TPP Intended Class, Top 100 SAE latents
0.24
TPP Intended Class, Top 500 SAE latents
0.41
TPP Unintended Class, Top 2 SAE latents
0.00
TPP Unintended Class, Top 5 SAE latents
0.00
TPP Unintended Class, Top 10 SAE latents
0.00
TPP Unintended Class, Top 20 SAE latents
0.01
TPP Unintended Class, Top 50 SAE latents
0.01
TPP Unintended Class, Top 100 SAE latents
0.01
TPP Unintended Class, Top 500 SAE latents
0.02
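The three TPP series above are numerically consistent with the TPP Metric being the intended-class value minus the unintended-class value at each N, up to rounding. This relationship is inferred from the listed numbers, not stated on the page. A quick check with the values copied from the listing:

```python
# Values copied from the listing above, keyed by number of top SAE latents.
intended = {2: 0.01, 5: 0.02, 10: 0.03, 20: 0.07, 50: 0.15, 100: 0.24, 500: 0.41}
unintended = {2: 0.00, 5: 0.00, 10: 0.00, 20: 0.01, 50: 0.01, 100: 0.01, 500: 0.02}
reported = {2: 0.01, 5: 0.02, 10: 0.03, 20: 0.06, 50: 0.14, 100: 0.23, 500: 0.39}

# Assumed relationship: metric = intended effect - unintended effect.
computed = {n: round(intended[n] - unintended[n], 2) for n in intended}

for n in sorted(computed):
    print(n, computed[n], reported[n])
```

The gap widens as more latents are used, while the unintended-class value stays at or below 0.02.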
LLM Context Length
128
Dataset Names
LabHC/bias_in_bios_class_set1
LabHC/bias_in_bios_class_set2
LabHC/bias_in_bios_class_set3
canrager/amazon_reviews_mcauley_1and5
canrager/amazon_reviews_mcauley_1and5_sentiment
codeparrot/github-code
fancyzhx/ag_news
Helsinki-NLP/europarl
K Values
[1,2,5,10,20,50]
LLM Batch Size
32
LLM Data Type
bfloat16
Model Name
gemma-2-2b
Probe Test Set Size
1000
Probe Train Set Size
4000
Random Seed
42
SAE Batch Size
125
LLM
LLM Test Accuracy
0.95
LLM Top 1 Test Accuracy
0.65
LLM Top 2 Test Accuracy
0.72
LLM Top 5 Test Accuracy
0.78
LLM Top 10 Test Accuracy
0.83
LLM Top 20 Test Accuracy
0.88
LLM Top 50 Test Accuracy
0.92
LLM Top 100 Test Accuracy
SAE
SAE Test Accuracy
0.96
SAE Top 1 Test Accuracy
0.76
SAE Top 2 Test Accuracy
0.81
SAE Top 5 Test Accuracy
0.88
SAE Top 10 Test Accuracy
0.91
SAE Top 20 Test Accuracy
0.93
SAE Top 50 Test Accuracy
0.95
SAE Top 100 Test Accuracy
Dataset Names
wmdp-bio
high_school_us_history
college_computer_science
high_school_geography
human_aging
Dataset Size
1024
Intervention Method
clamp_feature_activation
LLM Batch Size
4
LLM Data Type
bfloat16
8
Model Name
gemma-2-2b-it
Multipliers
[25,50,100,200]
N Batch Loss Added
50
N Features List
[10,20]
Random Seed
42
Retain Thresholds
[0.001,0.01]
Save Metrics Flag
true
Sequence Length
1024
Target Metric
correct
Unlearning Score
0.08