gemma-2-2b · 12-gemmascope-res-16k
    SAE from gemma-scope · Residual Stream - 16k · Layer 12
    Features
    16,384
    Data Type
    float32
    Hook Name
    blocks.12.hook_resid_post
    Hook Layer
    12
    Architecture
    jumprelu
    Context Size
    1,024
    Dataset
    monology/pile-uncopyrighted
    Activation Function
    relu
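    Given the configuration above (JumpReLU architecture, 16,384 latents reading from blocks.12.hook_resid_post), the encode/decode step follows the standard Gemma Scope scheme. Below is a minimal PyTorch sketch of that forward pass; the residual-stream width of 2304 for gemma-2-2b and all parameter names are assumptions for illustration, and in practice the weights would be loaded from the released gemma-scope checkpoints (e.g. via SAELens) rather than initialized from scratch.

```python
import torch
import torch.nn as nn


class JumpReLUSAE(nn.Module):
    """Sketch of a JumpReLU SAE matching the listed config:
    d_sae = 16,384 latents on blocks.12.hook_resid_post (d_model = 2304 assumed)."""

    def __init__(self, d_model: int = 2304, d_sae: int = 16_384):
        super().__init__()
        self.W_enc = nn.Parameter(torch.zeros(d_model, d_sae))
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.zeros(d_sae, d_model))
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        self.threshold = nn.Parameter(torch.zeros(d_sae))  # per-latent jump threshold

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU pre-activation, then zero out anything at or below the learned threshold
        pre = x @ self.W_enc + self.b_enc
        return torch.relu(pre) * (pre > self.threshold)

    def decode(self, acts: torch.Tensor) -> torch.Tensor:
        return acts @ self.W_dec + self.b_dec

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(x))
```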

    SAE Evaluations

    SAE Bench

    Model Behavior Preservation
    KL Divergence Score
    0.99
    KL Divergence with SAE
    0.10
    KL Divergence with Ablation
    10.06
    Model Performance Preservation
    Cross Entropy Loss Score
    0.99
    CE Loss with SAE
    3.05
    CE Loss without SAE
    2.94
    CE Loss with Ablation
    12.44
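    The two 0.99 preservation scores are consistent with the usual normalization, which places the SAE-spliced model between the original model (score 1) and a mean-ablated hook point (score 0). A quick check against the listed numbers, assuming that standard formula:

```python
# score = (metric_with_ablation - metric_with_sae) / (metric_with_ablation - metric_original)
# The original model's KL divergence with itself is 0, so that term drops out.
kl_score = (10.06 - 0.10) / (10.06 - 0.0)   # ~0.990 -> reported as 0.99
ce_score = (12.44 - 3.05) / (12.44 - 2.94)  # ~0.988 -> reported as 0.99
print(round(kl_score, 3), round(ce_score, 3))
```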
    Reconstruction Quality
    Mean Squared Error
    1.54
    Cosine Similarity
    0.92
    Explained Variance
    0.75
    Shrinkage
    L2 Ratio
    0.92
    Input L2 Norm
    149.00
    Output L2 Norm
    138.00
    Relative Reconstruction Bias
    1.00
    Sparsity
    L0 Sparsity
    80.47
    L1 Sparsity
    532.00
    Token Statistics
    Total Tokens (Reconstruction)
    409,600
    Total Tokens (Sparsity/Variance)
    4,096,000
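    The reconstruction metrics above are averaged over 409,600 tokens and the sparsity/variance metrics over 4,096,000 tokens. A rough sketch of how these quantities are typically computed from residual-stream activations x, SAE reconstructions x_hat, and latent activations acts; the exact baselines (e.g. the variance term in explained variance) may differ slightly from SAE Bench's implementation:

```python
import torch
import torch.nn.functional as F


def reconstruction_stats(x: torch.Tensor, x_hat: torch.Tensor, acts: torch.Tensor) -> dict:
    """x, x_hat: [n_tokens, d_model]; acts: [n_tokens, d_sae] SAE latent activations."""
    mse = (x - x_hat).pow(2).mean()                                  # Mean Squared Error
    cos = F.cosine_similarity(x, x_hat, dim=-1).mean()               # Cosine Similarity
    resid_var = (x - x_hat).pow(2).sum(-1)
    total_var = (x - x.mean(dim=0)).pow(2).sum(-1)
    explained_var = (1.0 - resid_var / total_var).mean()             # Explained Variance
    l2_ratio = (x_hat.norm(dim=-1) / x.norm(dim=-1)).mean()          # Shrinkage / L2 Ratio
    l0 = (acts > 0).float().sum(-1).mean()                           # L0: active latents per token
    l1 = acts.abs().sum(-1).mean()                                   # L1: summed latent magnitude
    return {"mse": mse, "cosine": cos, "explained_variance": explained_var,
            "l2_ratio": l2_ratio, "l0": l0, "l1": l1}
```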
    Feature Density
    [Histogram: Log10 feature density]
    [Scatter plot: Feature Density (x) vs Consistent Activation Heuristic (y), Log10]
    Feature Scatter Matrix
    [Pairwise scatter: Encoder Bias, Encoder Norm, Encoder-Decoder Cosine Similarity]
    Encoder-Decoder Cosine Similarity
    [Histogram: encoder-decoder cosine similarity, roughly 0.3 to 0.9]

    Feature Absorption
    Mean Number of Split Features
    1.16
    Mean Full Absorption Score
    0.08
    Absorption Rate by First Letter
    [Bar chart: absorption rate per first letter, A-Z, roughly 0 to 0.2]

    AutoInterp
    Activation Threshold Fraction
    0.01
    Buffer Size
    10
    Dataset Name
    monology/pile-uncopyrighted
    Dead Latent Threshold
    15
    LLM Batch Size
    64
    LLM Context Size
    128
    LLM Data Type
    bfloat16
    Max Tokens in Explanation
    30
    Model Name
    gemma-2-2b
    Number of IW Sampled Examples for Generation
    5
    Number of IW Sampled Examples for Scoring
    2
    Number of Latents
    1000
    Number of Random Examples for Scoring
    10
    Number of Top Examples for Generation
    10
    Number of Top Examples for Scoring
    2
    No Overlap
    true
    Random Seed
    42
    Scoring
    true
    Total Tokens
    2,000,000
    Use Demos in Explanation
    true

    AutoInterp Score
    0.83
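    The AutoInterp configuration and score above follow a generate-then-detect pattern: an explainer LLM writes a short explanation (at most 30 tokens) for each of 1,000 sampled latents from top and importance-weighted activating examples, and a judge then classifies a held-out mix of 2 top, 2 importance-weighted, and 10 random examples as activating or not. The 0.83 score is the judge's mean classification accuracy. A hedged sketch of that scoring step; the helper names are illustrative and not the actual SAE Bench API:

```python
import random


def detection_score(activating_examples, random_examples, judge) -> float:
    """`judge(example) -> bool` is a closure over the generated explanation that
    predicts whether the latent fires on the example. Returns the fraction of
    examples the judge labels correctly."""
    labeled = [(ex, True) for ex in activating_examples] + \
              [(ex, False) for ex in random_examples]
    random.shuffle(labeled)
    correct = sum(judge(ex) == label for ex, label in labeled)
    return correct / len(labeled)

# Per the config: 4 activating (2 top + 2 importance-weighted) and 10 random
# examples per latent, averaged over 1,000 latents -> AutoInterp Score 0.83.
```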

    SCR (Spurious Correlation Removal)
    LLM Context Length
    128
    Dataset Names
    LabHC/bias_in_bios_class_set1
    canrager/amazon_reviews_mcauley_1and5
    Early Stopping Patience
    20
    LLM Batch Size
    32
    LLM Data Type
    bfloat16
    Lower Memory Usage
    false
    Model Name
    gemma-2-2b
    N Values
    [2,5,10,20,50,100,500]
    Perform Spurious Correlation Removal
    true
    Probe Epochs
    20
    Probe L1 Penalty
    0.001
    Probe LR
    0.001
    Probe Test Batch Size
    500
    Probe Train Batch Size
    16
    Random Seed
    42
    SAE Batch Size
    125
    Test Set Size
    1000
    Train Set Size
    4000

    SCR results (by number of top SAE latents ablated):
    Top k     SCR Dir 1    SCR Dir 2    SCR Metric
    2         0.30         0.12         0.13
    5         0.33         0.20         0.21
    10        0.32         0.27         0.28
    20        0.31         0.36         0.37
    50        0.25         0.39         0.41
    100       0.21         0.32         0.33
    500       -0.02        0.33         0.34
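    The SCR numbers measure how well zero-ablating the top-k SAE latents most attributed to a spurious attribute (e.g. gender in bias_in_bios) debiases a downstream probe; higher is better, and the two directions correspond to the two class/attribute pairings drawn from the configured datasets. A minimal sketch of the ablation step itself, with illustrative names rather than the SAE Bench API:

```python
import torch


def ablate_top_latents(sae, x: torch.Tensor, latent_idx) -> torch.Tensor:
    """Reconstruct x with the selected SAE latents zeroed out, so the biased
    probe can be re-evaluated on the edited representation."""
    acts = sae.encode(x)            # [n_tokens, d_sae]
    acts[:, latent_idx] = 0.0       # zero-ablate the k latents tied to the spurious concept
    return sae.decode(acts)

# The "Top 20" row, for instance, ablates the 20 latents with the largest
# probe attribution to the spurious attribute before re-probing.
```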

    TPP (Targeted Probe Perturbation)
    LLM Context Length
    128
    Dataset Names
    LabHC/bias_in_bios_class_set1
    canrager/amazon_reviews_mcauley_1and5
    Early Stopping Patience
    20
    LLM Batch Size
    32
    LLM Data Type
    bfloat16
    Lower Memory Usage
    false
    Model Name
    gemma-2-2b
    N Values
    [2,5,10,20,50,100,500]
    Perform Spurious Correlation Removal
    false
    Probe Epochs
    20
    Probe L1 Penalty
    0.001
    Probe LR
    0.001
    Probe Test Batch Size
    500
    Probe Train Batch Size
    16
    Random Seed
    42
    SAE Batch Size
    125
    Test Set Size
    1000
    Train Set Size
    4000

    TPP results (by number of top SAE latents ablated):
    Top k     TPP Metric    Intended Class    Unintended Class
    2         0.01          0.01              0.00
    5         0.02          0.02              0.00
    10        0.03          0.03              0.00
    20        0.06          0.07              0.01
    50        0.14          0.15              0.01
    100       0.23          0.24              0.01
    500       0.39          0.41              0.02
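    The three TPP columns are internally consistent with the metric being the intended-class probe accuracy drop minus the unintended-class drop, for example:

```python
# TPP Metric ~= (intended-class accuracy drop) - (unintended-class accuracy drop)
tpp_top50 = 0.15 - 0.01   # = 0.14, matching the reported TPP Metric for Top 50
tpp_top500 = 0.41 - 0.02  # = 0.39, matching the reported TPP Metric for Top 500
```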

    Sparse Probing
    LLM Context Length
    128
    Dataset Names
    LabHC/bias_in_bios_class_set1
    LabHC/bias_in_bios_class_set2
    LabHC/bias_in_bios_class_set3
    canrager/amazon_reviews_mcauley_1and5
    canrager/amazon_reviews_mcauley_1and5_sentiment
    codeparrot/github-code
    fancyzhx/ag_news
    Helsinki-NLP/europarl
    K Values
    [1,2,5,10,20,50]
    LLM Batch Size
    32
    LLM Data Type
    bfloat16
    Model Name
    gemma-2-2b
    Probe Test Set Size
    1000
    Probe Train Set Size
    4000
    Random Seed
    42
    SAE Batch Size
    125

    Probe test accuracy (mean over datasets):
    Features used    LLM     SAE
    All              0.95    0.96
    Top 1            0.65    0.76
    Top 2            0.72    0.81
    Top 5            0.78    0.88
    Top 10           0.83    0.91
    Top 20           0.88    0.93
    Top 50           0.92    0.95
    Top 100          n/a     n/a
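    In the comparison above, "LLM" rows probe the model's raw residual-stream activations and "SAE" rows probe the SAE latent activations, restricted to the top k features for each k in K Values; the single best SAE latent already reaches 0.76 mean accuracy versus 0.65 for the best residual-stream direction. A rough sketch of the top-k selection and probing step, assuming a simple class-mean-difference selection rule (the exact rule in SAE Bench may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def topk_probe_accuracy(train_x, train_y, test_x, test_y, k: int) -> float:
    """train_x/test_x: [n, d] pooled activations (residual stream or SAE latents);
    train_y/test_y: binary labels. Select the k features with the largest
    class-mean difference, then fit and score a logistic-regression probe."""
    mean_diff = train_x[train_y == 1].mean(axis=0) - train_x[train_y == 0].mean(axis=0)
    idx = np.argsort(-np.abs(mean_diff))[:k]
    probe = LogisticRegression(max_iter=1000).fit(train_x[:, idx], train_y)
    return probe.score(test_x[:, idx], test_y)
```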

    Unlearning
    Dataset Names
    wmdp-bio
    high_school_us_history
    college_computer_science
    high_school_geography
    human_aging
    Dataset Size
    1024
    Intervention Method
    clamp_feature_activation
    LLM Batch Size
    4
    LLM Data Type
    bfloat16
    8
    Model Name
    gemma-2-2b-it
    Multipliers
    [25,50,100,200]
    N Batch Loss Added
    50
    N Features List
    [10,20]
    Random Seed
    42
    Retain Thresholds
    0.001
    0.01
    Save Metrics Flag
    true
    Sequence Length
    1024
    Target Metric
    correct

    Unlearning Score
    0.08
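    The unlearning evaluation clamps selected SAE latents (intervention method clamp_feature_activation) during the model's forward pass on wmdp-bio questions, sweeping the listed multipliers and feature counts while requiring that accuracy on the retain-set MMLU subjects stays within the retain thresholds; the Unlearning Score of 0.08 summarizes how much forget-set accuracy can be removed under that constraint. A sketch of the clamping intervention, under the assumption that active latents are forced to a negative value scaled by the multiplier (the exact scaling in the eval may differ):

```python
import torch


def clamp_feature_activation(acts: torch.Tensor, latent_idx, multiplier: float) -> torch.Tensor:
    """acts: [n_tokens, d_sae] SAE latent activations. Whenever one of the selected
    latents is active, clamp it to -multiplier instead (illustrative reading of the
    intervention, not the exact SAE Bench implementation)."""
    clamped = acts.clone()
    selected = clamped[:, latent_idx]
    clamped[:, latent_idx] = torch.where(selected > 0,
                                         torch.full_like(selected, -multiplier),
                                         selected)
    return clamped

# The edited latents are decoded back into the residual stream, then wmdp-bio
# (forget) and the listed MMLU subjects (retain) are re-scored for each
# multiplier in [25, 50, 100, 200] and each N Features setting in [10, 20].
```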