    Explanations

    strong statements against violence and discrimination

    Negative Logits
    vid      -0.15
    stra     -0.14
    illa     -0.14
    aux      -0.14
    fox      -0.14
    impl     -0.14
    illas    -0.13
    omo      -0.13
    ock      -0.13
    ao       -0.13
    Positive Logits
    enan          0.17
    rous          0.16
    zar           0.16
    éric          0.15
     tand         0.15
    rzy           0.15
    PILE          0.14
    UBY           0.14
    oldem         0.14
    FindObject    0.14
    Activation Density 0.103%

    No Known Activations