INDEX
    Explanations

    expressions of moral outrage and condemnation regarding social and ethical issues

    Negative Logits
    iform     -0.16
    ICA       -0.14
    arts      -0.14
    スカ      -0.14
    řiv       -0.14
    illard    -0.14
     ngoại    -0.14
     Loft     -0.14
    legen     -0.13
     firm     -0.13
    Positive Logits
    oldt      0.15
    isko      0.15
    ób        0.14
    elize     0.14
    _pb       0.14
    pch       0.14
    ноч       0.14
    wake      0.14
    iox       0.14
    ちは      0.14
    Activation Density: 0.293%

    No Known Activations