INDEX
    Explanations

    phrases and sentiments associated with moral judgments and emotional responses

    Negative Logits
    Token       Logit
    "ear"       -0.15
    " zm"       -0.15
    " culpa"    -0.14
    "830"       -0.14
    " invo"     -0.14
    " Stock"    -0.13
    "SCI"       -0.13
    "ecek"      -0.13
    "083"       -0.13
    "ough"      -0.13
    Positive Logits
    Token       Logit
    "gew"       0.15
    "วà¸Ķ"      0.15
    "ycastle"   0.15
    "insk"      0.14
    "duto"      0.14
    "κοÏį"      0.14
    "θε"        0.14
    "dol"       0.14
    "éĢļ"       0.13
    "geois"     0.13
    Act Density 0.168%

    No Known Activations