    Explanations

    phrases indicating moral judgment or hypocrisy in political discourse

    Negative Logits
    "¼"             -0.14
    " spyOn"        -0.14
    " sẵn"          -0.14
    "oug"           -0.14
    "WithURL"       -0.14
    "нен"           -0.14
    " grâce"        -0.14
    "lia"           -0.14
    "³ç´°"          -0.14
    "gratis"        -0.13

    Positive Logits
    " border"        0.23
    " borders"       0.18
    " unless"        0.18
    "border"         0.18
    " beyond"        0.18
    "-border"        0.18
    " behavior"      0.18
    " considering"   0.18
    " attempted"     0.17
    " attempt"       0.16

    Activation Density: 0.257%

    No Known Activations