    Explanations

    sentences that include personal affirmations or declarations of identity

    Negative Logits
    Token               Logit
    " Bisch"            -0.61
    "Gimme"             -0.60
    " ч"                -0.55
    "vincing"           -0.55
    " polaire"          -0.52
    " fluores"          -0.52
    " vernac"           -0.51
    (token missing)     -0.51
    "umpulkan"          -0.51
    " zask"             -0.51
    Positive Logits
    Token               Logit
    "aarrggbb"          0.94
    "mektedir"          0.92
    " utafitiHapana"    0.83
    "maktadır"          0.80
    "ofold"             0.76
    "mıştır"            0.74
    "のですね"          0.71
    " goederen"         0.70
    "íncia"             0.69
    "AutoScaleMode"     0.69
    Activation Density: 0.504%
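
    The page does not say how the density figure above is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is greater than zero. The sketch below illustrates that assumed definition only. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature is active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 5 of which have a nonzero activation.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)

# Format as a percentage, in the same style as the dashboard figure.
print(f"Activation Density: {density:.3%}")
```

    Under this reading, a density of 0.504% would mean the feature fires on roughly 1 in every 200 token positions.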

    No Known Activations