INDEX
    Explanations

    affirmations or confirmations of statements

    Negative Logits
    " Forg"    -0.15
    "elay"     -0.15
    "mind"     -0.14
    "bor"      -0.14
    " aka"     -0.14
    "öh"       -0.14
    "-sort"    -0.14
    "âĨ"       -0.13
    "nid"      -0.13
    "↑"        -0.13
    Positive Logits
    " answer"      0.17
    "option"       0.17
    "答案"         0.17
    " Options"     0.17
    " correct"     0.16
    " solution"    0.16
    " hint"        0.16
    "Which"        0.16
    "answer"       0.16
    " Which"       0.16
    Activation Density: 0.106%

    No Known Activations