INDEX
Explanations
expressions of perception or belief in social dynamics
Negative Logits
direct       -0.14
ernel        -0.14
جÙĦ          -0.14
pari         -0.14
Primitive    -0.13
unrelated    -0.13
irect        -0.13
ooth         -0.13
467          -0.13
Äįet         -0.13
Positive Logits
undecided    0.47
amb          0.42
uncertainty  0.40
ambiguity    0.39
neutral      0.39
ambiguous    0.38
inde         0.38
uncertain    0.38
unsure       0.37
ambiguous    0.36
Activations Density: 0.376%