INDEX
Explanations
expressions of self-awareness and personal growth mixed with skepticism towards collective beliefs
Negative Logits
hausen      -0.16
hy          -0.13
åİĨ         -0.13
ῦ           -0.13
erc         -0.13
se          -0.13
PLICIT      -0.13
chw         -0.13
_FREQUENCY  -0.13
tplib       -0.13
Positive Logits
there       0.24
There       0.22
THERE       0.20
There       0.20
somehow     0.17
there       0.17
mastur      0.16
maybe       0.16
yes         0.16
if          0.16
Activations Density: 0.297%