Explanations
expressions of personal opinion and moral judgments
Negative Logits
Token      Logit
bard       -0.16
qd         -0.16
yd         -0.15
/repos     -0.15
esz        -0.14
orary      -0.14
abbage     -0.14
Planet     -0.14
fé         -0.13
mojom      -0.13
Positive Logits
Token      Logit
mosquito   0.17
ucch       0.15
iaux       0.15
ustum      0.14
myself     0.14
reau       0.14
siti       0.14
metic      0.14
.cx        0.14
OCK        0.14
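The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The following is a minimal sketch of how such lists can be produced. It assumes (the page does not say so) that each score is the dot product of the feature's decoder direction with a token's unembedding vector. The tiny two-dimensional vectors and the token names "a" to "d" are hypothetical stand-ins for real model weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_weights(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(weights: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens."""
    negative = sorted(weights.items(), key=lambda kv: kv[1])[:k]
    positive = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


# Hypothetical toy inputs, not taken from any real model.
decoder_dir = [1.0, -0.5]
unembed = {
    "a": [0.2, 0.0],
    "b": [0.0, 0.4],
    "c": [0.1, 0.1],
    "d": [-0.3, 0.2],
}

weights = logit_weights(decoder_dir, unembed)
negative, positive = top_logits(weights, k=2)
print("negative:", [tok for tok, _ in negative])
print("positive:", [tok for tok, _ in positive])
```

On a real model the dot products would run over the full vocabulary. Some pipelines also fold in the final layer norm before projecting, which this sketch omits.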
Activation Density: 0.218%
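Activation density is the share of tokens on which the feature fires. The following is a small sketch that assumes "fires" means an activation strictly above zero. The page does not state its threshold, so `threshold` is a hypothetical parameter. Under that reading, 0.218% corresponds to the feature being active on about 218 of every 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 218 active tokens out of 100,000 gives a density of 0.218%.
acts = [1.0] * 218 + [0.0] * (100_000 - 218)
print(f"{activation_density(acts):.3f}%")
```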