Explanations
phrases and sentiments associated with moral judgments and emotional responses
Negative Logits
ear     -0.15
zm      -0.15
culpa   -0.14
830     -0.14
invo    -0.14
Stock   -0.13
SCI     -0.13
ecek    -0.13
083     -0.13
ough    -0.13
Positive Logits
gew       0.15
วด        0.15
ycastle   0.15
insk      0.14
duto      0.14
κού       0.14
θε        0.14
dol       0.14
通        0.13
geois     0.13
Activations Density: 0.168%