INDEX
Explanations
phrases indicating moral judgment or hypocrisy in political discourse
Negative Logits
¼        -0.14
spyOn    -0.14
sẵn      -0.14
oug      -0.14
WithURL  -0.14
нен      -0.14
grâce    -0.14
lia      -0.14
³ç´°     -0.14
gratis   -0.13
Positive Logits
border       0.23
borders      0.18
unless       0.18
border       0.18
beyond       0.18
-border      0.18
behavior     0.18
considering  0.18
attempted    0.17
attempt      0.16
Activations Density 0.257%