Explanations
instances of political criticism and hypocrisy
Negative Logits
ensen       -0.17
plag        -0.15
ordes       -0.15
utors       -0.15
Ñıз      -0.15
/tutorial   -0.14
eme         -0.14
uÅŁ         -0.14
Modular     -0.14
iedo        -0.13
Positive Logits
PTR         0.17
Keller      0.14
ÛĮزÛĮ    0.14
ãĥ³ãĥ       0.14
uevo        0.14
rava        0.14
اباÙĨ  0.13
_ws         0.13
atan        0.13
åī§         0.13
Activations Density: 0.519%