INDEX
Explanations
strong statements against violence and discrimination
Negative Logits
vid      -0.15
stra     -0.14
illa     -0.14
aux      -0.14
fox      -0.14
impl     -0.14
illas    -0.13
omo      -0.13
ock      -0.13
ao       -0.13
Positive Logits
enan        0.17
rous        0.16
zar         0.16
éric        0.15
tand        0.15
rzy         0.15
PILE        0.14
UBY         0.14
oldem       0.14
FindObject  0.14
Activations Density: 0.103%