Explanations
terms related to human bodies
references to various body parts or physical attributes
Negative Logits
Token      Logit
antha      -0.87
atile      -0.85
inates     -0.85
ournal     -0.84
apo        -0.84
iencies    -0.80
anth       -0.79
iating     -0.76
iated      -0.76
ives       -0.75
Positive Logits
Token      Logit
cht        0.73
zos        0.71
\\\\\\\\   0.70
Parties    0.70
holder     0.68
parts      0.66
gger       0.66
Dele       0.64
Alfred     0.64
ghazi      0.64
Activations Density: 0.077%