INDEX
Explanations
references to female characters and their relationships in narratives
Negative Logits
.scalablytyped  -0.17
_Lean           -0.15
?action         -0.15
lename          -0.15
riors           -0.14
coe             -0.14
macen           -0.14
@nate           -0.14
_prime          -0.14
adena           -0.14
Positive Logits
pt       0.17
edi      0.16
gt       0.15
fashion  0.15
ä¿Ŀ      0.14
εί       0.14
eme      0.14
in       0.14
to       0.13
iman     0.13
Activations Density 0.985%