INDEX
Explanations
instances of authorship and attribution in the text
Negative Logits
绍              -0.14
inka            -0.14
Lag             -0.13
yny             -0.13
Redistributions -0.13
orre            -0.13
iştir           -0.12
avig            -0.12
ĩ               -0.12
andon           -0.12
Positive Logits
vise            0.16
alley           0.15
aven            0.15
allen           0.14
cigaret         0.13
dre             0.13
millenn         0.13
_PCI            0.13
boss            0.12
mj              0.12
Activations Density 0.157%