INDEX
Explanations
references to specific names related to media or technology
proper nouns, particularly names and locations
Negative Logits
eers         -0.83
ilage        -0.80
Purs         -0.73
eer          -0.72
escription   -0.71
Bolshe       -0.69
lain         -0.68
trl          -0.68
İĭ           -0.67
£ı           -0.66
Positive Logits
caster       0.82
axis         0.80
mington      0.75
endi         0.74
casting      0.70
vale         0.70
IELD         0.69
aii          0.69
Rosenberg    0.66
PM           0.66
Activations Density: 0.034%