INDEX
Explanations
social media handles or usernames made up of alphanumeric characters and symbols
non-standard characters and symbols
Negative Logits
Tid          -0.76
laz          -0.70
Lazarus      -0.70
itar         -0.68
Bernstein    -0.65
unks         -0.65
Winston      -0.63
folk         -0.63
Sle          -0.63
interf       -0.62
Positive Logits
Į            2.37
İ            1.99
ĩ            1.96
Ĵ            1.96
ı            1.92
Ķ            1.91
į            1.91
ļ            1.89
IJ           1.88
ĵ            1.87
Activations Density 0.020%