Explanations
nonsense words and symbols
specific non-standard characters or symbols
Negative Logits
arton       -0.77
flour       -0.77
leground    -0.73
bonded      -0.73
wagen       -0.72
betting     -0.70
presidency  -0.69
olithic     -0.68
erate       -0.68
agre        -0.68
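The negative and positive logit lists rank vocabulary tokens by how much this feature's output direction lowers or raises their logits. One common way to produce such lists for a sparse-autoencoder feature is to project the feature's decoder vector through the model's unembedding matrix. The minimal sketch below assumes that method, ignores the final LayerNorm, and uses random toy matrices. The names `w_dec`, `w_u`, `feature_logit_weights` and `top_bottom_tokens` are hypothetical.

```python
import numpy as np


def feature_logit_weights(w_dec: np.ndarray, w_u: np.ndarray, feature: int) -> np.ndarray:
    """Direct effect of one feature's decoder direction on every vocabulary logit.

    Assumed layouts (hypothetical):
      w_dec: (n_features, d_model) SAE decoder matrix
      w_u:   (d_model, vocab_size) unembedding matrix (final LayerNorm ignored)
    """
    return w_dec[feature] @ w_u


def top_bottom_tokens(logit_weights: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive (token, weight) pairs."""
    order = np.argsort(logit_weights)  # ascending
    negative = [(vocab[i], float(logit_weights[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_weights[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: random weights and a made-up vocabulary
rng = np.random.default_rng(0)
d_model, vocab_size, n_features = 16, 50, 8
w_dec = rng.normal(size=(n_features, d_model))
w_u = rng.normal(size=(d_model, vocab_size))
vocab = [f"tok{i}" for i in range(vocab_size)]

weights = feature_logit_weights(w_dec, w_u, feature=3)
negative, positive = top_bottom_tokens(weights, vocab, k=5)
print("negative:", negative)
print("positive:", positive)
```

Because the projection is linear, the same vector yields both lists: the most negative entries form the bottom list and the most positive entries form the top list.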
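Positive Logits (byte-level BPE token strings, with UTF-8 decoding in parentheses)
ãĤ          1.77  (incomplete: bytes E3 82)
ãģ          1.76  (incomplete: bytes E3 81)
ãģª         1.67  (な)
ãģĦ         1.56  (い)
ãģ¾         1.55  (ま)
ãĥı         1.53  (ハ)
ãĢģ         1.52  (、)
ãĥ¼ãĥ       1.52  (ー + incomplete bytes E3 83)
ãĥĩ         1.52  (デ)
ãĤĭ         1.51  (る)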
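The positive-logit tokens look like mojibake because byte-level BPE tokenizers store each raw byte as a printable stand-in character. Japanese kana take three bytes in UTF-8, so some tokens hold only the first two bytes of a character. The minimal sketch below assumes a GPT-2-style tokenizer and rebuilds its byte-to-character table to recover the underlying bytes. The helper names `token_to_bytes` and `describe` are hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table: printable bytes map to themselves, the rest to chr(256 + n)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:  # control, space and other non-printable bytes
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: stand-in character -> original byte value
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes a byte-level BPE token string represents."""
    return bytes(BYTE_DECODER[ch] for ch in token)


def describe(token: str) -> str:
    """Show the bytes and a UTF-8 decoding; incomplete sequences become U+FFFD."""
    raw = token_to_bytes(token)
    return f"{token!r}: bytes {raw.hex(' ')} -> {raw.decode('utf-8', errors='replace')!r}"


for tok in ["ãĤ", "ãģª", "ãĢģ", "ãĥ¼ãĥ", "ãĤĭ"]:
    print(describe(tok))
```

Decoding with `errors="replace"` keeps partial tokens such as `ãĤ` visible as a replacement character instead of raising `UnicodeDecodeError`.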
Activation Density: 0.015%
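Activation density is usually reported as the fraction of sampled token positions on which a feature fires. The minimal sketch below assumes that definition, with "fires" meaning an activation above zero. The `threshold` parameter and the toy activation array are hypothetical. At 0.015%, a feature is active on roughly 3 of every 20,000 tokens.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    return float((acts > threshold).mean())


# Toy example: 20,000 token positions, the feature fires on 3 of them
acts = np.zeros(20_000)
acts[[10, 5_000, 19_999]] = [0.4, 1.2, 0.7]

density = activation_density(acts)
print(f"{density:.3%}")
```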