INDEX
Explanations
questions that ask for truthfulness or correctness regarding options or statements
Negative Logits
Token         Logit
myſelf        -0.89
Jefus         -0.83
Reſ           -0.82
AndEndTag     -0.82
Juneau        -0.81
faſt          -0.80
protoimpl     -0.79
himſelf       -0.79
poffible      -0.77
ſeveral       -0.76
Positive Logits
Token         Logit
A             0.51
primary       0.51
وتسجيلات      0.51
<eos>         0.48
Is            0.47
Gemeinde      0.46
↵↵            0.46
may           0.45
T             0.44
fortawesome   0.44
Activations Density: 0.341%