INDEX
Explanations
isolated segments of code or technical content
Negative Logits
<bos>    -0.80
.        -0.48
/        -0.46
,        -0.44
<eos>    -0.44
3        -0.42
4        -0.41
1        -0.41
5        -0.41
:        -0.40
Positive Logits
ſeveral     0.84
Monfieur    0.81
Theſe       0.80
itſelf      0.78
whoſe       0.75
>\<^        0.75
Efq         0.74
^(@)        0.74
myſelf      0.74
་་          0.74
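Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix, then sorting the per-token scores: the lowest scores form the negative list and the highest form the positive list. This page does not state its exact procedure, so the sketch below is an assumption. The helper name `top_logits`, its parameters, and the d_model-rows-by-vocab-columns matrix layout are all hypothetical, and the toy numbers are invented for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper: the name, parameters and matrix layout are illustrative
# assumptions, not the API of any particular library or of the page above.
def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (negative, positive) lists of (token, score) pairs.

    decoder_dir: the feature's decoder direction, length d_model.
    unembed: unembedding matrix laid out as d_model rows x vocab_size columns.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    # Score each token by the dot product of the decoder direction with
    # that token's unembedding column.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return negative, positive


# Toy example (invented numbers): a 2-dimensional "model", 4-token vocabulary.
vocab = [".", "ſeveral", ",", "whoſe"]
decoder_dir = [1.0, -0.5]
unembed = [
    [-0.4, 0.6, -0.3, 0.5],
    [0.8, -0.4, 0.2, -0.5],
]
negative, positive = top_logits(decoder_dir, unembed, vocab, k=2)
print([tok for tok, _ in negative])  # ['.', ',']
print([tok for tok, _ in positive])  # ['ſeveral', 'whoſe']
```

Sorting the whole vocabulary keeps the sketch short; with a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` (or a tensor library's top-k routine) would avoid the full sort. Any scaling or normalization the dashboard applies to the displayed values is not reproduced here.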
Activation Density: 10.248%
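Activation density is usually read as the share of sampled token positions on which the feature fires, so 10.248% means it is active on roughly one token in ten. The sketch below assumes that reading, with "fires" taken to mean an activation strictly above zero. The helper `activation_density`, its `threshold` parameter, and the toy sample are hypothetical and are not the dashboard's actual code or data.

```python
from typing import Sequence

# Hypothetical helper: assumes "density" means the share of sampled token
# positions at which the feature's activation exceeds `threshold` (default 0).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample of 10 token positions (invented); the feature fires on 3 of them.
sample = [0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0, 0.5, 0.0]
print(f"Activation density: {activation_density(sample):.3f}%")  # Activation density: 30.000%
```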