INDEX
Explanations
references to statistical or numerical data in research contexts
Negative Logits
[token not recovered]  -0.69
↵  -0.61
in  -0.61
<eos>  -0.60
↵↵  -0.59
.  -0.59
.  -0.55
,  -0.55
de  -0.54
of  -0.52
Positive Logits
%:  1.24
✨:  1.23
_:  1.22
autorytatywna  1.21
®:  1.21
:  1.20
!:  1.13
+:  1.11
__:  1.10
:  1.10
Activations Density 0.331%