Explanations
phrases or constructs that emphasize comparison or simile
Negative Logits
alls      -0.17
орм       -0.17
ingt      -0.16
eters     -0.15
etary     -0.15
cott      -0.14
awe       -0.14
HORT      -0.14
saja      -0.14
instein   -0.14
Positive Logits
follows   0.22
cribed    0.21
paragus   0.20
sembl     0.18
cert      0.17
-is       0.16
having    0.15
součást   0.14
dit       0.14
cribe     0.14
Activations Density 0.148%