INDEX
Explanations
phrases indicating regret or hindsight
instances of the phrase "should have."
Negative Logits
hostage     -0.77
maze        -0.68
fireball    -0.68
craving     -0.64
Mem         -0.64
catentry    -0.60
hots        -0.59
Cold        -0.59
ceiling     -0.58
muse        -0.58
Positive Logits
gotten      0.97
been        0.95
been        0.87
gotten      0.86
acted       0.86
behaved     0.83
done        0.79
Been        0.77
ĭ           0.75
taken       0.75
Activations Density: 0.057%
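
The density figure can be read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the dashboard does not say how its number was computed, so the zero threshold, the function name `activation_density`, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    activation; the source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which activate the feature.
acts = [0.0] * 10_000
for i in (12, 480, 1999, 5000, 7321, 9998):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.057% would correspond to the feature firing on roughly 6 of every 10,000 tokens, which fits a feature tied to a narrow construction such as "should have" followed by a past participle.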