INDEX
Explanations
mentions of the word "same."
references to same-sex marriage
Negative Logits
erest     -0.71
OST       -0.71
arest     -0.71
Bulg      -0.70
Bei       -0.65
ostic     -0.65
ASE       -0.65
åĩ        -0.64
efully    -0.63
liest     -0.63
Positive Logits
vein      0.79
rity      0.76
fam       0.74
bilt      0.69
sex       0.69
gender    0.68
exact     0.68
ials      0.65
sorts     0.64
Day       0.63
Activations Density 0.024%