INDEX
Explanations
specific terminology and concepts related to housing and architecture
Head Attr Weights
0:0.10
1:0.43
2:0.02
3:0.02
4:0.02
5:0.22
6:0.02
7:0.01
8:0.02
9:0.05
10:0.03
11:0.01
Negative Logits
uti      -1.84
uto      -1.73
idel     -1.70
sqor     -1.61
aimon    -1.59
oss      -1.56
ills     -1.55
emis     -1.53
Removed  -1.48
gmail    -1.46
Positive Logits
-    2.82
-"   2.32
_>   2.20
-$   2.19
‑    2.19
-'   2.12
ْ    2.05
-)   1.99
-|   1.92
-.   1.91
Activations Density 0.097%