INDEX
Explanations
references to emotional states or descriptions of personal experiences
Negative Logits
!;
-0.86
!";
-0.78
*/;
-0.73
出版年
-0.67
?;
-0.67
!';
-0.67
!",
-0.65
();
-0.65
{}",
-0.64
!',
-0.64
Positive Logits
)
0.93
.
0.90
)
0.82
.)
0.81
.
0.76
.
0.74
.-
0.73
]
0.70
0.70
、
0.66
Activations Density 1.170%