    Explanations

    judgments regarding moral and ethical standards related to exploitation and human rights issues

    Negative Logits
     дописавши            -0.94
    AndEndTag             -0.94
     ModelExpression      -0.88
     Wicidata             -0.87
    kháu                  -0.80
    Попис                 -0.77
     سكانية               -0.76
     <<<<<<<<<<<<<<       -0.75
    +#+#                  -0.71
    wieś                  -0.68
    Positive Logits
     unacceptable         0.68
    👎                    0.57
     violates             0.57
     outright             0.57
     harmful              0.56
    不应该                0.55
     intolerable          0.54
     violation            0.54
     downright            0.54
                          0.52
    Act Density 0.391%

    No Known Activations