INDEX
    Explanations

    references to engaging with the natural world and escaping civilization

    Negative Logits
    Token                    Logit
    " downs"                 -0.15
    " wa"                    -0.14
    " Decide"                -0.14
    " discrim"               -0.14
    " ming"                  -0.14
    " Comple"                -0.13
    "sanitize"               -0.13
    "longleftrightarrow"     -0.13
    " resett"                -0.13
    " comm"                  -0.13
    Positive Logits
    Token                    Logit
    " Use"                   0.20
    " don"                   0.19
    "Use"                    0.18
    " start"                 0.18
    " use"                   0.17
    "eph"                    0.17
    "don"                    0.17
    " Start"                 0.16
    " make"                  0.16
    "Don"                    0.16
    Activation Density: 0.261%

    No Known Activations