© Neuronpedia 2026
    Circuit Tracing with Interpretable Attention
OpenMOSS Team, Fudan University · huggingface.co: llama-scope-2

OpenMOSS extended Anthropic's circuit tracing work by adding interpretable attention alongside MLP transcoders, calling the result Complete Replacement Models (CRMs). Neuronpedia now supports generating CRM graphs on Qwen3-1.7B.

CRM graphs introduce a new node type for attention called LORSA (Low-Rank Sparse Attention). LORSA nodes are displayed as triangles ▲ to visually distinguish them from transcoder circle ⏺ nodes.
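As a rough sketch, telling the two node types apart when processing an exported graph might look like the following. The `node_type` field and dictionary layout are assumptions for illustration, not Neuronpedia's actual export schema:

```python
# Sketch: partition the nodes of a CRM graph by type.
# NOTE: the "node_type" values and dict layout are illustrative
# assumptions, not Neuronpedia's actual graph schema.

def partition_nodes(graph_nodes):
    """Split CRM graph nodes into LORSA (attention) and transcoder (MLP) lists."""
    lorsa, transcoder = [], []
    for node in graph_nodes:
        if node.get("node_type") == "lorsa":         # rendered as a triangle ▲
            lorsa.append(node)
        elif node.get("node_type") == "transcoder":  # rendered as a circle ⏺
            transcoder.append(node)
    return lorsa, transcoder

# Hypothetical node IDs, just to exercise the function.
nodes = [
    {"id": "L3.att.17", "node_type": "lorsa"},
    {"id": "L3.mlp.204", "node_type": "transcoder"},
    {"id": "L5.att.9", "node_type": "lorsa"},
]
lorsa, transcoder = partition_nodes(nodes)
print(len(lorsa), len(transcoder))  # → 2 1
```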

Since CRM graphs incorporate both transcoders and LORSA, they link to two sets of dashboards. When you select a LORSA (triangle) node, you'll see the LORSA dashboard, which shows attention Z patterns when you hover over top activation tokens.

Additionally, LORSA nodes show QK tracing results under the Node Connections panel, including the top marginal and pairwise (query-feature, key-feature) contributors. These explain why a LORSA feature attends from one position to another.
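Conceptually, the QK attention score between two positions is bilinear in the query-side and key-side feature activations, so it decomposes exactly into (query-feature, key-feature) pair contributions, and marginals are just row or column sums of that matrix. Here is a minimal NumPy sketch of that decomposition; all shapes, matrices, and values are made up for illustration and are not OpenMOSS's actual implementation:

```python
import numpy as np

# Sketch of pairwise QK attribution: the QK score is bilinear in the
# query-side and key-side feature activations, so it splits exactly
# into (query-feature, key-feature) contributions.
# All shapes and values below are illustrative assumptions.

rng = np.random.default_rng(0)
d_model, n_q, n_k = 8, 4, 5

W_dec_q = rng.normal(size=(n_q, d_model))   # query-side feature directions
W_dec_k = rng.normal(size=(n_k, d_model))   # key-side feature directions
W_QK = rng.normal(size=(d_model, d_model))  # combined query-key map

a_q = rng.random(n_q)  # feature activations at the query position
a_k = rng.random(n_k)  # feature activations at the key position

# Full QK score from the reconstructed residual vectors.
q_vec = a_q @ W_dec_q
k_vec = a_k @ W_dec_k
score = q_vec @ W_QK @ k_vec

# Contribution of each (query-feature i, key-feature j) pair.
pairwise = (a_q[:, None] * a_k[None, :]) * (W_dec_q @ W_QK @ W_dec_k.T)
assert np.isclose(pairwise.sum(), score)  # the decomposition is exact

# Marginal contribution of each query-side feature.
marginal_q = pairwise.sum(axis=1)

# Top contributing pair (by absolute contribution).
i, j = np.unravel_index(np.abs(pairwise).argmax(), pairwise.shape)
print(f"top pair: q-feature {i}, k-feature {j}")
```

Ranking the entries of `pairwise` by magnitude is what "top pairwise contributors" means in this framing; the row and column sums give the marginals.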
