A mechanistic understanding of how MLPs perform computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but ...