Various papers and internet posts on training SAEs for vision.
An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
- Haven't read this yet, but Hugo Fry is an author.
Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
- Trains a sparse autoencoder on the 22nd layer of a CLIP ViT-L/14. First public work training an SAE on a ViT; finds interpretable features, demonstrating that SAEs work on ViTs.
Interpreting and Steering Features in Images
- Haven't read it yet.
Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
- Followup to the above work; haven't read it yet.
A Suite of Vision Sparse Autoencoders
- Trains sparse autoencoders on various layers of a CLIP ViT-L/14 (pretrained on LAION-2B), using the TopK activation with k=32. The SAEs are trained on 1.2B tokens, including patch tokens (not just [CLS]). Evaluation is limited.
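For reference, a minimal sketch of the TopK SAE forward pass the last entry describes (keep the k largest pre-activations per token, zero the rest, then decode). Shapes, initialization, and variable names here are my own assumptions for illustration, not taken from the paper:

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=32):
    """TopK SAE sketch: x is (batch, d_model); returns sparse codes and reconstruction."""
    pre = x @ W_enc + b_enc                      # pre-activations, (batch, d_sae)
    # Zero out everything except the k largest pre-activations per row.
    drop_idx = np.argsort(pre, axis=-1)[:, :-k]  # indices of all but the top-k
    z = pre.copy()
    np.put_along_axis(z, drop_idx, 0.0, axis=-1)
    x_hat = z @ W_dec + b_dec                    # reconstruction, (batch, d_model)
    return z, x_hat

# Toy dimensions (the paper's SAE is far wider; these are placeholders).
d_model, d_sae, k = 8, 64, 4
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)

x = rng.normal(size=(3, d_model))
z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=k)
```

With TopK the sparsity level is set directly by k rather than tuned via an L1 penalty, which is presumably why the paper can state k=32 as an exact per-token feature count.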