Various papers and internet posts on training SAEs for vision.
An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
- Haven't read this yet, but Hugo Fry is an author.
Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
- Trains a sparse autoencoder on the 22nd layer of a CLIP ViT-L/14. First public work training an SAE on a ViT; finds interpretable features, demonstrating that SAEs work on ViTs.
Interpreting and Steering Features in Images
- Haven't read it yet.
Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
- Followup to the above work; haven't read it yet.
A Suite of Vision Sparse Autoencoders
- Trains sparse autoencoders on various layers of a CLIP ViT-L/14 (pretrained on LAION-2B), using the TopK activation with k=32. The SAEs are trained on 1.2B tokens, including patch tokens (not just [CLS]). Evaluation is limited.
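For reference, a minimal sketch of the TopK SAE forward pass the last entry describes (keep the k largest pre-activations per token, zero the rest, then decode). Shapes, initialization, and variable names here are my own assumptions for illustration, not taken from the paper:

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=32):
    """TopK SAE sketch: x is (batch, d_model); returns sparse codes and reconstruction."""
    pre = x @ W_enc + b_enc                      # pre-activations, (batch, d_sae)
    # Zero out everything except the k largest pre-activations per row.
    drop_idx = np.argsort(pre, axis=-1)[:, :-k]  # indices of all but the top-k
    z = pre.copy()
    np.put_along_axis(z, drop_idx, 0.0, axis=-1)
    x_hat = z @ W_dec + b_dec                    # reconstruction, (batch, d_model)
    return z, x_hat

# Toy dimensions (the paper's SAE is far wider; these are placeholders).
d_model, d_sae, k = 8, 64, 4
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)

x = rng.normal(size=(3, d_model))
z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=k)
```

With TopK the sparsity level is set directly by k rather than tuned via an L1 penalty, which is presumably why the paper can state k=32 as an exact per-token feature count.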