Skip to content

Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation". Code coming soon.

Notifications You must be signed in to change notification settings

dynamical-inference/patchsae

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

[🔗 website & 🤗 demo] [📑 pre-print]

Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation" (Lim et al., 2024).

image

Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.

Stay tuned for the code and model release!

About

Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation". Code coming soon.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published