A Spitting Image: Modular Superpixel Tokenization in Vision Transformers


Marius Aasan, Odd Kolbjørnsen, Anne Schistad Solberg, Adín Ramírez Rivera

DSB @ IFI @ UiO

Website | Paper (arXiv) | Paper (ECCVW) | Notebook (Example)

[Image: SPiT Figure 1]

SPiT: Superpixel Transformers

This repo contains code and weights for A Spitting Image: Modular Superpixel Tokenization in Vision Transformers, accepted for MELEX, ECCVW 2024.

For an introduction to our work, visit the project webpage.

Installation

We are working on releasing this package on PyPI. In the meantime, the package can be installed via:

# HTTPS
pip install git+https://github.com/dsb-ifi/SPiT.git

# SSH
pip install git+ssh://git@github.com/dsb-ifi/SPiT.git

Loading models

To load a Superpixel Transformer model, we suggest using the wrapper:

from spit import load_model

model = load_model.load_SPiT_B16(grad=True, pretrained=True)

This will load the model and download the pretrained weights, which are stored in your local torch.hub directory. If you would rather download the full weights manually, please use:

| Model | Link | MD5 |
|---|---|---|
| SPiT-S16 | Manual Download | 8e899c846a75c51e1c18538db92efddf |
| SPiT-S16 (w. grad.) | Manual Download | e49be7009c639c0ccda4bd68ed34e5af |
| SPiT-B16 | Manual Download | 9d3483a4c6fdaf603ee6528824d48803 |
| SPiT-B16 (w. grad.) | Manual Download | 9394072a5d488977b1af05c02aa0d13c |
| ViT-S16 | Manual Download | 73af132e4bb1405b510a5eb2ea74cf22 |
| ViT-S16 (w. grad.) | Manual Download | b8e4f1f219c3baef47fc465eaef9e0d4 |
| ViT-B16 | Manual Download | ce45dcbec70d61d1c9f944e1899247f1 |
| ViT-B16 (w. grad.) | Manual Download | 1caa683ecd885347208b0db58118bf40 |
| RViT-S16 | Coming Soon | |
| RViT-S16 (w. grad.) | Coming Soon | |
| RViT-B16 | Manual Download | 18c13af67d10f407c3321eb1ca5eb568 |
| RViT-B16 (w. grad.) | Manual Download | 50d25403adfd5a12d7cb07f7ebfced97 |
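
After a manual download, it can be useful to check the file against the MD5 hash listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_checkpoint` and the file name in the usage comment are illustrative assumptions, not part of the `spit` package.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Hypothetical usage (the local file name is an assumption):
# ok = verify_checkpoint("SPiT_B16.pth", "9d3483a4c6fdaf603ee6528824d48803")
```

If the check returns False, the download is most likely incomplete or corrupted and should be repeated.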

More Examples

We provide a Jupyter notebook as a sandbox for loading, evaluating, and extracting segmentations for the models. Examples will be updated along with new releases and updates for the project repo.

Notes:

RViT and On-Line Voronoi Tessellation

Currently, the code features some slight modifications to streamline use of the RViT models. The original RViT models sampled partitions from a dataset of pre-computed Voronoi tessellations for training and evaluation. This is impractical for deployment, and we have yet to implement a CUDA kernel for computing Voronoi tessellations with lower memory overhead.

However, we have developed a fast implementation for generating tessellations with PCA trees [1], which mimic Voronoi tessellations relatively well and can be computed on-the-fly. There are, however, still some minor issues with the small capacity RViT models. Consequently, the RViT-B16 models will perform marginally differently from the results reported in the paper. We appreciate the reader's patience with regard to this matter.
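
To illustrate the general idea of a PCA tree partition, here is a minimal, pure-Python sketch for 2D points: each node is split at the median along its direction of largest variance, and the leaves form the cells. This is not the implementation used in this repository; the function names are hypothetical, and the random points merely stand in for pixel coordinates.

```python
import math
import random


def principal_axis(points):
    """Unit vector along the direction of largest variance of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def pca_tree_labels(points, depth):
    """Label each point with its leaf cell after `depth` median splits."""
    labels = [0] * len(points)

    def split(indices, level, label):
        if level == depth or len(indices) < 2:
            for i in indices:
                # Shift keeps labels unique if a branch stops early.
                labels[i] = label << (depth - level)
            return
        ax, ay = principal_axis([points[i] for i in indices])
        # Sort by projection onto the principal axis and cut at the median.
        order = sorted(indices, key=lambda i: points[i][0] * ax + points[i][1] * ay)
        mid = len(order) // 2
        split(order[:mid], level + 1, 2 * label)
        split(order[mid:], level + 1, 2 * label + 1)

    split(list(range(len(points))), 0, 0)
    return labels


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(64)]
labels = pca_tree_labels(points, depth=3)  # up to 2**3 = 8 cells
```

Unlike a Voronoi tessellation, the cells here are bounded by the splitting hyperplanes of the tree, which is what makes the partition cheap to compute on-the-fly.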

Note that the RViT models are inherently stochastic so that different runs can yield different results. Also, it is worth mentioning that SPiT models can yield slightly different results for each run, due to nondeterministic behaviours in CUDA kernels.
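
When comparing runs, fixing the random seeds reduces (but may not remove) this variation. The sketch below is one possible setup, assuming a PyTorch version recent enough to support the `warn_only` argument; `seed_everything` is a hypothetical helper, not part of the `spit` package, and custom or atomics-based CUDA kernels may remain nondeterministic regardless.

```python
import random


def seed_everything(seed):
    """Seed Python and, if installed, PyTorch. Returns True if PyTorch was seeded."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return False
    # Seeds the CPU generator and the generators of all CUDA devices.
    torch.manual_seed(seed)
    # Avoid autotuned cuDNN algorithm selection, which can vary between runs.
    torch.backends.cudnn.benchmark = False
    # Prefer deterministic kernels; only warn when an op has no deterministic variant.
    torch.use_deterministic_algorithms(True, warn_only=True)
    return True


seed_everything(0)
```

Calling the helper before constructing the model and sampling partitions makes the stochastic parts of a run repeatable on the same hardware and software stack.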

[1] Refinements to nearest-neighbor searching in $k$-dimensional trees (Sproull, 1991)

Progress and Current Todos:

  • Include foundational code and model weights.
  • Add manual links with MD5 hash for manual weight download.
  • Add module for loading models, and provide example notebook.
  • Create temporary solution to on-line Voronoi tessellation.
  • Add standalone train and eval scripts.
  • Add CUDA kernels for on-line Voronoi tessellations.
  • Add example for extracting attribution maps with Att.Flow and Proto.PCA.
  • Add example for computing sufficiency and comprehensiveness.
  • Add assets for computed attribution maps for XAI experiments.
  • Add code and examples for salient segmentation.
  • Add code and examples for feature correspondences.

Citation

If you find our work useful, please consider citing it.

@inproceedings{Aasan2024,
  title={A Spitting Image: Modular Superpixel Tokenization in Vision Transformers},
  author={Aasan, Marius and Kolbj\o{}rnsen, Odd and Schistad Solberg, Anne and Ram\'irez Rivera, Ad\'in},
  booktitle={{CVF/ECCV} More Exploration, Less Exploitation Workshop ({MELEX} {ECCVW})},
  year={2024}
}
