This repo contains code and weights for A Spitting Image: Modular Superpixel Tokenization in Vision Transformers, accepted for MELEX, ECCVW 2024.
For an introduction to our work, visit the project webpage.
We are working on releasing this package on PyPI; in the meantime, it can be installed via:
# HTTPS
pip install git+https://github.com/dsb-ifi/SPiT.git
# SSH
pip install git+ssh://git@github.com/dsb-ifi/SPiT.git
To load a Superpixel Transformer model, we suggest using the wrapper:
from spit import load_model
model = load_model.load_SPiT_B16(grad=True, pretrained=True)
This will load the model and download the pretrained weights, which are stored in your local torch.hub
directory. If you would rather download the full weights manually, please use the links below (a sketch for loading a manually downloaded checkpoint follows the table):
Model | Link | MD5 |
---|---|---|
SPiT-S16 | Manual Download | 8e899c846a75c51e1c18538db92efddf |
SPiT-S16 (w. grad.) | Manual Download | e49be7009c639c0ccda4bd68ed34e5af |
SPiT-B16 | Manual Download | 9d3483a4c6fdaf603ee6528824d48803 |
SPiT-B16 (w. grad.) | Manual Download | 9394072a5d488977b1af05c02aa0d13c |
ViT-S16 | Manual Download | 73af132e4bb1405b510a5eb2ea74cf22 |
ViT-S16 (w. grad.) | Manual Download | b8e4f1f219c3baef47fc465eaef9e0d4 |
ViT-B16 | Manual Download | ce45dcbec70d61d1c9f944e1899247f1 |
ViT-B16 (w. grad.) | Manual Download | 1caa683ecd885347208b0db58118bf40 |
RViT-S16 | Coming Soon | |
RViT-S16 (w. grad.) | Coming Soon | |
RViT-B16 | Manual Download | 18c13af67d10f407c3321eb1ca5eb568 |
RViT-B16 (w. grad.) | Manual Download | 50d25403adfd5a12d7cb07f7ebfced97 |
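If you download a checkpoint manually, a minimal sketch along these lines can verify it against the published MD5 hash and attach it to a model. The local path, the `pretrained=False` flag, and the assumption that the file stores a plain state dict are placeholders to adapt to your setup.

```python
import hashlib

import torch

from spit import load_model

# Hypothetical local path to a manually downloaded checkpoint; adjust to your setup.
WEIGHTS_PATH = "checkpoints/spit_b16_grad.pth"
EXPECTED_MD5 = "9394072a5d488977b1af05c02aa0d13c"  # SPiT-B16 (w. grad.) from the table above

# Verify the download against the published MD5 hash.
with open(WEIGHTS_PATH, "rb") as f:
    digest = hashlib.md5(f.read()).hexdigest()
assert digest == EXPECTED_MD5, f"Checksum mismatch: {digest}"

# Build the architecture without fetching weights (assumes `pretrained=False` skips
# the download), then apply the local checkpoint, assuming it is a plain state dict.
model = load_model.load_SPiT_B16(grad=True, pretrained=False)
state_dict = torch.load(WEIGHTS_PATH, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```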
We provide a Jupyter notebook as a sandbox for loading and evaluating the models and extracting segmentations. Examples will be updated alongside new releases and updates to the project repo.
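For reference, a minimal inference sketch of the kind covered in the notebook might look as follows. The preprocessing values and the assumption that the model returns class logits are placeholders, not the exact settings used in the notebook.

```python
import torch
from PIL import Image
from torchvision import transforms

from spit import load_model

model = load_model.load_SPiT_B16(grad=True, pretrained=True).eval()

# Standard ImageNet-style preprocessing; the normalization used by the released
# models may differ, so treat these values as placeholders.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    logits = model(img)  # assuming the model returns class logits

print(logits.argmax(dim=-1))
```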
Currently, the code includes some slight modifications to streamline use of the RViT models. The original RViT models sampled partitions from a dataset of pre-computed Voronoi tessellations for training and evaluation. This is impractical for deployment, and we have yet to implement a CUDA kernel for computing Voronoi tessellations with lower memory overhead.
However, we have developed a fast implementation for generating tessellations on the fly with PCA trees [1], which mimic Voronoi tessellations relatively well. There are, however, still some minor issues with the small-capacity RViT models. Consequently, the RViT-B16 models will perform marginally differently from the results reported in the paper. We appreciate the reader's patience in this matter.
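To illustrate the idea (not the repository's actual implementation), the sketch below partitions pixel coordinates with a simple PCA tree, recursively splitting each cell at the median of its projection onto the cell's principal axis. The real on-the-fly tessellation is optimized and may, for instance, randomize the splits to better mimic Voronoi cells.

```python
import itertools

import numpy as np

def pca_tree_partition(coords, ids, labels, depth, counter):
    """Recursively split 2D pixel coordinates along their principal axis."""
    if depth == 0 or len(ids) <= 1:
        labels[ids] = next(counter)
        return
    pts = coords[ids]
    centered = pts - pts.mean(axis=0)
    # Principal direction of the cell from the covariance of its coordinates.
    _, vecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    axis = vecs[:, -1]
    proj = centered @ axis
    median = np.median(proj)
    left, right = ids[proj <= median], ids[proj > median]
    if len(left) == 0 or len(right) == 0:  # degenerate split; stop here
        labels[ids] = next(counter)
        return
    pca_tree_partition(coords, left, labels, depth - 1, counter)
    pca_tree_partition(coords, right, labels, depth - 1, counter)

# Partition a 64x64 grid into 2**6 = 64 cells that loosely mimic a Voronoi tessellation.
H, W = 64, 64
coords = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1)
coords = coords.reshape(-1, 2).astype(float)
labels = np.zeros(len(coords), dtype=int)
pca_tree_partition(coords, np.arange(len(coords)), labels, depth=6, counter=itertools.count())
partition = labels.reshape(H, W)  # per-pixel cell assignment
```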
Note that the RViT models are inherently stochastic, so different runs can yield different results. The SPiT models can also yield slightly different results from run to run, due to nondeterministic behaviour in CUDA kernels.
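If reproducibility across runs matters for your experiments, the standard PyTorch controls below can reduce, though not fully remove, this variation. This is generic PyTorch usage, not something the repo configures for you.

```python
import torch

# Fixing seeds reduces, but does not eliminate, run-to-run variation:
# some CUDA kernels remain nondeterministic regardless of seeding.
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

# Request deterministic algorithms where PyTorch supports them; warn_only avoids
# errors for ops that have no deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
torch.backends.cudnn.benchmark = False
```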
[1] Sproull, R. F.: Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6, 579–589 (1991).
- Include foundational code and model weights.
- Add manual links with MD5 hash for manual weight download.
- Add module for loading models, and provide example notebook.
- Create temporary solution for on-line Voronoi tessellation.
- Add standalone train and eval scripts.
- Add CUDA kernels for on-line Voronoi tessellations.
- Add example for extracting attribution maps with Att.Flow and Proto.PCA.
- Add example for computing sufficiency and comprehensiveness.
- Add assets for computed attribution maps for XAI experiments.
- Add code and examples for salient segmentation.
- Add code and examples for feature correspondences.
If you find our work useful, please consider citing it.
@inproceedings{Aasan2024,
title={A Spitting Image: Modular Superpixel Tokenization in Vision Transformers},
author={Aasan, Marius and Kolbj\o{}rnsen, Odd and Schistad Solberg, Anne and Ram\'irez Rivera, Ad\'in},
booktitle={{CVF/ECCV} More Exploration, Less Exploitation Workshop ({MELEX} {ECCVW})},
year={2024}
}