182 CV Project: Tiny ImageNet Classification with Cross-Attention Vision Transformers and Sparse Attention
In this project, we explore several techniques for building a robust classifier for the Tiny ImageNet challenge. We focus on recent models built on Vision Transformers, which have been shown to improve robustness against out-of-distribution examples and against both white-box and black-box adversarial attacks. We report a top-1 validation accuracy of 81% with our architecture after fine-tuning on Tiny ImageNet, using vision transformer blocks pretrained on ImageNet-1k together with standard data augmentations and AugMix. Our architecture, loosely based on CrossViT (Chen et al.), improves over a standard ViT model by running parallel vision transformer branches that attend to different image patch sizes, fusing their outputs with cross attention, and classifying with an MLP head. We also observe faster training and higher clean accuracy compared with deeper stacked ViT architectures of similar parameter counts. We benchmark the robustness and accuracy of our model against a variety of ViT- and ResNet-based models on Tiny ImageNet-C and under adversarial attacks from Foolbox, and we evaluate how cross attention over varying patch sizes, as well as sparse attention, affects the classification of out-of-distribution images.
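To make the two-branch design concrete, below is a minimal PyTorch sketch of parallel ViT branches over different patch sizes fused by cross attention, in the spirit of CrossViT. It is an illustration under simplifying assumptions, not the project's actual implementation: both branches share one embedding dimension (CrossViT uses a different width per branch and projects CLS tokens between them), the encoder is torch.nn's stock TransformerEncoder, the head is a single linear layer rather than an MLP, and all names (TwoBranchCrossViT, CrossAttentionBlock, etc.) and hyperparameters are hypothetical.

import torch
import torch.nn as nn

# Hypothetical sketch: names and hyperparameters are illustrative, not the project's code.

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and linearly embed them."""
    def __init__(self, patch_size, in_chans=3, dim=192):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # (B, C, H, W) -> (B, N, dim), where N = (H // patch_size) * (W // patch_size)
        return self.proj(x).flatten(2).transpose(1, 2)

class CrossAttentionBlock(nn.Module):
    """One branch's CLS token attends to the other branch's patch tokens."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, cls_token, patch_tokens):
        q = self.norm_q(cls_token)        # (B, 1, dim)
        kv = self.norm_kv(patch_tokens)   # (B, N, dim)
        out, _ = self.attn(q, kv, kv, need_weights=False)
        return cls_token + out            # residual connection

class TwoBranchCrossViT(nn.Module):
    """Parallel ViT branches over two patch sizes, fused by cross attention."""
    def __init__(self, img_size=64, patch_sizes=(8, 16), dim=192,
                 depth=4, num_heads=4, num_classes=200):
        super().__init__()
        self.embeds = nn.ModuleList()
        self.encoders = nn.ModuleList()
        self.cls_tokens = nn.ParameterList()
        self.pos_embeds = nn.ParameterList()
        for ps in patch_sizes:
            num_patches = (img_size // ps) ** 2
            self.embeds.append(PatchEmbed(ps, dim=dim))
            layer = nn.TransformerEncoderLayer(dim, num_heads, dim * 4,
                                               batch_first=True, norm_first=True)
            self.encoders.append(nn.TransformerEncoder(layer, depth))
            self.cls_tokens.append(nn.Parameter(torch.zeros(1, 1, dim)))
            self.pos_embeds.append(nn.Parameter(torch.zeros(1, num_patches + 1, dim)))
        self.cross = nn.ModuleList(CrossAttentionBlock(dim, num_heads)
                                   for _ in patch_sizes)
        self.head = nn.Linear(dim * len(patch_sizes), num_classes)

    def forward(self, x):
        tokens = []
        for i in range(len(self.embeds)):
            t = self.embeds[i](x)
            cls = self.cls_tokens[i].expand(t.shape[0], -1, -1)
            t = torch.cat([cls, t], dim=1) + self.pos_embeds[i]
            tokens.append(self.encoders[i](t))
        # Fuse: each branch's CLS token queries the other branch's patch tokens.
        fused = [self.cross[i](tokens[i][:, :1], tokens[1 - i][:, 1:])
                 for i in range(2)]
        return self.head(torch.cat([f.squeeze(1) for f in fused], dim=-1))

# Tiny ImageNet: 64x64 RGB images, 200 classes.
model = TwoBranchCrossViT()
logits = model(torch.randn(2, 3, 64, 64))   # -> shape (2, 200)

The cross-attention step is cheap relative to full token mixing: each branch contributes only its CLS token as a query, so the two branches exchange a global summary rather than attending over every pair of patch tokens.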