This is the code release of the paper Exploring Long-Sequence Masked Autoencoders:
@Article{hu2022exploring,
author = {Ronghang Hu and Shoubhik Debnath and Saining Xie and Xinlei Chen},
journal = {arXiv:2210.07224},
title = {Exploring Long-Sequence Masked Autoencoders},
year = {2022},
}
This repo is a modification of the MAE repo and supports long-sequence pretraining on both GPUs and TPUs using PyTorch.
This repo is based on `timm==0.4.12`, which can be installed via `pip3 install timm==0.4.12`.
The following table provides the pretrained checkpoints used in the paper:
Pretraining data and schedule (all models pretrained w/ sequence length L=784, image size 448, patch size 16) | ViT-Base | ViT-Large |
---|---|---|
COCO (train2017 + unlabeled2017) 4000-epoch | download | download |
ImageNet-1k 800-epoch | download | download |
ImageNet-1k 1600-epoch | download | download |
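The sequence length L in the table follows from the image and patch sizes: a 448-pixel image cut into 16-pixel patches gives 28 patches per side, so L = 28² = 784, four times the 196 tokens of a standard 224-pixel MAE input. The sketch below shows that arithmetic. The helper names are hypothetical, and the 0.75 mask ratio is an assumption (it is the default in the original MAE repo, not a value stated here). The visible-token count is modeled on MAE's `int(L * (1 - mask_ratio))` rule for how many tokens the encoder keeps.

```python
def sequence_length(image_size: int, patch_size: int) -> int:
    """Number of patch tokens L for a square image split into square patches (hypothetical helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2


def num_visible_tokens(seq_len: int, mask_ratio: float) -> int:
    """Tokens left unmasked for the encoder, modeled on MAE's int(L * (1 - mask_ratio))."""
    return int(seq_len * (1 - mask_ratio))


if __name__ == "__main__":
    # Assumption: 0.75 mask ratio (original MAE default), patch size 16 as in the table.
    for image_size in (224, 448):
        L = sequence_length(image_size, patch_size=16)
        visible = num_visible_tokens(L, mask_ratio=0.75)
        print(f"image {image_size}: L={L}, visible at 75% masking={visible}")
```

Under these assumptions, the script prints L=196 with 49 visible tokens for 224-pixel inputs and L=784 with 196 visible tokens for 448-pixel inputs.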
- Follow PRETRAIN_LONG_SEQ_TPU.md for long-sequence pretraining on Google Cloud TPUs (which we used for our experiments).
- Follow PRETRAIN_LONG_SEQ_GPU.md for long-sequence pretraining on NVIDIA GPUs.
- Follow FINETUNE_DETECTION.md to fine-tune on the object detection task using the ViTDet codebase from Detectron2.
This codebase also remains compatible with the features of the original MAE repo. Follow README_MAE.md to use them (such as fine-tuning on image classification).
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.