This is my personal copy of Facebook's Masked Autoencoders (MAE) repository for image-based self-supervised learning (SSL), customized for my own purposes. The code here can be used to train and evaluate MAEs.
- Training: To train an MAE model with a ViT-S/16 architecture from scratch on your data, use `train_mae.py`:

  ```bash
  python -u train_mae.py \
    --model 'mae_vit_small_patch16' \
    --batch_size_per_gpu 512 \
    --num_workers 16 \
    --lr 0.0003 \
    --min_lr 0.0003 \
    --output_dir OUTPUT_DIR \
    --data_path DATA_PATH \
    --save_prefix INFORMATIVE_SAVE_PREFIX
  ```
  This version uses the `webdataset` interface to feed the data into the model; a sketch of what such an input pipeline looks like is shown below. There's a separate training file, `train_mae_nowds.py`, that uses the standard `torch`/`torchvision` data loading interface instead, if you'd prefer to use that.
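  For reference, here is a minimal sketch of a `webdataset`-style input pipeline. The shard pattern, tar keys, and augmentations are illustrative assumptions, not the exact ones used in `train_mae.py`:

  ```python
  # Hypothetical sketch of a webdataset input pipeline; the shard pattern and
  # tar keys ("jpg;png") are assumptions, not the repo's actual layout.
  import webdataset as wds
  from torchvision import transforms

  transform = transforms.Compose([
      transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
      transforms.RandomHorizontalFlip(),
      transforms.ToTensor(),
  ])

  shards = "DATA_PATH/train-{000000..000099}.tar"  # hypothetical shard layout
  dataset = (
      wds.WebDataset(shards, shardshuffle=True)
      .shuffle(1000)          # sample-level shuffle buffer
      .decode("pil")          # decode stored images to PIL
      .to_tuple("jpg;png")    # MAE pretraining needs only the images
      .map_tuple(transform)
      .batched(512)           # batch inside the pipeline
  )
  loader = wds.WebLoader(dataset, batch_size=None, num_workers=16)
  ```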
- Linear evaluation: To evaluate an MAE model with the linear probing approach, use `eval_linear.py` (a sketch of what linear probing does follows the command):

  ```bash
  python -u eval_linear.py \
    --model 'vit_small_patch16' \
    --resume MODEL_PATH \
    --save_prefix INFORMATIVE_SAVE_PREFIX \
    --batch_size 1024 \
    --epochs 50 \
    --num_workers 16 \
    --lr 0.0003 \
    --output_dir OUTPUT_DIR \
    --train_data_path TRAIN_DATA_PATH \
    --val_data_path VAL_DATA_PATH \
    --num_labels 1000
  ```
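  For intuition, a minimal sketch of linear probing: the pretrained encoder is frozen and only a linear classifier is trained on its features. The function and its defaults (`embed_dim=384` for ViT-S, `num_labels=1000`) are illustrative, not the exact implementation in `eval_linear.py`:

  ```python
  # Illustrative sketch of linear probing; not the exact logic of eval_linear.py.
  import torch
  import torch.nn as nn

  def linear_probe(encoder, train_loader, embed_dim=384, num_labels=1000, lr=3e-4):
      """Freeze `encoder` and train only a linear head on its features."""
      for p in encoder.parameters():
          p.requires_grad = False  # backbone stays fixed
      encoder.eval()

      head = nn.Linear(embed_dim, num_labels)
      optimizer = torch.optim.AdamW(head.parameters(), lr=lr)
      criterion = nn.CrossEntropyLoss()

      for images, labels in train_loader:
          with torch.no_grad():
              feats = encoder(images)  # frozen features, no gradient flow
          loss = criterion(head(feats), labels)
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()
      return head
  ```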
- Finetuning evaluation: To evaluate an MAE model with the finetuning approach, use `eval_finetune.py`:

  ```bash
  python -u eval_finetune.py \
    --model 'vit_small_patch16' \
    --resume MODEL_PATH \
    --save_prefix INFORMATIVE_SAVE_PREFIX \
    --batch_size 128 \
    --epochs 50 \
    --num_workers 16 \
    --lr 0.0001 \
    --output_dir OUTPUT_DIR \
    --train_data_path TRAIN_DATA_PATH \
    --val_data_path VAL_DATA_PATH \
    --frac_retained 0.010147 \
    --num_labels 1000
  ```
  Here `frac_retained` is the fraction of the training set retained for finetuning, which makes it possible to run few-shot finetuning evals (e.g. `--frac_retained 0.01` corresponds to finetuning with 1% of the training data, i.e. 12-13 examples per class in the case of ImageNet). A class-balanced way such a subset could be drawn is sketched below.
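  As a rough illustration (an assumption about the mechanism, not necessarily how `eval_finetune.py` implements it), retaining a fraction `frac` of each class could look like this:

  ```python
  # Hypothetical sketch of class-balanced subsampling for few-shot finetuning;
  # eval_finetune.py may implement frac_retained differently.
  import random
  from collections import defaultdict

  def retain_fraction(samples, frac, seed=0):
      """samples: list of (path, label) pairs; returns a class-balanced subset."""
      by_class = defaultdict(list)
      for path, label in samples:
          by_class[label].append((path, label))
      rng = random.Random(seed)
      subset = []
      for label, items in by_class.items():
          rng.shuffle(items)
          k = max(1, round(frac * len(items)))  # keep >= 1 example per class
          subset.extend(items[:k])
      return subset

  # e.g. frac=0.01 on ImageNet (~1300 images/class) keeps ~13 per class
  ```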