Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers
Code is coming soon.
Figure 1: Pipeline of token-based pre-training.
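In this pipeline, the encoder receives a corrupted view of an image and is trained to predict the discrete visual tokens of the original image, produced by a frozen tokenizer. Below is a minimal sketch of one such training step under a BEiT-style setup; `tokenizer`, `encoder`, `to_logits`, and `corrupt` are hypothetical placeholders, not names from this repository.

```python
# A minimal sketch of one token-based pre-training step, assuming a frozen
# discrete tokenizer (dVAE-style, as in BEiT) and a ViT encoder.
# All function names here are hypothetical placeholders.
import torch
import torch.nn.functional as F

def pretrain_step(img, corrupt, tokenizer, encoder, to_logits):
    # Target: discrete visual tokens of the ORIGINAL image, shape (B, N).
    with torch.no_grad():
        target_tokens = tokenizer(img)
    # Input: a corrupted view (zoomed-in, blurred, masked, ...).
    features = encoder(corrupt(img))   # (B, N, D) patch features
    logits = to_logits(features)       # (B, N, vocab_size)
    # The model must recover the original tokens from the corrupted view.
    return F.cross_entropy(logits.transpose(1, 2), target_tokens)
```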
Figure 2: Visualization of the five proposed tasks.
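For intuition, here is a minimal sketch of how the five corruptions might be implemented on `(C, H, W)` float image tensors in `[0, 1]`; the crop scales, noise level, and blur parameters are illustrative assumptions, not the paper's exact settings (in particular, simple Gaussian noise stands in for the distortion).

```python
# Illustrative versions of the five corruption tasks; parameters are
# assumptions, not the paper's settings.
import torch
import torchvision.transforms.functional as TF

def zoomed_in(img, scale=0.5):
    # Crop a central region and resize it back: a magnified view.
    _, h, w = img.shape
    crop = TF.center_crop(img, [int(h * scale), int(w * scale)])
    return TF.resize(crop, [h, w])

def zoomed_out(img, scale=0.5):
    # Shrink the image and pad it back to the original size.
    _, h, w = img.shape
    small = TF.resize(img, [int(h * scale), int(w * scale)])
    pad_h, pad_w = h - small.shape[1], w - small.shape[2]
    return TF.pad(small, [pad_w // 2, pad_h // 2,
                          pad_w - pad_w // 2, pad_h - pad_h // 2])

def distorted(img, noise_std=0.1):
    # Gaussian pixel noise as a simple stand-in for distortion.
    return (img + noise_std * torch.randn_like(img)).clamp(0, 1)

def blurred(img, kernel_size=9, sigma=3.0):
    # Gaussian blur removes high-frequency detail.
    return TF.gaussian_blur(img, kernel_size=kernel_size, sigma=sigma)

def de_colorized(img):
    # Drop color information, keeping a 3-channel grayscale image.
    return TF.rgb_to_grayscale(img, num_output_channels=3)
```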
All results are obtained with models pre-trained for 300 epochs, using ViT-Base by default.
| pre-training task | zoomed-in | zoomed-out | distorted | blurred | de-colorized |
| --- | --- | --- | --- | --- | --- |
| fine-tune (top-1, %) | 82.7 | 82.5 | 82.1 | 81.8 | 81.4 |
| pre-training task | zoomed-in (a) | mask (m) | (a)+(m) |
| --- | --- | --- | --- |
| fine-tune (top-1, %) | 82.7 | 82.9 | 83.2 |
Figure 3: Efficiency of the integrated task.
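As a rough illustration of the integrated task, the sketch below first applies the hypothetical `zoomed_in()` helper from above and then samples a random patch mask in the style of masked image modeling; the mask ratio and patch size are assumptions, not the paper's settings.

```python
# Illustrative combination of the zoomed-in corruption with random patch
# masking; mask_ratio and patch_size are assumed values.
import torch

def integrate_zoom_and_mask(img, mask_ratio=0.4, patch_size=16):
    # First corrupt the image with the zoomed-in transform ...
    img = zoomed_in(img)
    # ... then mask a random subset of patches, as in masked image modeling.
    _, h, w = img.shape
    n_patches = (h // patch_size) * (w // patch_size)
    n_masked = int(mask_ratio * n_patches)
    mask = torch.zeros(n_patches, dtype=torch.bool)
    mask[torch.randperm(n_patches)[:n_masked]] = True
    # The mask marks patch embeddings to replace with a learnable token.
    return img, mask
```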