Add AIM Model from Scalable Pre-training of Large Autoregressive Image Models #1479
Conversation
* This is required because torch.no_grad does not change the model configuration, whereas manually deactivating/activating gradients can have unintended consequences. For example, the MAE ViT positional embeddings are parameters with requires_grad=False that should never receive an update; if we used activate_requires_grad for finetuning, we would break those parameters.
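A minimal sketch (my own illustration, not lightly's implementation) of the failure mode described above: blanket re-activation of gradients clobbers parameters that were intentionally frozen, while torch.no_grad leaves the per-parameter flags untouched.

```python
# Sketch of why torch.no_grad() is preferred over toggling requires_grad flags.
import torch

pos_embed = torch.nn.Parameter(torch.zeros(1, 4), requires_grad=False)  # frozen
weight = torch.nn.Parameter(torch.randn(4, 4))  # trainable

# Manual deactivation/activation loses the original per-parameter flags:
for p in (pos_embed, weight):
    p.requires_grad_(False)  # freeze everything for evaluation
for p in (pos_embed, weight):
    p.requires_grad_(True)  # re-activate for finetuning
print(pos_embed.requires_grad)  # True: the frozen embedding was silently unfrozen

# torch.no_grad() disables gradient tracking without touching the flags:
pos_embed.requires_grad_(False)
with torch.no_grad():
    _ = pos_embed @ weight
print(pos_embed.requires_grad)  # False: the flag survives
```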
…om:lightly-ai/lightly into guarin-lig-3056-add-mae-imagenet-benchmark
guarin changed the base branch from ersi-lig-3910-update-mae-benchmark-code to master on January 23, 2024 07:38
guarin changed the title from "Add AIM" to "Add AIM model from Scalable Pre-training of Large Autoregressive Image Models" on Jan 23, 2024
guarin changed the title to "Add AIM Model from Scalable Pre-training of Large Autoregressive Image Models" on Jan 23, 2024
ersi-lightly approved these changes on Jan 23, 2024
FWIW, this PR caused a bit of a headache for us in TorchGeo: microsoft/torchgeo#1824. At the moment, the changes here make lightly v1.4.26 incompatible with any version of segmentation-models-pytorch. This isn't necessarily your fault, but it would help if you could check the version of timm available before importing everything else.
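One way to implement this suggestion (a hypothetical sketch, not lightly's actual import logic; the minimum version and guarded module are assumptions) is to compare the installed timm version before importing timm-dependent modules:

```python
# Hypothetical version guard; the "0.9.9" minimum and the guarded import
# below are assumptions for illustration, not lightly's actual code.
from importlib import metadata


def _parse_version(version: str) -> tuple:
    """Best-effort parse of 'major.minor.patch' into a comparable tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def timm_available(min_version: str, installed=None) -> bool:
    """Return True if timm is installed and at least `min_version`."""
    if installed is None:
        try:
            installed = metadata.version("timm")
        except metadata.PackageNotFoundError:
            return False
    return _parse_version(installed) >= _parse_version(min_version)


# Only import timm-dependent modules when the check passes:
# if timm_available("0.9.9"):
#     from lightly.models.modules import heads_timm
```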
Merged
This PR implements the AIM model proposed in Scalable Pre-training of Large Autoregressive Image Models. The implementation is based on the original code but uses a modified version of the vision transformer from timm as the backbone. The backbone is fully compatible with the timm vision transformer, and pretrained weights from our backbone should be loadable with the timm vision transformer (the state dicts are identical).

The implementation is a best effort. The paper and reference code omit some crucial information; specifically, the prefix length and a detailed description of the MLP architecture for the prediction head are missing. Nevertheless, the current implementation runs and is hopefully a good start.

I checked with the authors, and the head and prefix masking should be correct now :)

Changes
- MaskedCausalVisionTransformer
- AIMPredictionHead
- AIMTransform
- AIM benchmark module

TODO:
- We also have to figure out whether we want to add this to benchmarks/imagenet/vitb16 because the backbone is clearly not vitb16 😅

How was it tested?
For Review
Review is only required for the following files/functions:
- benchmarks/imagenet/vitb16/aim.py
- lightly/models/modules/__init__.py
- lightly/models/modules/heads_timm.py
- lightly/models/modules/masked_causal_vision_transformer.py
- lightly/models/utils.py -> random_prefix_mask function
- lightly/transforms/aim_transform.py
The other files/functions have already been reviewed in other PRs but are not yet on master.
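As an aside, the prefix masking idea mentioned in the description can be illustrated with a small sketch (my own illustration of the idea from the AIM paper, not the random_prefix_mask implementation in this PR): tokens inside the prefix attend to each other bidirectionally, while the remaining tokens attend causally.

```python
# Illustration of prefix causal attention as described in the AIM paper;
# this is NOT the random_prefix_mask implementation from this PR.
import torch


def prefix_causal_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Boolean attention mask: True means attention is allowed."""
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # causal
    mask[:prefix_len, :prefix_len] = True  # bidirectional within the prefix
    return mask


mask = prefix_causal_mask(seq_len=5, prefix_len=2)
# Token 0 may attend to token 1 (both inside the prefix, bidirectional)...
print(mask[0, 1].item())  # True
# ...but token 2 may not attend to the future token 3 (causal part).
print(mask[2, 3].item())  # False
```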