Add AIM Model from Scalable Pre-training of Large Autoregressive Image Models #1479
Conversation
* This is required because torch.no_grad does not change the model configuration, whereas manually deactivating/activating gradients can have unintended consequences. For example, the MAE ViT positional embeddings are parameters with requires_grad=False that should never receive an update; if we used activate_requires_grad for finetuning, we would break those parameters.
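A minimal sketch (my own illustration, not lightly's implementation) of the failure mode described above: blanket re-activation of gradients clobbers parameters that were intentionally frozen, while torch.no_grad leaves the per-parameter flags untouched.

```python
# Sketch of why torch.no_grad() is preferred over toggling requires_grad flags.
import torch

pos_embed = torch.nn.Parameter(torch.zeros(1, 4), requires_grad=False)  # frozen
weight = torch.nn.Parameter(torch.randn(4, 4))  # trainable

# Manual deactivation/activation loses the original per-parameter flags:
for p in (pos_embed, weight):
    p.requires_grad_(False)  # freeze everything for evaluation
for p in (pos_embed, weight):
    p.requires_grad_(True)  # re-activate for finetuning
print(pos_embed.requires_grad)  # True: the frozen embedding was silently unfrozen

# torch.no_grad() disables gradient tracking without touching the flags:
pos_embed.requires_grad_(False)
with torch.no_grad():
    _ = pos_embed @ weight
print(pos_embed.requires_grad)  # False: the flag survives
```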
…om:lightly-ai/lightly into guarin-lig-3056-add-mae-imagenet-benchmark
guarin changed the base branch from ersi-lig-3910-update-mae-benchmark-code to master on January 23, 2024 07:38
guarin changed the title from "Add AIM" to "Add AIM model from Scalable Pre-training of Large Autoregressive Image Models" on Jan 23, 2024
guarin changed the title to "Add AIM Model from Scalable Pre-training of Large Autoregressive Image Models" on Jan 23, 2024
ersi-lightly approved these changes on Jan 23, 2024
FWIW, this PR caused a bit of a headache for us in TorchGeo: microsoft/torchgeo#1824. At the moment, the changes here make lightly v1.4.26 incompatible with any version of segmentation-models-pytorch. This isn't necessarily your fault, but it would help if you could check the version of timm available before importing everything else.
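One way to implement this suggestion (a hypothetical sketch, not lightly's actual import logic; the minimum version and guarded module are assumptions) is to compare the installed timm version before importing timm-dependent modules:

```python
# Hypothetical version guard; the "0.9.9" minimum and the guarded import
# below are assumptions for illustration, not lightly's actual code.
from importlib import metadata


def _parse_version(version: str) -> tuple:
    """Best-effort parse of 'major.minor.patch' into a comparable tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def timm_available(min_version: str, installed=None) -> bool:
    """Return True if timm is installed and at least `min_version`."""
    if installed is None:
        try:
            installed = metadata.version("timm")
        except metadata.PackageNotFoundError:
            return False
    return _parse_version(installed) >= _parse_version(min_version)


# Only import timm-dependent modules when the check passes:
# if timm_available("0.9.9"):
#     from lightly.models.modules import heads_timm
```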
Merged
This PR implements the AIM model proposed in Scalable Pre-training of Large Autoregressive Image Models. The implementation is based on the original code but uses a modified version of the vision transformer from timm as the backbone. The backbone is fully compatible with the timm vision transformer, and pretrained weights from our backbone should be loadable with the timm vision transformer (the state dicts are identical).

The implementation is a best effort. The paper and reference code omit some crucial information; specifically, the prefix length and a detailed description of the MLP architecture for the prediction head are missing. Nevertheless, the current implementation runs and is hopefully a good start.

I checked with the authors, and the head and prefix masking should be correct now :)

Changes
- MaskedCausalVisionTransformer
- AIMPredictionHead
- AIMTransform
- AIM benchmark module

TODO:
- We also have to figure out whether we want to add this to benchmarks/imagenet/vitb16 because the backbone is clearly not vitb16 😅

How was it tested?
For Review
Review is only required for the following files/functions:
- benchmarks/imagenet/vitb16/aim.py
- lightly/models/modules/__init__.py
- lightly/models/modules/heads_timm.py
- lightly/models/modules/masked_causal_vision_transformer.py
- lightly/models/utils.py -> random_prefix_mask function
- lightly/transforms/aim_transform.py
The other files/functions have already been reviewed in other PRs but are not yet on master.
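As an aside, the prefix masking idea mentioned in the description can be illustrated with a small sketch (my own illustration of the idea from the AIM paper, not the random_prefix_mask implementation in this PR): tokens inside the prefix attend to each other bidirectionally, while the remaining tokens attend causally.

```python
# Illustration of prefix causal attention as described in the AIM paper;
# this is NOT the random_prefix_mask implementation from this PR.
import torch


def prefix_causal_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Boolean attention mask: True means attention is allowed."""
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # causal
    mask[:prefix_len, :prefix_len] = True  # bidirectional within the prefix
    return mask


mask = prefix_causal_mask(seq_len=5, prefix_len=2)
# Token 0 may attend to token 1 (both inside the prefix, bidirectional)...
print(mask[0, 1].item())  # True
# ...but token 2 may not attend to the future token 3 (causal part).
print(mask[2, 3].item())  # False
```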