
Swin main layer #17693

Merged (3 commits) on Jun 14, 2022


@amyeroberts (Collaborator) commented Jun 13, 2022

What does this PR do?

Refactors the TF Swin model so that every model class calls a shared MainLayer to compute the Swin outputs (the outputs before any task-specific head).

cf. the relevant comment from @sayakpaul on the ResNet port.
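To make the pattern concrete, here is a framework-free Python sketch. It is not the actual TF implementation, and every class, attribute and weight name in it is hypothetical. Each task model stores the shared backbone under a `swin` attribute and calls it to get the pre-head outputs, so backbone weights carry the same `swin.`-prefixed names whichever head is attached. This is consistent with the `swin.encoder...` and `swin/embeddings/...` names that appear in the loading logs.

```python
# Framework-free sketch of the "main layer" pattern (all names are hypothetical).
class ToySwinMainLayer:
    """Shared backbone: every task model calls this to get pre-head outputs."""

    def __init__(self, scale=2.0):
        self.params = {"embeddings.scale": scale}

    def __call__(self, pixel_values):
        # Stand-in for patch embedding + encoder: scale each input value.
        return [self.params["embeddings.scale"] * v for v in pixel_values]


class ToySwinForImageClassification:
    def __init__(self):
        self.swin = ToySwinMainLayer()  # backbone stored under the "swin" attribute
        self.params = {"classifier.weight": 0.5}

    def __call__(self, pixel_values):
        hidden = self.swin(pixel_values)  # pre-head outputs from the main layer
        pooled = sum(hidden) / len(hidden)  # toy pooling
        return pooled * self.params["classifier.weight"]


class ToySwinForMaskedImageModeling:
    def __init__(self):
        self.swin = ToySwinMainLayer()  # same backbone, same attribute name
        self.params = {"decoder.weight": -1.0}

    def __call__(self, pixel_values):
        hidden = self.swin(pixel_values)
        return [self.params["decoder.weight"] * h for h in hidden]


def checkpoint_keys(model):
    """Checkpoint-style names: backbone weights are prefixed with 'swin.'."""
    return {f"swin.{k}" for k in model.swin.params} | set(model.params)


cls_model = ToySwinForImageClassification()
mim_model = ToySwinForMaskedImageModeling()
print(cls_model([1.0, 3.0]))  # 2.0
print(mim_model([1.0, 3.0]))  # [-2.0, -6.0]
print(checkpoint_keys(cls_model) & checkpoint_keys(mim_model))  # {'swin.embeddings.scale'}
```

Because only the head differs between task models, one checkpoint can populate the backbone of either model; only the head weights (here `classifier.weight` versus `decoder.weight`) fail to match.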

The following script was run to check that weights can still be loaded successfully into the TF models:

```python
from transformers import TFSwinForImageClassification, TFSwinForMaskedImageModeling

checkpoint = "microsoft/swin-tiny-patch4-window7-224"

# relative_position_index isn't updated during training. In TF it is set as an instance
# attribute rather than loaded as a weight, so the PyTorch buffers are reported as unused.
print("\nTFSwinForImageClassification - from PyTorch checkpoint")
tf_model = TFSwinForImageClassification.from_pretrained(checkpoint, from_pt=True)
print("\nTFSwinForImageClassification - from TF checkpoint")
tf_model = TFSwinForImageClassification.from_pretrained(checkpoint)

# relative_position_index: same as above.
# There is no masked image modeling checkpoint, so the image classification one is used:
# - some weights will not be used (classifier head)
# - some weights are newly initialized (decoder, mask token)
print("\nTFSwinForMaskedImageModeling - from PyTorch checkpoint")
tf_model = TFSwinForMaskedImageModeling.from_pretrained(checkpoint, from_pt=True)
print("\nTFSwinForMaskedImageModeling - from TF checkpoint")
tf_model = TFSwinForMaskedImageModeling.from_pretrained(checkpoint)
```
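The script's comments rely on `relative_position_index` being a fixed function of the window size: it maps each pair of tokens in an attention window to a slot in the learned relative position bias table, and nothing in it is trained. The pure-Python sketch below recomputes it following the indexing of the original Swin implementation, as we understand it; the TF port's actual code may be structured differently, and the function name is hypothetical. Since the values can be rebuilt from the window size alone, not loading the PyTorch buffers loses nothing.

```python
def relative_position_index(window_h, window_w):
    """Sketch of Swin's relative position index for one window (hypothetical helper).

    Entry [a][b] indexes a bias table of size (2*window_h - 1) * (2*window_w - 1)
    by the (row, col) offset between tokens a and b, with tokens in row-major order.
    """
    coords = [(h, w) for h in range(window_h) for w in range(window_w)]
    index = []
    for h_a, w_a in coords:
        row = []
        for h_b, w_b in coords:
            dh = h_a - h_b + (window_h - 1)  # shift row offset into [0, 2*window_h - 2]
            dw = w_a - w_b + (window_w - 1)  # shift col offset into [0, 2*window_w - 2]
            row.append(dh * (2 * window_w - 1) + dw)
        index.append(row)
    return index


idx = relative_position_index(2, 2)
print(idx[0])  # [4, 3, 1, 0]
print(idx[3])  # [8, 7, 5, 4]
```

For a 7x7 window (as the `window7` in the checkpoint name suggests), this yields a 49x49 table whose entries cover all 13 * 13 = 169 bias slots.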

Running the script produced the following output:

```
TFSwinForImageClassification - from PyTorch checkpoint
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFSwinForImageClassification: ['swin.encoder.layers.1.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.1.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.4.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.3.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.5.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.2.attention.self.relative_position_index']
- This IS expected if you are initializing TFSwinForImageClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSwinForImageClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFSwinForImageClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFSwinForImageClassification for predictions without further training.

TFSwinForImageClassification - from TF checkpoint
All model checkpoint layers were used when initializing TFSwinForImageClassification.

All the layers of TFSwinForImageClassification were initialized from the model checkpoint at microsoft/swin-tiny-patch4-window7-224.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFSwinForImageClassification for predictions without further training.

TFSwinForMaskedImageModeling - from PyTorch checkpoint
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFSwinForMaskedImageModeling: ['classifier.weight', 'swin.encoder.layers.1.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.1.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.4.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.3.attention.self.relative_position_index', 'classifier.bias', 'swin.encoder.layers.2.blocks.5.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.2.attention.self.relative_position_index']
- This IS expected if you are initializing TFSwinForMaskedImageModeling from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSwinForMaskedImageModeling from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFSwinForMaskedImageModeling were not initialized from the PyTorch model and are newly initialized: ['swin.embeddings.mask_token', 'decoder.0.weight', 'decoder.0.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

TFSwinForMaskedImageModeling - from TF checkpoint
Some layers from the model checkpoint at microsoft/swin-tiny-patch4-window7-224 were not used when initializing TFSwinForMaskedImageModeling: ['classifier']
- This IS expected if you are initializing TFSwinForMaskedImageModeling from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSwinForMaskedImageModeling from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFSwinForMaskedImageModeling were not initialized from the model checkpoint at microsoft/swin-tiny-patch4-window7-224 and are newly initialized: ['decoder', 'swin/embeddings/mask_token:0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
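The warnings above can also be checked mechanically rather than by eye. The helpers below are hypothetical (not part of transformers): given the key lists printed in the warnings, they return any keys not explained by the script's comments, i.e. anything other than the recomputed `relative_position_index` buffers, the unused classifier head, or the newly initialized decoder and mask token. The sample keys are copied from the TFSwinForMaskedImageModeling (from PyTorch) warning; the unused list is a subset.

```python
# Hypothetical helpers: flag mismatched keys that the script's comments do not explain.
RECOMPUTED_BUFFER = "attention.self.relative_position_index"


def unexplained_unused(unused_keys, head_dropped=False):
    """Unused checkpoint keys other than recomputed buffers (and, optionally, the classifier head)."""
    leftovers = []
    for key in unused_keys:
        if key.endswith(RECOMPUTED_BUFFER):
            continue  # rebuilt in TF from the window size, never trained
        if head_dropped and key.startswith("classifier."):
            continue  # classification head is not part of the masked image modeling model
        leftovers.append(key)
    return leftovers


def unexplained_new(new_keys):
    """Newly initialized keys other than the decoder and the mask token."""
    return [k for k in new_keys if not (k.startswith("decoder.") or k == "swin.embeddings.mask_token")]


unused = [
    "classifier.weight",
    "swin.encoder.layers.1.blocks.0.attention.self.relative_position_index",
    "swin.encoder.layers.3.blocks.1.attention.self.relative_position_index",
    "classifier.bias",
]
new = ["swin.embeddings.mask_token", "decoder.0.weight", "decoder.0.bias"]

print(unexplained_unused(unused, head_dropped=True))  # []
print(unexplained_unused(unused))  # ['classifier.weight', 'classifier.bias']
print(unexplained_new(new))  # []
```

An empty result means every mismatch is one the script already anticipated; any leftover key would point at a backbone weight that failed to load after the refactor.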

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev commented Jun 13, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) left a comment


Thanks for fixing this!

@Rocketknight1 (Member) left a comment


This looks like a clean refactor! (Even though I still want to investigate whether we can banish MainLayer. It can wait!)

@amyeroberts merged commit bd43151 into huggingface:main on Jun 14, 2022
@amyeroberts deleted the swin-main-layer branch on June 14, 2022 at 13:28