Swin main layer #17693

amyeroberts · 2022-06-13T15:57:13Z

What does this PR do?

Refactors the TF Swin models so that a single MainLayer (TFSwinMainLayer) is called by all models to get the Swin outputs (pre-head).

cf. the relevant comment from @sayakpaul on the ResNet port
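To make the shape of the refactor concrete, here is a minimal sketch of the main-layer pattern in plain Python. It is not the transformers or Keras code: `MainLayerSketch`, `ClassifierSketch`, `MaskedImageModelSketch`, and `backbone_names` are hypothetical stand-ins, and the weight keys are invented for illustration. The only point it shows is that when every task model holds the same main layer under the `swin` attribute, the backbone weights get the same names in each model.

```python
# Illustrative sketch only: plain-Python stand-ins, not transformers/Keras classes.
# All class names, helper names, and weight keys here are hypothetical.


class MainLayerSketch:
    """Stand-in for a shared main layer that produces the pre-head outputs."""

    def __init__(self, name="swin"):
        self.name = name
        # Pretend backbone weights, keyed relative to this layer.
        self.weights = {"embeddings/patch_embeddings": 0.0, "encoder/layers_0": 0.0}

    def __call__(self, pixel_values):
        # A real layer would embed and encode; this one just passes values through.
        return {"last_hidden_state": list(pixel_values)}


class ClassifierSketch:
    """Task model: shared main layer plus a classification head."""

    def __init__(self):
        self.swin = MainLayerSketch()
        self.head_weights = {"classifier/kernel": 0.0}

    def __call__(self, pixel_values):
        features = self.swin(pixel_values)["last_hidden_state"]
        return {"logits": features}


class MaskedImageModelSketch:
    """Task model: the same main layer plus a decoder head."""

    def __init__(self):
        self.swin = MainLayerSketch()
        self.head_weights = {"decoder/kernel": 0.0}

    def __call__(self, pixel_values):
        features = self.swin(pixel_values)["last_hidden_state"]
        return {"reconstruction": features}


def backbone_names(model):
    """Return the backbone weight names as they would appear in a checkpoint."""
    return sorted(f"{model.swin.name}/{key}" for key in model.swin.weights)
```

Because both sketch models delegate to the same layer under the same attribute name, `backbone_names(ClassifierSketch())` and `backbone_names(MaskedImageModelSketch())` are identical, which is the property that lets one checkpoint's backbone load into either model.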

The following script was run to check weights could still be successfully loaded into the TF models:

from transformers import AutoFeatureExtractor, TFSwinForImageClassification, TFSwinForMaskedImageModeling

checkpoint = "microsoft/swin-tiny-patch4-window7-224"

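# relative_position_index isn't updated during training. In TF it is set as an instance attribute,
# so the PyTorch relative_position_index entries are reported as unused when loading with from_pt=True.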
print("\nTFSwinForImageClassification - from PyTorch checkpoint")
tf_model = TFSwinForImageClassification.from_pretrained(checkpoint, from_pt=True)
print("\nTFSwinForImageClassification - from TF checkpoint")
tf_model = TFSwinForImageClassification.from_pretrained(checkpoint)

# As above, relative_position_index is an instance attribute in TF and is reported as unused.
# We don't have a masked image modeling checkpoint, so the image classification checkpoint is used:
# some weights will not be used (classifier head) and
# some weights will be newly initialised (decoder, mask token).
print("\nTFSwinForMaskedImageModeling - from PyTorch checkpoint")
tf_model = TFSwinForMaskedImageModeling.from_pretrained(checkpoint, from_pt=True)
print("\nTFSwinForMaskedImageModeling - from TF checkpoint")
tf_model = TFSwinForMaskedImageModeling.from_pretrained(checkpoint)

It produced the following output:

TFSwinForImageClassification - from PyTorch checkpoint
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFSwinForImageClassification: ['swin.encoder.layers.1.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.1.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.4.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.3.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.5.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.2.attention.self.relative_position_index']
- This IS expected if you are initializing TFSwinForImageClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSwinForImageClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFSwinForImageClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFSwinForImageClassification for predictions without further training.

TFSwinForImageClassification - from TF checkpoint
All model checkpoint layers were used when initializing TFSwinForImageClassification.

All the layers of TFSwinForImageClassification were initialized from the model checkpoint at microsoft/swin-tiny-patch4-window7-224.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFSwinForImageClassification for predictions without further training.

TFSwinForMaskedImageModeling - from PyTorch checkpoint
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFSwinForMaskedImageModeling: ['classifier.weight', 'swin.encoder.layers.1.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.1.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.4.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.3.attention.self.relative_position_index', 'classifier.bias', 'swin.encoder.layers.2.blocks.5.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.2.attention.self.relative_position_index']
- This IS expected if you are initializing TFSwinForMaskedImageModeling from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSwinForMaskedImageModeling from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFSwinForMaskedImageModeling were not initialized from the PyTorch model and are newly initialized: ['swin.embeddings.mask_token', 'decoder.0.weight', 'decoder.0.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

TFSwinForMaskedImageModeling - from TF checkpoint
Some layers from the model checkpoint at microsoft/swin-tiny-patch4-window7-224 were not used when initializing TFSwinForMaskedImageModeling: ['classifier']
- This IS expected if you are initializing TFSwinForMaskedImageModeling from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSwinForMaskedImageModeling from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFSwinForMaskedImageModeling were not initialized from the model checkpoint at microsoft/swin-tiny-patch4-window7-224 and are newly initialized: ['decoder', 'swin/embeddings/mask_token:0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
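The logs above are long, so the check they invite (that every unused PyTorch weight is one we expect to be unused) can be sketched as a small helper. This is an assumption-laden illustration, not part of the PR: `unexpected_unused` and the two pattern lists are hypothetical names, and the patterns simply encode what the script comments state (the `relative_position_index` entries are not loaded in TF, and the classifier head is absent from the masked image modeling model).

```python
import re

# Hypothetical allow-lists, derived from the script comments above.
# Classification model: only the relative_position_index entries should be unused.
CLASSIFICATION_ALLOWED = [re.compile(r"\.relative_position_index$")]

# Masked image modeling model: the classifier head is also expected to be unused.
MASKED_IMAGE_MODELING_ALLOWED = CLASSIFICATION_ALLOWED + [
    re.compile(r"^classifier\.(weight|bias)$"),
]


def unexpected_unused(weight_names, allowed):
    """Return the weight names that match none of the allowed patterns."""
    return [
        name
        for name in weight_names
        if not any(pattern.search(name) for pattern in allowed)
    ]
```

Pasting the list from a "Some weights of the PyTorch model were not used" warning into `unexpected_unused` with the matching allow-list should give an empty list for the logs shown here; any name it returns would be worth a closer look.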

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2022-06-13T16:07:47Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Thanks for fixing this!

Rocketknight1

This looks like a clean refactor! (Even though I still want to investigate whether we can banish MainLayer. It can wait!)

amyeroberts added 3 commits June 13, 2022 16:30

Swin models call TFSwinMainLayer

0636d05

Tidy up

b01cb52

Tidy up

7ac31eb

amyeroberts requested review from sgugger and Rocketknight1 June 13, 2022 16:33

sgugger approved these changes Jun 13, 2022


Rocketknight1 approved these changes Jun 14, 2022


amyeroberts merged commit bd43151 into huggingface:main Jun 14, 2022

amyeroberts deleted the swin-main-layer branch June 14, 2022 13:28
