
Add ImageGPT to mappings #16869

Closed
ydshieh wants to merge 1 commit.

Conversation

ydshieh (Collaborator) commented Apr 21, 2022

What does this PR do?

Similar to #16857, but only for ImageGPT, together with the necessary changes to make the (pipeline) tests pass.

@ydshieh ydshieh changed the title add imagegpt add ImageGPT to mappings Apr 21, 2022
@ydshieh ydshieh changed the title add ImageGPT to mappings Add ImageGPT to mappings Apr 21, 2022
Comment on lines +83 to +89
# No need to perform padding if all lengths are equal
all_lengths = [item[key].shape[1] for item in items]
if len(set(all_lengths)) == 1:
tensor = torch.stack([item[key][0] for item in items], dim=0)
return tensor

max_length = max(all_lengths)
ydshieh (Collaborator, Author):
Necessary to make image classification pipeline test (with ImageGPTForImageClassification) pass.

More precisely, the following line would fail when padding_value is None:

torch.zeros((batch_size, max_length), dtype=dtype) + padding_value
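A minimal sketch of the fast path being discussed (plain Python with a hypothetical `collate` helper, not the actual pipeline code): when every item already has the same length, stacking alone suffices and `padding_value` is never consulted; when lengths differ, a `None` padding value should fail loudly instead of propagating into the arithmetic above:

```python
def collate(items, padding_value=None):
    # Fast path: all items share the same length, so no padding is needed
    # and padding_value is never touched.
    lengths = [len(item) for item in items]
    if len(set(lengths)) == 1:
        return [list(item) for item in items]
    # Slow path: lengths differ, so a real padding value is required.
    if padding_value is None:
        raise ValueError("padding_value must be set when lengths differ")
    max_length = max(lengths)
    return [list(item) + [padding_value] * (max_length - len(item)) for item in items]
```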

Narsil (Contributor):

I don't think we should make that change.

We can on the other hand add a meaningful padding value for that key (here 0 I guess).

The fact that None is passed is the bug here I think.

The problem with this change is that you have almost no information about what kind of tensors are sent here, so shape[1] is likely to trigger a bug somewhere (maybe shape[1] doesn't exist, or shape[2] exists and prevents stacking, for instance).

ydshieh (Collaborator, Author):

Yeah, that's true. I forgot to add some checks like those done in the blocks below.
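A sketch of the kind of check being asked for (hypothetical helper operating on shape tuples for illustration): before taking the equal-length fast path, verify that every tensor has a compatible full shape, not just a matching size along dimension 1:

```python
def can_fast_stack(shapes):
    # shapes: list of tensor shape tuples, e.g. [(1, 1024), (1, 1024)].
    # shape[1] must exist (ndim >= 2), and all shapes must match exactly;
    # otherwise torch.stack would raise or silently do the wrong thing.
    if any(len(shape) < 2 for shape in shapes):
        return False
    return len(set(shapes)) == 1
```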

Comment on lines +93 to +97
if type(self.model).__name__ == "ImageGPTForImageClassification":
# Temporary workaround
# Check that the model can still do a forward pass successfully (every parameter should be resized)
# Input ids should be clamped to the maximum size of the vocabulary
model_inputs["pixel_values"].clamp_(max=self.model.config.vocab_size - 15 - 1)
ydshieh (Collaborator, Author):

ImageGPTModelTester has vocab_size=99, but ImageGPTFeatureExtractor doesn't use vocab_size, and will produce values between 0 and 512.

This operation is copied from ImageGPTModelTest.
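For illustration, the clamping above can be sketched in plain Python (hypothetical helper; the `- 15 - 1` cap is copied verbatim from the workaround in ImageGPTModelTest, not derived independently):

```python
def clamp_input_ids(ids, vocab_size):
    # Clip every id to the cap used by the test workaround, so the
    # embedding lookup never indexes past the (resized) vocabulary.
    cap = vocab_size - 15 - 1
    return [min(i, cap) for i in ids]
```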

Narsil (Contributor):

Again, this is very much something we should avoid as much as possible.

Why doesn't the feature_extractor produce valid pixel_values (clamping them itself, if possible)?

ydshieh (Collaborator, Author):

I didn't look at the details of ImageGPT. It seems special to me: unlike other vision models' feature extractors, which output pixel values, ImageGPTFeatureExtractor outputs something called color clusters, which should be treated like tokens, as in NLP models.

In fact, the following warning is in ImageGPTModel:

        if "pixel_values" in kwargs:
            warnings.warn(
                "The `pixel_values` argument is deprecated and will be removed in a future version, use `input_ids` instead.",
                FutureWarning,
            )

ydshieh (Collaborator, Author) commented Apr 21, 2022:

Probably we can move this logic (the clamp_) into ImageGPTFeatureExtractor itself. But this has not been done for any vision model so far, and @NielsRogge's opinion is important for this change.

Narsil (Contributor):

I see.

@@ -70,7 +70,7 @@ def __init__(
         hidden_act="gelu",
         hidden_dropout_prob=0.1,
         attention_probs_dropout_prob=0.1,
-        max_position_embeddings=512,
+        max_position_embeddings=1024,
ydshieh (Collaborator, Author):

ImageGPTFeatureExtractor will create sequences of length 32 * 32 = 1024.
Using max_position_embeddings=512 gives an index-out-of-range error in the position embedding lookup.
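The constraint is simple arithmetic and can be sketched as (hypothetical helper, for illustration only):

```python
def position_ids_in_range(seq_len, max_position_embeddings):
    # ImageGPTFeatureExtractor emits one token per pixel of a 32x32 grid,
    # so position ids run from 0 to seq_len - 1 and every one of them must
    # index into the position-embedding table.
    return seq_len <= max_position_embeddings
```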

Narsil (Contributor):

You don't need to modify the config in such a way (if you don't want to).

You can use the get_pipeline_config(self) method, which enables the pipeline tests to use a different config than the one used for model testing; it's useful specifically for changing things like max_position_embeddings and vocab_size.
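A hedged sketch of that pattern (configs shown as plain dicts for illustration; the real tester returns a config object): only the pipeline tests call get_pipeline_config, so only they see the enlarged values while the regular model tests keep the small config:

```python
class ImageGPTModelTester:
    # Values used by the regular model tests (small on purpose, for speed).
    def get_config(self):
        return {"vocab_size": 99, "max_position_embeddings": 512}

    # Override used only by the pipeline tests: same config, but with
    # max_position_embeddings large enough for 32 * 32 = 1024 positions.
    def get_pipeline_config(self):
        config = self.get_config()
        config["max_position_embeddings"] = 1024
        return config
```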

ydshieh (Collaborator, Author):

Great to know about this 💯 Thank you!

HuggingFaceDocBuilderDev commented Apr 21, 2022

The documentation is not available anymore as the PR was closed or merged.

@ydshieh ydshieh removed the request for review from Narsil April 21, 2022 08:15
@Narsil Narsil left a comment


Thanks for this PR !

I think all your changes make sense but are not necessarily the best possible ones.
I'll try to come up with different (hopefully better) changes to support this.

ydshieh (Collaborator, Author) commented Apr 21, 2022:

Closed - in favor of #16871

@ydshieh ydshieh deleted the add_imagegpt_to_mappings branch May 5, 2022 10:32