Add ImageGPT to mappings #16869
Conversation
# No need to perform padding if all lengths are equal
all_lengths = [item[key].shape[1] for item in items]
if len(set(all_lengths)) == 1:
    tensor = torch.stack([item[key][0] for item in items], dim=0)
    return tensor

max_length = max(all_lengths)
Necessary to make the image classification pipeline test (with ImageGPTForImageClassification) pass.
More precisely, the following line would fail with padding_value being None:

torch.zeros((batch_size, max_length), dtype=dtype) + padding_value
I don't think we should make that change.
We can, on the other hand, add a meaningful padding value for that key (here 0, I guess). The fact that None is passed is the bug here, I think.
The problem with this change is that you have about zero information about what kind of tensors are sent here, so shape[1] is most likely going to trigger a bug somewhere somehow (maybe shape[1] doesn't exist, or shape[2] exists and prevents stacking, for instance).
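As a rough sketch of the guard being suggested here (the helper name and the None-return convention are mine, not the PR's):

import torch

def stack_if_uniform(items, key):
    # Hypothetical guard around the fast path: only skip padding when every
    # tensor for `key` is 2-D and all sequence lengths match, so shape[1]
    # is guaranteed to exist and stacking cannot fail on extra dims.
    tensors = [item[key] for item in items]
    if not all(t.ndim == 2 for t in tensors):
        return None  # unexpected rank: fall back to the regular padding path
    if len({t.shape[1] for t in tensors}) != 1:
        return None  # lengths differ: padding is actually needed
    return torch.stack([t[0] for t in tensors], dim=0)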
Yeah, that's true. I forgot to add some checks like the ones in the blocks below.
if type(self.model).__name__ == "ImageGPTForImageClassification":
    # Temporary workaround
    # Check that the model can still do a forward pass successfully (every parameter should be resized)
    # Input ids should be clamped to the maximum size of the vocabulary
    model_inputs["pixel_values"].clamp_(max=self.model.config.vocab_size - 15 - 1)
ImageGPTModelTester has vocab_size=99, but ImageGPTFeatureExtractor doesn't use vocab_size and will produce values between 0 and 512.
This operation is copied from ImageGPTModelTest.
Again, this is very much something we should avoid as much as possible.
Why doesn't the feature_extractor produce valid pixel_values (clamping them itself, if possible)?
I didn't look into the details of ImageGPT. It seems to me it is special: unlike other vision models' feature extractors, which output pixel values, ImageGPTFeatureExtractor outputs something called color clusters, which should be treated like tokens as in NLP models.
In fact, the following warning is in ImageGPTModel:

if "pixel_values" in kwargs:
    warnings.warn(
        "The `pixel_values` argument is deprecated and will be removed in a future version, use `input_ids` instead.",
        FutureWarning,
    )
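For context, calling the model with cluster ids as token ids would look roughly like this (a sketch; the checkpoint name and shapes are assumptions on my part):

import torch
from transformers import ImageGPTModel

model = ImageGPTModel.from_pretrained("openai/imagegpt-small")
# Cluster ids act like token ids: shape (batch, sequence), values in [0, 512].
input_ids = torch.randint(0, 513, (1, 1024))
outputs = model(input_ids=input_ids)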
Probably we can move this logic (about clamp_) inside ImageGPTFeatureExtractor itself. But this is not done for any vision model so far, and @NielsRogge's opinion is important for this change.
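If it were moved, the extractor-side clamp might look roughly like this (the function name and the n_clusters parameter are hypothetical, not the real ImageGPTFeatureExtractor API):

import torch

def _clamp_cluster_ids(input_ids: torch.Tensor, n_clusters: int) -> torch.Tensor:
    # Keep every id inside [0, n_clusters - 1] so downstream embedding
    # lookups never index past the vocabulary.
    return input_ids.clamp_(min=0, max=n_clusters - 1)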
I see.
@@ -70,7 +70,7 @@ def __init__(
     hidden_act="gelu",
     hidden_dropout_prob=0.1,
     attention_probs_dropout_prob=0.1,
-    max_position_embeddings=512,
+    max_position_embeddings=1024,
ImageGPTFeatureExtractor will create sequences of length 32 * 32 = 1024. Using max_position_embeddings=512 gives an index-out-of-range error when indexing the position embeddings.
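A minimal repro of that failure, independent of ImageGPT (sizes taken from this comment):

import torch
import torch.nn as nn

pos_emb = nn.Embedding(512, 32)    # position table sized for 512 positions
position_ids = torch.arange(1024)  # 32 * 32 = 1024 positions requested
try:
    pos_emb(position_ids)          # indices 512..1023 are out of range
except IndexError as err:
    print("fails as expected:", err)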
You don't need to modify the config in such a way (if you don't want to).
You can use the get_pipeline_config(self) method, which enables the pipeline tests to use a different config than the one used for model testing; it's useful specifically for changing things like max_pos and vocab_size.
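For illustration, such an override could look like this in the model tester (the exact values are assumptions based on the discussion above, not the final code):

def get_pipeline_config(self):
    config = self.get_config()
    # Larger budgets for pipeline tests only; unit tests keep the small config.
    config.max_position_embeddings = 1024  # 32 * 32 tokens from the feature extractor
    config.vocab_size = 513                # 512 color clusters + 1 start token (assumed)
    return config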
Great to know about this 💯 Thank you!
The documentation is not available anymore as the PR was closed or merged.
Thanks for this PR!
I think all your changes make sense, but they are not necessarily the best possible changes.
I'll try to come up with different (hopefully better) changes to support this.
Closed in favor of #16871
What does this PR do?
Similar to #16857, but only for ImageGPT, together with the necessary changes to make the (pipeline) tests pass.