TF: use the correct config with (...)EncoderDecoder models #18097

Merged
merged 2 commits into huggingface:main from unpack_inputs_composite on Jul 22, 2022

Conversation

gante (Member) commented on Jul 11, 2022

What does this PR do?

Fixes #18071

Modifies unpack_inputs to ignore the config file for (...)EncoderDecoder models, mimicking the behavior in PT. If we don't ignore it, then unset options will get set with the config's default (False for most of them), causing the inner models to ignore their own config files.
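For illustration only, here is a minimal standalone sketch of the precedence issue (plain-Python stand-ins, not the actual unpack_inputs implementation): if the composite model fills unset options from its own config, the inner models' configs never get a say.

```python
from dataclasses import dataclass


@dataclass
class DummyConfig:  # hypothetical stand-in for a model config
    output_attentions: bool = False


def resolve_option(passed, outer_config, inner_config, skip_outer_config):
    """Resolve an unset call argument, mirroring the idea behind the fix."""
    if passed is not None:
        return passed
    if not skip_outer_config:
        # Old behaviour: the composite model's default (False) wins,
        # hiding whatever the inner model's own config says.
        return outer_config.output_attentions
    # New behaviour: leave the option to the inner model, which reads its own config.
    return inner_config.output_attentions


outer = DummyConfig(output_attentions=False)  # composite (...)EncoderDecoder config
inner = DummyConfig(output_attentions=True)   # inner encoder/decoder config

print(resolve_option(None, outer, inner, skip_outer_config=False))  # False -> inner setting ignored
print(resolve_option(None, outer, inner, skip_outer_config=True))   # True  -> inner setting respected
```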

⚠️ I've added a corresponding test for the EncoderDecoder models. I then noticed that other (...)EncoderDecoder tests have copy/pasted their own EncoderDecoderMixin, so I've left the other classes for a follow-up PR with the following question: should a common EncoderDecoderMixin be defined and shared across (...)EncoderDecoder tests, or should I add a similar test to each of the other classes individually?

gante changed the title from "TF: use the right config" to "TF: use the correct config with encoder decoder models" on Jul 11, 2022
gante changed the title from "TF: use the correct config with encoder decoder models" to "TF: use the correct config with (...)EncoderDecoder models" on Jul 11, 2022
gante requested review from sgugger and ydshieh on Jul 11, 2022
HuggingFaceDocBuilderDev commented on Jul 11, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment

Thanks for fixing! As for your question, it depends on whether they are all exact copies that can easily be refactored into one mixin.

gante (Member, Author) commented on Jul 11, 2022

I believe they are -- going to give it a go afterwards if @ydshieh also agrees :)

ydshieh (Collaborator) commented on Jul 11, 2022

I have a limited connection at the moment (I'm in the mountains), so feel free to merge if you prefer. Regarding the common mixin, that works for me. I see there are a few little things to address, like the input names (input_ids, pixel_values, etc.).
It would be nice if you could do this refactoring after merging my PR about the PT/TF equivalence tests, or incorporate the change into it 🙏

Thank you for the fix, @gante

gante (Member, Author) commented on Jul 21, 2022

@ydshieh can I have a review plz 🙏

ydshieh (Collaborator) left a comment

Thank you for the fix, @gante. Would you mind rebasing on main and running test_pt_tf_model_equivalence for TF (Vision)EncoderDecoderModel? Thank you.

```python
unpacked_inputs = input_processing(func, self.config, main_input, **fn_args_and_kwargs)

# Encoder Decoder models delegate the application of the configuration options to their inner models.
if "encoder_decoder" in str(self).lower():
```
Collaborator review comment:

Just a nit, no need to feel strongly about it:

I would personally prefer to use self.__name__ instead of str and test against EncoderDecoder. If I understand correctly, str gives the full path, which includes the module name like encoder_decoder or vision_encoder_decoder, right?

gante (Member, Author) replied:

I was trying self.__name__ on this line, and it seems like it isn't defined in some cases -- e.g. in tests/models/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py::TFViT2GPT2EncoderDecoderModelTest::test_encoder_decoder_model, it throws *** AttributeError: 'TFVisionEncoderDecoderModel' object has no attribute '__name__'

str(self) here includes the full import path for the object, like you wrote, which contains the class name -- '<transformers.models.vision_encoder_decoder.modeling_tf_vision_encoder_decoder.TFVisionEncoderDecoderModel object at 0x7fc7783b8a60>'
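For clarity, a tiny standalone illustration of the difference; the class below is a hypothetical stand-in, not imported from transformers:

```python
class TFVisionEncoderDecoderModel:  # hypothetical stand-in class, for illustration only
    pass


model = TFVisionEncoderDecoderModel()

# Instances do not carry __name__ (hence the AttributeError above); only the class object does.
print(type(model).__name__)  # TFVisionEncoderDecoderModel

# str(instance) includes the module path as well as the class name.
print(str(model))  # <__main__.TFVisionEncoderDecoderModel object at 0x...>
```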

gante (Member, Author) replied:

Maybe we can update in the future, but since it is working for now I'm going with it :D

Collaborator replied:

OK, thanks for the info.

```python
if use_cache:
    past_key_values = decoder_outputs[1]
# The starting index of the remaining elements in `decoder_outputs`
start_index = sum([1 if x is not None else 0 for x in (loss, logits, past_key_values)])
```
Collaborator review comment:
Thanks for the improvement / cleanup!
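For context, a minimal sketch of what that counting does, using illustrative placeholder values rather than anything from the PR:

```python
# Count how many of the leading optional outputs are present; the remaining
# elements of the decoder output tuple start right after them.
loss, logits, past_key_values = None, "logits", "past"  # illustrative placeholders
start_index = sum(1 if x is not None else 0 for x in (loss, logits, past_key_values))
print(start_index)  # 2 -> two of the three candidates are present
```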

```python
decoder_attention_mask = decoder_attention_mask[:, :-1]
encoder_model, decoder_model = self.get_encoder_decoder_model(config, decoder_config)
enc_dec_model = EncoderDecoderModel(encoder=encoder_model, decoder=decoder_model)
enc_dec_model.config.output_attentions = True  # model config -> won't work
```
Collaborator review comment:
nice addition!

gante (Member, Author) commented on Jul 21, 2022

@ydshieh rebased on main and reran the tests -- all working 👍

gante merged commit 1fc4b2a into huggingface:main on Jul 22, 2022
gante deleted the unpack_inputs_composite branch on Jul 22, 2022
oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022
Successfully merging this pull request may close these issues: Composite models (encoder-decoder) behave differently across PyTorch and TensorFlow (#18071).