
[Generate] Fix gradient_checkpointing and use_cache bug for generate-compatible models #21737

Closed
42 tasks done
younesbelkada opened this issue Feb 22, 2023 · 27 comments · Fixed by #21772, #21833, #21956 or #22272

younesbelkada (Contributor) commented Feb 22, 2023

Feature request

When a model uses gradient_checkpointing and a user calls generate with use_cache, some models hit bugs, such as the one described in #21733.

The fix is to slightly refactor the affected models, following the same procedure as in the aforementioned PR.

How to participate

  1. If it is your first time here, have a quick look at our contribution guidelines 🤗
  2. Pick a model from the list below and check the comments on this issue to make sure it hasn't been claimed yet.
  3. Claim your models in the comments (e.g. "I want to work on GPT2")
  4. Replicate the changes from this PR in your model of choice. In other words, move the if block to the line above the ... if use_cache else None, in the same .forward() function (see the sketch after this list). Please note that some models may have more than one instance of this block!
  5. Make sure you've run our automated code formatting tool (i.e. run make fixup in your shell -- also run make fix-copies if it requests you to do so)
  6. Open a PR. Tag @younesbelkada or @gante (one of us is enough)

That's it! With each change, you'll be making transformers a little bit better for all of us 💛
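For illustration only, here is a minimal sketch of the relocated block, following the fix in #21733 (the exact cache variable, e.g. presents, differs from model to model):

# disable the cache up front when gradient checkpointing is active during training
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
        )
        use_cache = False

presents = () if use_cache else None  # the cache is only created if use_cache survived the check above

The point is simply that the check now runs before the cache is initialized, rather than further down inside the per-layer loop, so use_cache is already disabled by the time any cache bookkeeping happens.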

Models to fix:

@pmollerus23 (Contributor)

@younesbelkada I am a little confused about where the list of generate-compatible models is located. I'd like to pick up this issue if I can find it!

younesbelkada (Contributor, Author) commented Feb 22, 2023

Hello @mollerup23,
Thanks for your interest! We will update the list with @gante once #21733 gets merged.

connor-henderson (Contributor) commented Feb 22, 2023

@younesbelkada Looks like it will be essentially the same fix across the other models too. Do you want me to pull that fix into a utility function once merged?
Just for illustration, something like -

use_cache = should_use_cache(logger, use_cache, self.gradient_checkpointing, self.training)
presents = () if use_cache else None

and likely in modeling_utils.py -

from logging import Logger

def should_use_cache(logger: Logger, use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    # Caching and gradient checkpointing are incompatible: warn and return False
    # when both are requested during training, otherwise honor use_cache.
    if use_cache:
        if gradient_checkpointing and training:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
        else:
            return True
    return False

I was looking into making the fix and realized there would be some repetition, so I thought I'd ask.

gante (Member) commented Feb 23, 2023

Hey @connor-henderson 👋 Thank you for the suggestion! Usually, I'd give the green light to configuration-related DRY approaches such as the one you suggested. However, this one would sit right in forward(), and we prioritize clear code (= avoid abstractions) in the modeling code itself.

In case you're curious about this position, we have a blog post about why we do it here 🤗

gante (Member) commented Feb 23, 2023

@mollerup23 the list and the instructions are updated, in case you're interested in contributing :D

yhl48 (Contributor) commented Feb 24, 2023

Would like to take GPT-2!

@krypticmouse (Contributor)

I want to work on GPT-J!

@Batese2001 (Contributor)

I would like to work on Blenderbot

KMFODA (Contributor) commented Feb 27, 2023

Happy to take on Git, GptNeoX, ImageGPT, LED, LongT5, M2M100, Marian, MBart, MegatronBert, MVP, OPT, Pegasus, PegasusX, RemBert, RoFormer

@younesbelkada (Contributor, Author)

Thanks a mile @KMFODA! 💯
Feel free to take those, and tag me or @gante whenever you feel ready!

@saswatmeher (Contributor)

Hi, I am a newbie to open source and would like to contribute. @younesbelkada can I contribute to this issue?

younesbelkada (Contributor, Author) commented Feb 28, 2023

Hey @saswatmeher,
Of course, yes!
You can pick up a model that has not been taken yet, for example BioGpt, and do the following:

1- Fork this repository
2- Clone your fork locally and create a new branch: git checkout -b fix-bio-gpt-issue
3- Modify the file src/transformers/models/biogpt/modeling_biogpt.py the same way other contributors have modified their files in #21818, #21833, #21815, etc. (you can check the Files changed tab at the top of each Pull Request page); see the sketch after this list for what the change typically looks like
4- Apply the changes and push them to your branch
5- Finally, open a Pull Request between fix-bio-gpt-issue and the main branch of transformers (and tag us, myself and @gante), and we should be good to go!
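Purely as a sketch (the variable name below is an assumption -- decoder models like BioGPT typically call their cache next_decoder_cache, but check the actual file), the moved block should end up directly above the line that creates the cache:

# run the incompatibility check before any cache bookkeeping happens
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
        )
        use_cache = False

next_decoder_cache = () if use_cache else None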

Let us know if you have more questions!

saswatmeher (Contributor) commented Mar 1, 2023

I am happy to pick up other models too. Can I work on Bart, Bert, BigBird?

nipunjindal (Contributor) commented Mar 8, 2023

Hello 👋, I would like to contribute and work on T5. Let me know, Thanks!
PR for the suggested changes.

@pmollerus23 (Contributor)

@younesbelkada Can I claim TimeSeriesTransformer?

@younesbelkada (Contributor, Author)

Hi @mollerup23,
Of course, yes! Please feel free to take it!

@younesbelkada (Contributor, Author)

Hey @krypticmouse!
Do you need any help making the fix on GPT-J?

@krypticmouse (Contributor)

Hi @younesbelkada, thanks for asking. My PR got merged a while ago.

@younesbelkada (Contributor, Author)

Thanks for the heads up, I've just updated the table. The only model left seems to be the TimeSeries Transformer, then. Thank you all for the great contributions!

@annahung31

Hey @younesbelkada, may I work on the TimeSeries Transformer?

gante (Member) commented Mar 21, 2023

@annahung31 I believe @mollerup23 is working on it :) @mollerup23, can you confirm?

younesbelkada (Contributor, Author) commented Mar 21, 2023

Yes @gante @annahung31, the PR is here: #22272
