[`Generate`] Fix `gradient_checkpointing` and `use_cache` bug for generate-compatible models #21737
Comments
@younesbelkada I am a little confused on where the list for generate-compatible models is located. I'd like to pick up this issue if I can find it!
@younesbelkada Looks like it will be essentially the same fix across the other models too. Do you want me to pull that fix into a utility function once merged?

```python
use_cache = should_use_cache(logger, use_cache, self.gradient_checkpointing, self.training)
presents = () if use_cache else None
```

and likely in modeling_utils.py:

```python
def should_use_cache(logger: Logger, use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    if use_cache:
        if gradient_checkpointing and training:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
        else:
            return True
    return False
```

Was looking into making the fix and realized there would be some repetition, so thought I'd ask.
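As a self-contained illustration, the helper suggested above could be exercised like this (the logger setup and example values here are illustrative, not from the transformers codebase):

```python
import logging
from logging import Logger

def should_use_cache(logger: Logger, use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    # Caching is only safe when gradient checkpointing is off, or when not training.
    if use_cache:
        if gradient_checkpointing and training:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
        else:
            return True
    return False

logger = logging.getLogger("demo")

# Inference: caching stays enabled even if checkpointing is configured.
print(should_use_cache(logger, True, True, False))  # True
# Training with gradient checkpointing: warns and disables the cache.
print(should_use_cache(logger, True, True, True))   # False
```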
Hey @connor-henderson 👋 Thank you for the suggestion! Usually, I'd give the green light to configuration-related DRY approaches such as the one you suggested. However, this one would sit right in

In case you're curious about this position, we have a blog post about why we do it here 🤗
@mollerup23 the list and the instructions are updated, in case you're interested in contributing :D
Would like to take GPT-2!
I want to work on GPT-J!
I would like to work on Blenderbot
Happy to take on Git, GptNeoX, ImageGPT, LED, LongT5, M2M100, Marian, MBart, MegatronBert, MVP, OPT, Pegasus, PegasusX, RemBert, RoFormer
Hi, I am a newbie to open source and would like to contribute. @younesbelkada can I contribute to this issue?
Hey @saswatmeher
1. Fork this repository
Let us know if you have more questions!
I am happy to pick up other models too. Can I work on Bart, Bert, BigBird?
Hello 👋, I would like to contribute and work on T5. Let me know, Thanks! |
@younesbelkada Can I claim TimeSeriesTransformer? |
hi @mollerup23 |
Hey @krypticmouse! |
Hi @younesbelkada, Thanks for asking. My PR got merged long ago. |
Thanks for the heads up, just updated the table. The only model left seems to be TimeSeries Transformer then. Thank you all for the great contribution!
Hey @younesbelkada, may I work on the TimeSeries Transformer? |
@annahung31 I believe @mollerup23 is working on it :) @mollerup23, can you confirm? |
yes @gante @annahung31 , the PR is here: #22272 |
Feature request

When using a model that uses `gradient_checkpointing`, calling `generate` with `use_cache` leads some models to bugs, such as the one described in #21733. The fix should be to slightly refactor some models, following the same procedure as in the aforementioned PR.
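For reference, the per-model fix follows a pattern along these lines inside each affected `forward` (a simplified sketch using a toy class, not the actual transformers source; the warning text mirrors the one used in the library):

```python
import logging

logger = logging.getLogger(__name__)

class DecoderSketch:
    """Toy stand-in for a transformer decoder's forward logic."""

    def __init__(self, gradient_checkpointing: bool, training: bool):
        self.gradient_checkpointing = gradient_checkpointing
        self.training = training

    def forward(self, use_cache: bool = True):
        # The fix: disable the cache *before* `presents` is initialized,
        # so gradient checkpointing and caching are never combined.
        if self.gradient_checkpointing and self.training and use_cache:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. "
                "Setting `use_cache=False`..."
            )
            use_cache = False
        presents = () if use_cache else None
        return presents

print(DecoderSketch(True, True).forward(use_cache=True))   # None
print(DecoderSketch(True, False).forward(use_cache=True))  # ()
```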
How to participate

- Move the `if` block to the line above the `... if use_cache else None` line, in the same `.forward()` function. Please note that some models may have more than one instance of this block!
- Run `make fixup` in your shell (also run `make fix-copies` if it requests you to do so).

That's it! With each change, you'll be making `transformers` a little bit better for all of us 💛

Models to fix:
- GPTNeo ([`GPTNeo`] Fix gradient checkpointing bug #21733)