
[Generate] Fix gradient_checkpointing and use_cache bug for generate-compatible models #21737

Closed
42 tasks done
younesbelkada opened this issue Feb 22, 2023 · 27 comments · Fixed by #21772, #21833, #21956 or #22272

younesbelkada (Contributor) commented Feb 22, 2023

Feature request

When a model uses gradient_checkpointing and a user calls generate with use_cache, some models hit bugs, such as the one described in #21733.

The fix is to slightly refactor the affected models, following the same procedure as in the aforementioned PR.

How to participate

  1. If it is your first time here, have a quick look at our contribution guidelines 🤗
  2. Pick a model from the list below and check the comments on this issue to make sure it hasn't been claimed yet.
  3. Claim your models in the comments (e.g. "I want to work on GPT2")
  4. Replicate the changes from this PR in your model of choice. In other words, move the if block to the line above the ... if use_cache else None, in the same .forward() function (see the sketch after this list). Please note that some models may have more than one instance of this block!
  5. Make sure you've run our automated code formatting tool (i.e. run make fixup in your shell -- also run make fix-copies if it requests you to do so)
  6. Open a PR. Tag @younesbelkada or @gante (one of us is enough)

That's it! With each change, you'll be making transformers a little bit better for all of us 💛
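For illustration only, here is a minimal sketch of the relocated block, following the fix in #21733 (the exact cache variable, e.g. presents, differs from model to model):

# disable the cache up front when gradient checkpointing is active during training
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
        )
        use_cache = False

presents = () if use_cache else None  # the cache is only created if use_cache survived the check above

The point is simply that the check now runs before the cache is initialized, rather than further down inside the per-layer loop, so use_cache is already disabled by the time any cache bookkeeping happens.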

Models to fix:

@pmollerus23 (Contributor)

@younesbelkada I am a little confused about where the list of generate-compatible models is located. I'd like to pick up this issue if I can find it!

younesbelkada (Contributor, Author) commented Feb 22, 2023

Hello @mollerup23,
Thanks for your interest! We will update the list with @gante once #21733 gets merged.

connor-henderson (Contributor) commented Feb 22, 2023

@younesbelkada Looks like it will be essentially the same fix across the other models too. Do you want me to pull that fix into a utility function once merged?
Just for illustration, something like -

use_cache = should_use_cache(logger, use_cache, self.gradient_checkpointing, self.training)
presents = () if use_cache else None

and likely in modeling_utils.py -

from logging import Logger

def should_use_cache(logger: Logger, use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    # Caching and gradient checkpointing are incompatible: warn and return False
    # when both are requested during training, otherwise honor use_cache.
    if use_cache:
        if gradient_checkpointing and training:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
        else:
            return True
    return False

I was looking into making the fix and realized there would be some repetition, so I thought I'd ask.

gante (Member) commented Feb 23, 2023

Hey @connor-henderson 👋 Thank you for the suggestion! Usually, I'd give the green light to configuration-related DRY approaches such as the one you suggested. However, this one would sit right in forward(), and we prioritize clear code (= avoid abstractions) in the modeling code itself.

In case you're curious about this position, we have a blog post about why we do it here 🤗

gante (Member) commented Feb 23, 2023

@mollerup23 the list and the instructions are updated, in case you're interested in contributing :D

yhl48 (Contributor) commented Feb 24, 2023

Would like to take GPT-2!

@krypticmouse (Contributor)

I want to work on GPT-J!

@Batese2001 (Contributor)

I would like to work on Blenderbot

KMFODA (Contributor) commented Feb 27, 2023

Happy to take on Git, GptNeoX, ImageGPT, LED, LongT5, M2M100, Marian, MBart, MegatronBert, MVP, OPT, Pegasus, PegasusX, RemBert, RoFormer

@younesbelkada (Contributor, Author)

Thanks a mile @KMFODA! 💯
Feel free to take those, and tag me or @gante whenever you feel ready!

@saswatmeher (Contributor)

Hi, I am a newbie to open source and would like to contribute. @younesbelkada can I contribute to this issue?

younesbelkada (Contributor, Author) commented Feb 28, 2023

Hey @saswatmeher,
Of course, yes!
You can pick up a model that has not been taken yet, for example BioGpt, and do the following:

1- Fork this repository
2- Clone your fork locally and create a new branch: git checkout -b fix-bio-gpt-issue
3- Modify the file src/transformers/models/biogpt/modeling_biogpt.py the same way other contributors have modified their files in #21818, #21833, #21815, etc. (you can check the Files changed tab at the top of each Pull Request page); see the sketch after this list for what the change typically looks like
4- Apply the changes and push them to your branch
5- Finally, open a Pull Request between fix-bio-gpt-issue and the main branch of transformers (and tag us, myself and @gante), and we should be good to go!
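Purely as a sketch (the variable name below is an assumption -- decoder models like BioGPT typically call their cache next_decoder_cache, but check the actual file), the moved block should end up directly above the line that creates the cache:

# run the incompatibility check before any cache bookkeeping happens
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
        )
        use_cache = False

next_decoder_cache = () if use_cache else None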

Let us know if you have more questions!

saswatmeher (Contributor) commented Mar 1, 2023

I am happy to pick up other models too. Can I work on Bart, Bert, BigBird?

nipunjindal (Contributor) commented Mar 8, 2023

Hello 👋, I would like to contribute and work on T5. Let me know, Thanks!
PR for the suggested changes.

@pmollerus23 (Contributor)

@younesbelkada Can I claim TimeSeriesTransformer?

@younesbelkada (Contributor, Author)

Hi @mollerup23,
Of course, yes! Please feel free to take it!

@younesbelkada (Contributor, Author)

Hey @krypticmouse!
Do you need any help making the fix on GPT-J?

@krypticmouse (Contributor)

Hi @younesbelkada, thanks for asking. My PR got merged a while ago.

@younesbelkada (Contributor, Author)

Thanks for the heads up, I've just updated the table. The only model left seems to be the TimeSeries Transformer, then. Thank you all for the great contributions!

@annahung31

Hey @younesbelkada, may I work on the TimeSeries Transformer?

gante (Member) commented Mar 21, 2023

@annahung31 I believe @mollerup23 is working on it :) @mollerup23, can you confirm?

younesbelkada (Contributor, Author) commented Mar 21, 2023

Yes @gante @annahung31, the PR is here: #22272
