
Add keep_in_fp32_modules support #20683

Merged

younesbelkada merged 20 commits into huggingface:main on Dec 13, 2022

Conversation

@younesbelkada (Contributor) commented Dec 8, 2022

What does this PR do?

This PR partially addresses #20287. Although half-precision and int8 conversion work extremely well for most models, for some architectures (e.g. T5) the casting leads to drastic performance degradation.

This can be fixed by manually force-casting some modules to float32. For FLAN-T5, @larsmennen and @navjotts found that keeping only these weights in fp32 makes it possible to run the largest models in fp16 or int8 with no performance degradation.

This PR introduces a new util in the from_pretrained method, keep_in_fp32_modules, that partially addresses this issue.

How does this util work? For T5:

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small", torch_dtype=torch.float16, keep_in_fp32_modules=["wo"])
print(model.decoder.block[0].layer[2].DenseReluDense.wo.weight.dtype)
>>> torch.float32

When using keep_in_fp32_modules, low_cpu_mem_usage needs to be force-set to True. This is because with low_cpu_mem_usage=False, PyTorch's _load_from_state_dict is called under the hood on each sub-module. That function uses PyTorch's copy_, which keeps the destination tensor in its native dtype regardless of the dtype of the source tensor:

import torch

param = torch.Tensor([0.1, 0.2, 0.3]).to(torch.float16)
to_copy_param = torch.Tensor([0.2, 0.1, 0.3]).to(torch.float32)

param.copy_(to_copy_param)  # the fp16 destination keeps its dtype; the fp32 source is cast on copy
print(param.dtype)
>>> torch.float16

Keeping this as a draft for now, as this util needs to be combined with manual patches such as #20287 (comment); otherwise users will encounter errors about incompatible dtypes between inputs and weights.
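The patches in question essentially cast the activations to the weight dtype right before the affected projection. A rough sketch of the idea (illustrative only, not the exact patch from #20287):

import torch
import torch.nn as nn

def project_with_dtype_guard(linear: nn.Linear, hidden_states: torch.Tensor) -> torch.Tensor:
    # If the layer was kept in fp32 while the activations are fp16,
    # upcast the activations so the matmul dtypes match.
    if hidden_states.dtype != linear.weight.dtype:
        hidden_states = hidden_states.to(linear.weight.dtype)
    return linear(hidden_states)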

cc @sgugger

@younesbelkada younesbelkada requested a review from sgugger December 8, 2022 15:41
@younesbelkada (Contributor, Author):

What about adding hooks on each converted module, that will take care of converting the input / output to the correct dtype ?
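A minimal sketch of what such hooks could look like (hypothetical helper, not part of this PR):

import torch

def add_io_casting_hooks(module, io_dtype=torch.float16):
    # Keep `module` in fp32 but cast its inputs/outputs so it composes
    # with the surrounding half-precision layers.
    def pre_hook(mod, inputs):
        # upcast incoming activations to match the fp32 weights
        return tuple(x.float() if torch.is_tensor(x) else x for x in inputs)

    def post_hook(mod, inputs, output):
        # downcast the result back to the half-precision dtype used elsewhere
        return output.to(io_dtype) if torch.is_tensor(output) else output

    module.register_forward_pre_hook(pre_hook)
    module.register_forward_hook(post_hook)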

@sgugger (Collaborator) left a comment:

Some initial comments:

Comment on lines 1995 to 2000
if keep_in_fp32_modules is not None and not low_cpu_mem_usage:
    # Force `low_cpu_mem_usage` to be set to `True` - check the PR:
    logger.warning(
        "The argument `keep_in_fp32_modules` is used, force-enabling `low_cpu_mem_usage` to load the model"
    )
    low_cpu_mem_usage = True
Collaborator:

Shouldn't be force-set here.

@younesbelkada (Contributor, Author), Dec 8, 2022:

proposed something in 115c0d0

@HuggingFaceDocBuilderDev commented Dec 8, 2022

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada requested a review from sgugger December 8, 2022 16:42
@younesbelkada (Contributor, Author):

As suggested in #20287, models loaded in bfloat16 should keep their weights in bfloat16 and not be cast to fp32. This is addressed in e3498da.
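In other words, the upcast should only apply to fp16 loads. A rough sketch of the intended condition (illustrative names, not the commit's exact code):

import torch

def should_upcast_to_fp32(param_name, torch_dtype, keep_in_fp32_modules):
    # Only upcast matching parameters when loading in fp16,
    # so bf16 loads keep their original dtype.
    if torch_dtype != torch.float16 or not keep_in_fp32_modules:
        return False
    return any(module_name in param_name for module_name in keep_in_fp32_modules)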

Comment on lines 2294 to 2297
logger.warning(
    " `_keep_in_fp32_modules` is not set to `None` and you don't have `accelerate` installed",
    " it is recommended to have `accelerate` installed in this case `pip install accelerate`.",
)
Collaborator:

The warning should only be triggered when torch_dtype == torch.float16.

Contributor Author:

Should be fixed in 8014c34

Comment on lines 2326 to 2328
keep_in_fp32_modules = model._keep_in_fp32_modules
if keep_in_fp32_modules is not None:
    low_cpu_mem_usage = True
Collaborator:

Shouldn't this use use_keep_in_fp32_modules here? Also should go before so the test at line 2307 can be simplified.

Contributor Author:

I simplified the tests in 966cc06, but I think we still need keep_in_fp32_modules = model._keep_in_fp32_modules, as it is used later on at line 2342.

Contributor Author:

I think you are right: 0f75387

Contributor Author:

Actually it seems that it's trickier than that: putting it on top results in some failing tests. This should be fixed in 243e6b5.

# upcast in fp32 if any
target_dtype = dtype
if keep_in_fp32_modules is not None and any(
    module_to_keep_in_fp32 in key for module_to_keep_in_fp32 in keep_in_fp32_modules
):
    target_dtype = torch.float32
Collaborator:

should also add a test of dtype being float16 here.

Contributor Author:

added in 8014c34

younesbelkada and others added 5 commits December 9, 2022 15:35
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@@ -2299,6 +2323,10 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
with ContextManagers(init_contexts):
    model = cls(config, *model_args, **model_kwargs)

if use_keep_in_fp32_modules:
    low_cpu_mem_usage = True
    keep_in_fp32_modules = model._keep_in_fp32_modules
Collaborator:

Let's set it to [] here if it's None, so that we don't have to check again later on.

Contributor Author:

This should be addressed in cb89c42

Comment on lines 2588 to 2591
elif keep_in_fp32_modules is not None and state_dict is not None:
    for key in state_dict:
        if any(module_to_keep_in_fp32 in key for module_to_keep_in_fp32 in keep_in_fp32_modules):
            state_dict[key] = state_dict[key].to(torch.float32)
Collaborator:

This is not useful: with PyTorch's load_state_dict, the weights are converted to the dtype of the parameters inside the model. So it's the model dtype that you should fix here.

Also, this removes the need for the Accelerate warning above, no?
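For illustration, a rough sketch of fixing the model dtype instead of the state dict (hypothetical helper, not the PR's actual code):

import torch

def keep_modules_in_fp32(model, keep_in_fp32_modules):
    # Upcast the parameters of matching submodules so that load_state_dict,
    # which casts incoming tensors to the parameter dtype, leaves them in fp32.
    for name, param in model.named_parameters():
        if any(module_name in name for module_name in keep_in_fp32_modules):
            param.data = param.data.to(torch.float32)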

Contributor Author:

Yes! Should be addressed in cb89c42

@larsmennen (Contributor):

Thanks @younesbelkada and @sgugger!! Tested this locally; I can confirm this works with patches 1 & 2 from #20287 (comment).

The only problem I encountered is that in:

model = cls(config, *model_args, **model_kwargs)

You get an error, as keep_in_fp32_modules is an unexpected keyword to the underlying model class (locally I just added it quickly to test). Do you want to add this in, so people can use it in their model class to determine where to apply patches like 1 & 2? Or alternatively, don't pass it on, and people can just query the dtype.

@younesbelkada (Contributor, Author) commented Dec 9, 2022

Thanks so much @larsmennen for confirming that the tests pass! We should be close to merging this 💪
I think that your failing test should be fixed with my latest commit (cb89c42), but I am not sure. Could you try again with the latest commit? 🙏

@@ -2070,6 +2070,10 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
# Load model
loading_info = None

# Keep in fp32 modules
keep_in_fp32_modules = None
Contributor Author:

@larsmennen the keyword has been added here if this is what you meant

- add `is_accelerate_available`
- fixes pipeline tests that failed
@larsmennen (Contributor):

Hmm, that doesn't fix it. I think you just need to pop the argument from model_kwargs, otherwise it gets passed to the underlying model (I'm assuming you don't want that, but correct me if I'm wrong).

I.e. after

commit_hash = kwargs.pop("_commit_hash", None)

if you add

keep_in_fp32_modules = kwargs.pop("keep_in_fp32_modules", None)

I tested with that modification on top of 7d47df2 and that works! Thanks for the quick action @younesbelkada! 🙏

@younesbelkada (Contributor, Author):

@larsmennen how are you loading your model? The description above is slightly misleading, as initially the plan was to add a kwarg when loading the model as follows:

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small", torch_dtype=torch.float16, keep_in_fp32_modules=["wo"])

but now this is not needed, you should just load your model like:

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small", device_map="auto", load_in_8bit=True)

@larsmennen (Contributor):

@younesbelkada ah, I see! I was passing the kwarg, yes, so that explains it.

@younesbelkada younesbelkada marked this pull request as ready for review December 12, 2022 09:41
@sgugger (Collaborator) left a comment:

Almost there, just one last comment and we should be good to merge! Thanks!

@@ -2276,11 +2290,14 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
)
dtype_orig = cls._set_default_torch_dtype(torch_dtype)

# Check if `_keep_in_fp32_modules` is not None
use_keep_in_fp32_modules = cls._keep_in_fp32_modules is not None and is_accelerate_available()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also only relevant if torch_dtype == torch.float16, so maybe add it here?

This is the place to issue a warning, I think, if:

  • cls._keep_in_fp32_modules is not None
  • torch_dtype == torch.float16
  • is_accelerate_available() is not true

to tell the user they should install Accelerate to have good predictions from the model (see the sketch after this list).
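A self-contained sketch of that suggested condition and warning (illustrative names, not the PR's exact code):

import logging

import torch

logger = logging.getLogger(__name__)

def resolve_keep_in_fp32(keep_in_fp32_modules, torch_dtype, accelerate_available):
    # Only activate the fp32-upcast path for fp16 loads, and warn when
    # accelerate is missing in that situation.
    relevant = keep_in_fp32_modules is not None and torch_dtype == torch.float16
    if relevant and not accelerate_available:
        logger.warning(
            "Some modules of this model should be kept in fp32 but `accelerate` is not installed. "
            "Install it with `pip install accelerate` to get correct predictions."
        )
    return relevant and accelerate_available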

Contributor Author:

Agreed! Proposed your suggestions in 1d21843

@younesbelkada (Contributor, Author):

@larsmennen this PR will be merged as soon as all the tests are green!
Would you mind opening a PR addressing your suggestions (patches 1 & 2 from the discussion at #20287)?

@younesbelkada (Contributor, Author):

All slow tests from T5 (and BLOOM just in case we didn't break anything else) pass 🟢
Merging once the CI tests are green

@younesbelkada younesbelkada merged commit 1af4bee into huggingface:main Dec 13, 2022
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
* add `keep_in_fp32_modules` support

* pass it as class attribute

* few modifs

- make tests `slow`
- fix logic

* better logic

* fix failing test

* `bfloat16` support

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix

* simplify tests

* simplify tests

* fix test

* modify message

* more checks

* fix failing tests

* add more conditions

- add `is_accelerate_available`
- fixes pipeline tests that failed

* add suggestions

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix failing `bnb` test

* add last safety checker

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@Oxi84 commented Apr 5, 2023

I tried this with the latest version of transformers (4.27) and CUDA 10.2, and I get this error:

model1a_CPU = T5ForConditionalGeneration.from_pretrained(model_path, low_cpu_mem_usage=True, torch_dtype=torch.float16, keep_in_fp32_modules=["wo"]).to("cuda")

TypeError: __init__() got an unexpected keyword argument 'keep_in_fp32_modules'

@sgugger (Collaborator) commented Apr 6, 2023

keep_in_fp32_modules is not an argument you can pass to from_pretrained; this is all handled internally.

@younesbelkada younesbelkada deleted the add-fp32-modules branch April 6, 2023 08:06
@younesbelkada (Contributor, Author):

You need to do something like:

from transformers import T5ForConditionalGeneration

T5ForConditionalGeneration._keep_in_fp32_modules = ["wo"]

# your code here
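For example, a complete usage sketch of this workaround (hypothetical, reusing the t5-small checkpoint from the earlier example):

import torch
from transformers import T5ForConditionalGeneration

# Force the class to keep the `wo` projections in fp32, then load in fp16.
T5ForConditionalGeneration._keep_in_fp32_modules = ["wo"]
model = T5ForConditionalGeneration.from_pretrained(
    "t5-small", torch_dtype=torch.float16, low_cpu_mem_usage=True
)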

@sgugger (Collaborator) commented Apr 6, 2023

Except this is already done for T5 ;-)
