Inconsistent settings for FSDP Precision #17506

Closed
awaelchli opened this issue Apr 27, 2023 · 1 comment · Fixed by #17670
Labels: bug (Something isn't working), strategy: fsdp (Fully Sharded Data Parallel)

Comments

awaelchli (Contributor) commented Apr 27, 2023

Bug description

The FSDPPrecision class controls the dtype of the model parameters, the autocast context, and the gradient scaler.
It has two possible inputs: precision="16-mixed" or precision="bf16-mixed":
https://github.com/Lightning-AI/lightning/blob/6464650a3c91a75d4a7c4009d08a68a84037a9a1/src/lightning/fabric/plugins/precision/fsdp.py#L32-L34

In all other precision plugins, "mixed" refers to mixed-precision training: the model weights stay in float32 while inputs and operations are autocast to the lower precision. The FSDPPrecision plugin, however, also sets the dtype of the model parameters to float16/bfloat16, so what actually runs is "16-true" precision.

https://github.com/Lightning-AI/lightning/blob/6464650a3c91a75d4a7c4009d08a68a84037a9a1/src/lightning/fabric/plugins/precision/fsdp.py#L50-L60
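
For reference, the linked property amounts to roughly the following (a paraphrase of the code at that commit, not a verbatim copy):

```python
import torch
from torch.distributed.fsdp import MixedPrecision

# Rough paraphrase of the current behavior for precision="16-mixed":
# every dtype in the FSDP MixedPrecision config is set to the lower precision.
mp_16_mixed = MixedPrecision(
    param_dtype=torch.float16,   # parameters are stored in half precision
    reduce_dtype=torch.float16,  # gradient reduction in half precision
    buffer_dtype=torch.float16,  # buffers in half precision
)
# Because param_dtype is already float16, no float32 master weights are kept,
# which is what "16-true" would normally describe.
```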

Proposal:

For the mixed-precision settings, set param_dtype=torch.float32. For the current behavior (parameters in the lower dtype), introduce the precision settings "16-true" and "bf16-true". A sketch of the proposed mapping is below.
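
A minimal sketch of that mapping, assuming torch.distributed.fsdp.MixedPrecision continues to be used (the dictionary and its name are illustrative only, not the final API):

```python
import torch
from torch.distributed.fsdp import MixedPrecision

# Illustrative mapping only; the final plugin API may differ.
PRECISION_TO_FSDP_CONFIG = {
    # Mixed precision: keep float32 parameters, cast compute/communication down.
    "16-mixed": MixedPrecision(
        param_dtype=torch.float32,
        reduce_dtype=torch.float16,
        buffer_dtype=torch.float16,
    ),
    "bf16-mixed": MixedPrecision(
        param_dtype=torch.float32,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
    # "True" half precision: the parameters themselves live in the lower dtype
    # (this is what "16-mixed" effectively does today).
    "16-true": MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float16,
        buffer_dtype=torch.float16,
    ),
    "bf16-true": MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
}
```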

cc @awaelchli @carmocca

awaelchli added the bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), and strategy: fsdp (Fully Sharded Data Parallel) labels on Apr 27, 2023
awaelchli removed the needs triage and ver: 1.9.x labels on Apr 27, 2023
leng-yue (Contributor) commented:
This bug causes training to fail when performing certain fp32 operations, such as bicubic interpolation. I will submit a pull request later tonight to address this issue.
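
For illustration only (not part of the original comment): depending on device and PyTorch version, an op like bicubic upsampling may have no half-precision kernel, and a common user-side workaround is to cast up to float32 around the call, e.g.:

```python
import torch
import torch.nn.functional as F

# With parameters (and hence activations) forced to float16, ops that only ship
# float32 kernels can raise dtype errors. Casting up and back down avoids this.
x = torch.randn(1, 3, 8, 8, dtype=torch.float16)
y = F.interpolate(x.float(), scale_factor=2, mode="bicubic", align_corners=False).to(x.dtype)
```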
