
Global/local import with replicated name in the Trainer leading to UnboundLocalError #18350

Closed

JingyaHuang opened this issue Jul 28, 2022 · 1 comment · Fixed by #18358
JingyaHuang (Contributor) commented Jul 28, 2022

System Info

  • transformers version: 4.21.0
  • Platform: Linux-5.4.0-121-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.11.0+cu113 (True)

Who can help?

@pacman100 @sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Running run_glue.py (the optimum version) with the distributed launcher:

python -m torch.distributed.run --nproc_per_node=2 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge --task_name MRPC --do_train --output_dir /tmp/deberta_res --fp16 --sharded_ddp simple --num_train_epochs 1

Error message:

Traceback (most recent call last):
  File "run_glue.py", line 610, in <module>
    main()
  File "run_glue.py", line 503, in main
    trainer = ORTTrainer(
  File "/workspace/optimum/onnxruntime/trainer.py", line 144, in __init__
    super().__init__(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 569, in __init__
    self.scaler = ShardedGradScaler()
UnboundLocalError: local variable 'ShardedGradScaler' referenced before assignment

Expected behavior

ShardedGradScaler is first imported as a global:

from fairscale.optim.grad_scaler import ShardedGradScaler

Then it is imported again for fsdp, under the same name, as a local variable inside Trainer.__init__:

from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

Because the name is assigned inside the function, Python treats ShardedGradScaler as local for the entire function body, so it never falls back to the global one; on code paths where the fsdp import does not run, the name is left unbound and an UnboundLocalError is raised.
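Below is a minimal standalone sketch of the scoping rule at play (toy names and strings, not the actual transformers code): a single assignment anywhere inside a function makes the name local for the whole function body at compile time.

ShardedGradScaler = "fairscale scaler"  # global binding

def init_scaler(use_fsdp: bool):
    if use_fsdp:
        # Any assignment here makes ShardedGradScaler local to the
        # whole function, shadowing the global even on other paths.
        ShardedGradScaler = "fsdp scaler"
    # With use_fsdp=False the local name was never bound, so this line
    # raises UnboundLocalError instead of falling back to the global.
    return ShardedGradScaler

init_scaler(use_fsdp=False)  # UnboundLocalError: local variable 'ShardedGradScaler' referenced before assignment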

P.S. I don't hit this problem when running run_glue.py in transformers itself; it seems to occur only when using classes that inherit from Trainer.

Possible solution: use a different name for one of the imports, or perform both imports locally.
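A hedged sketch of the first suggestion (the import paths are the ones quoted above; the function name and control flow are illustrative, not the actual Trainer code): aliasing the local import keeps the global name visible.

from fairscale.optim.grad_scaler import ShardedGradScaler  # global import

def build_scaler(fsdp_enabled: bool):
    if fsdp_enabled:
        # Different local name, so the global ShardedGradScaler is not shadowed.
        from torch.distributed.fsdp.sharded_grad_scaler import (
            ShardedGradScaler as FSDPShardedGradScaler,
        )
        return FSDPShardedGradScaler()
    # Falls back to the fairscale scaler imported at module level.
    return ShardedGradScaler()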

REF:
https://docs.python.org/3/faq/programming.html#why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value
https://stackoverflow.com/questions/58750517/why-unboundlocalerror-occurs-when-importing-inside-function

pacman100 self-assigned this Jul 29, 2022

pacman100 (Contributor) commented Jul 29, 2022

Hello @JingyaHuang, thank you for bringing this to notice with detailed steps and possible solutions 🤗. Can you try the draft PR linked above (#18358) and see if it fixes the issue?
