
Global/local import with replicated name in the Trainer leading to UnboundLocalError #18350

Closed

JingyaHuang opened this issue Jul 28, 2022 · 1 comment · Fixed by #18358
JingyaHuang (Contributor) commented Jul 28, 2022

System Info

  • transformers version: 4.21.0
  • Platform: Linux-5.4.0-121-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.11.0+cu113 (True)

Who can help?

@pacman100 @sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Running run_glue.py (the optimum version) with the distributed launcher:

python -m torch.distributed.run --nproc_per_node=2 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge --task_name MRPC --do_train --output_dir /tmp/deberta_res --fp16 --sharded_ddp simple --num_train_epochs 1

Error message:

Traceback (most recent call last):
  File "run_glue.py", line 610, in <module>
    main()
  File "run_glue.py", line 503, in main
    trainer = ORTTrainer(
  File "/workspace/optimum/onnxruntime/trainer.py", line 144, in __init__
    super().__init__(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 569, in __init__
    self.scaler = ShardedGradScaler()
UnboundLocalError: local variable 'ShardedGradScaler' referenced before assignment

Expected behavior

ShardedGradScaler is first imported as a global:

from fairscale.optim.grad_scaler import ShardedGradScaler

Then it is imported again for fsdp, under the same name, as a local variable inside Trainer.__init__:

from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

Because the name is assigned inside the function, Python treats ShardedGradScaler as local for the entire function body, so it never falls back to the global one; on code paths where the fsdp import does not run, the name is left unbound and an UnboundLocalError is raised.
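Below is a minimal standalone sketch of the scoping rule at play (toy names and strings, not the actual transformers code): a single assignment anywhere inside a function makes the name local for the whole function body at compile time.

ShardedGradScaler = "fairscale scaler"  # global binding

def init_scaler(use_fsdp: bool):
    if use_fsdp:
        # Any assignment here makes ShardedGradScaler local to the
        # whole function, shadowing the global even on other paths.
        ShardedGradScaler = "fsdp scaler"
    # With use_fsdp=False the local name was never bound, so this line
    # raises UnboundLocalError instead of falling back to the global.
    return ShardedGradScaler

init_scaler(use_fsdp=False)  # UnboundLocalError: local variable 'ShardedGradScaler' referenced before assignment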

P.S. I don't hit this problem when running run_glue.py in transformers itself; it seems to occur only when using classes that inherit from Trainer.

Possible solution: use a different name for one of the imports, or perform both imports locally.
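A hedged sketch of the first suggestion (the import paths are the ones quoted above; the function name and control flow are illustrative, not the actual Trainer code): aliasing the local import keeps the global name visible.

from fairscale.optim.grad_scaler import ShardedGradScaler  # global import

def build_scaler(fsdp_enabled: bool):
    if fsdp_enabled:
        # Different local name, so the global ShardedGradScaler is not shadowed.
        from torch.distributed.fsdp.sharded_grad_scaler import (
            ShardedGradScaler as FSDPShardedGradScaler,
        )
        return FSDPShardedGradScaler()
    # Falls back to the fairscale scaler imported at module level.
    return ShardedGradScaler()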

REF:
https://docs.python.org/3/faq/programming.html#why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value
https://stackoverflow.com/questions/58750517/why-unboundlocalerror-occurs-when-importing-inside-function

pacman100 self-assigned this Jul 29, 2022

pacman100 (Contributor) commented Jul 29, 2022

Hello @JingyaHuang, thank you for bringing this to notice with detailed steps and possible solutions 🤗. Can you try the draft PR linked above (#18358) and see if it fixes the issue?
