
Cache results of is_torch_tpu_available() #18777

Merged: 3 commits into huggingface:main on Sep 1, 2022

Conversation

comaniac (Contributor)

What does this PR do?

xm.xla_device() (called by is_torch_tpu_available()) hangs when it is called multiple times and no XLA devices are available, and this results in the Trainer hanging. Since torch_xla is currently used whenever it is installed in the active Python environment, I ran into this issue even when I only wanted to run the Trainer with PyTorch on GPU.

The detailed reason for this behavior in torch_xla is still under investigation (see pytorch/xla#3939).
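
For context, a hypothetical minimal reproduction of the scenario described above (not taken from this PR), assuming torch_xla is installed but no TPU/XLA device is attached to the machine:

```python
# Hypothetical reproduction sketch: torch_xla is installed,
# but the machine has no TPU/XLA device attached.
import torch_xla.core.xla_model as xm

for i in range(2):
    # With no XLA device available, repeated calls like this were observed
    # to hang instead of failing fast (see pytorch/xla#3939).
    device = xm.xla_device()
    print(i, device)
```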

To work around this issue, this PR adds lru_cache to is_torch_tpu_available() so that xm.xla_device() is guaranteed to be called only once when no XLA device is available.
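
A minimal sketch of the idea, assuming the availability check looks roughly like the one in src/transformers/utils/import_utils.py (this is not the exact diff; the actual patch simply wraps the existing function):

```python
import importlib.util
from functools import lru_cache


@lru_cache()
def is_torch_tpu_available():
    # If torch_xla is not installed, there is nothing to probe.
    if importlib.util.find_spec("torch_xla") is None:
        return False
    try:
        import torch_xla.core.xla_model as xm

        # This is the call that can hang when repeated with no XLA device
        # present; lru_cache guarantees it runs at most once per process.
        xm.xla_device()
        return True
    except RuntimeError:
        return False
```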

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@muellerzr @sgugger

@HuggingFaceDocBuilderDev commented Aug 26, 2022

The documentation is not available anymore as the PR was closed or merged.

@comaniac (Contributor, Author)

I'm not sure about the reason for the CI failure; it doesn't seem related to this PR.

@LysandreJik left a comment (Member)

This sounds fair to me, but I'll let @sgugger, mastermind of the Trainer, review when he's back from leave at the end of the week!

@comaniac (Contributor, Author)

@LysandreJik thanks for the review, and sure, we can wait for @sgugger.
Meanwhile, do I need to do anything to fix the CI failure?

@sgugger left a comment (Collaborator)

Thanks a lot for your PR! The test failures seem unrelated; I have tried re-running them. Could you rebase on main if the failures persist?

@sgugger (Collaborator) commented Sep 1, 2022

Thanks for bearing with us. The test failures are spurious and unrelated, so we can merge this.

@sgugger merged commit 1c381f3 into huggingface:main on Sep 1, 2022
@comaniac deleted the patch-1 branch on September 1, 2022 15:45
oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022
* Cache results of is_torch_tpu_available()

* Update src/transformers/utils/import_utils.py

* Update src/transformers/utils/import_utils.py