Use base version when comparing torch versions #16657

Merged
merged 11 commits into from
Mar 7, 2023
7 changes: 5 additions & 2 deletions src/lightning/fabric/utilities/imports.py
@@ -25,7 +25,10 @@
# 2. The inspection mode via `python -i`: https://stackoverflow.com/a/6879085/1162383
_IS_INTERACTIVE = hasattr(sys, "ps1") or bool(sys.flags.interactive)

-_TORCH_GREATER_EQUAL_1_12 = compare_version("torch", operator.ge, "1.12.0")
-_TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0")
+# We use "base_version" for non-nightly builds as well, because some environments like NVIDIA's PyTorch dockers
+# install PyTorch from source at a commit that doesn't align with the released version tag.
+# See: https://github.com/Lightning-AI/lightning/issues/16644
+_TORCH_GREATER_EQUAL_1_12 = compare_version("torch", operator.ge, "1.12.0", use_base_version=True)
+_TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0", use_base_version=True)
Contributor

@carmocca carmocca Feb 6, 2023

This change opens the door to a different bug:

If we use an API added in 1.13.0 (final release)
But the user has 1.13.0+a
Where 1.13.0+a is an earlier build that doesn't include this API
There will be an error
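
The trade-off described above can be sketched in a few lines. This is a simplified, standard-library-only illustration, not Lightning's `compare_version`: the `parse` and `compare` helpers are hypothetical, and the parser handles only a small subset of PEP 440 (release digits, an `a`/`b`/`rc` pre-release tag, and a `+local` segment). It uses the NGC build string `1.13.0a0+d0d6b1f` quoted later in this thread.

```python
import operator
import re

# Simplified subset of PEP 440: release digits, an optional pre-release
# tag (a/b/rc + number), and an optional local segment after "+".
_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)(?P<pre>(?:a|b|rc)\d+)?(?:\+(?P<local>.+))?$"
)


def parse(version):
    """Return (release tuple padded to 3 parts, is_prerelease)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    parts = [int(p) for p in match.group("release").split(".")]
    release = tuple((parts + [0, 0, 0])[:3])
    return release, match.group("pre") is not None


def compare(installed, op, required, use_base_version=False):
    """Hypothetical helper mirroring the idea behind `use_base_version`."""
    inst_release, inst_pre = parse(installed)
    req_release, _ = parse(required)
    if use_base_version:
        # Ignore the pre-release tag: "1.13.0a0+d0d6b1f" is treated as "1.13.0".
        return op(inst_release, req_release)
    # Otherwise a pre-release sorts just below the final release with the same digits.
    return op((inst_release, 0 if inst_pre else 1), (req_release, 1))


ngc_build = "1.13.0a0+d0d6b1f"
print(compare(ngc_build, operator.ge, "1.13.0"))                         # False
print(compare(ngc_build, operator.ge, "1.13.0", use_base_version=True))  # True
```

With the strict comparison the source build is rejected as older than 1.13.0; with the base-version comparison it is accepted, even though it may lack APIs that only landed in the final 1.13.0 release.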

Contributor

@carmocca carmocca Feb 6, 2023

So I suggest that we don't do this and just recommend upgrading torch instead, meaning we don't support old nightly or pre-release versions.

Contributor Author

The user in the linked issue is using a standard Docker image from NVIDIA: nvcr.io/nvidia/pytorch:22.10-py3
Does this mean we won't support any of these?

Contributor

I guess. I wonder why they use these PyTorch installations. One improvement we could make would be to warn the user about this.
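
A warning of the kind suggested here could look roughly like the sketch below. It is an illustration only: `warn_if_prerelease` is a hypothetical helper, not part of Lightning, and the regular expression only recognizes the common `a`/`b`/`rc`/`.dev` markers. In practice the version string would come from `torch.__version__`. Local segments such as `+cu117` are deliberately not flagged, on the assumption that released wheels can carry them too.

```python
import re
import warnings

# Matches a release followed by a pre-release or dev marker,
# e.g. "1.13.0a0+d0d6b1f" or "2.1.0.dev20230601".
_PRE_RELEASE_RE = re.compile(r"^\d+(?:\.\d+)*(?:a|b|rc|\.dev)\d+")


def warn_if_prerelease(package, version):
    """Warn if `version` looks like a pre-release or source build.

    Returns True when a warning was issued, False otherwise.
    """
    if _PRE_RELEASE_RE.match(version) is None:
        return False
    warnings.warn(
        f"{package} {version} looks like a pre-release or source build. "
        "Version checks may assume features that this build does not include.",
        category=UserWarning,
        stacklevel=2,
    )
    return True


warn_if_prerelease("torch", "1.13.0a0+d0d6b1f")  # emits a UserWarning
warn_if_prerelease("torch", "1.13.0+cu117")      # silent
```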


I am using these images for the sake of repeatability in my experiments and because all the packages work out of the box (no need to manage conda/pip requirements, just run `docker run nvcr.io/nvidia/pytorch:22.10-py3 python my_script.py`).

Member

As this is related to official NVIDIA/PyTorch images, I would roll with this change using `use_base_version`.

Contributor Author

I added a comment above the changed lines with an explanation for the issue with a reference to this PR

Contributor

@carmocca carmocca Mar 9, 2023

@awaelchli This introduced a failing workflow in master (build-NGC) exactly because of this issue. The NGC 1.13 image installs a 1.13 pre-release build (1.13.0a0+d0d6b1f) that doesn't include a feature present in the true 1.13 release:

https://github.com/Lightning-AI/lightning/actions/runs/4358626506/jobs/7619447910#step:3:1685

This will fail for anybody installing this specific image. I don't have any better suggestion than reverting this PR or skipping the workflow.

Contributor Author

@awaelchli awaelchli Mar 10, 2023

Contributor

Sure. This method will go away in future releases anyways

_TORCH_GREATER_EQUAL_2_0 = compare_version("torch", operator.ge, "2.0.0", use_base_version=True)
_TORCH_GREATER_EQUAL_2_1 = compare_version("torch", operator.ge, "2.1.0", use_base_version=True)
3 changes: 3 additions & 0 deletions src/lightning/pytorch/CHANGELOG.md
@@ -406,6 +406,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed an issue where `DistributedSampler.set_epoch` wasn't getting called during `trainer.predict` ([#16785](https://github.com/Lightning-AI/lightning/pull/16785), [#16826](https://github.com/Lightning-AI/lightning/pull/16826))


+- Fixed an issue with comparing torch versions when using a version of torch built from source ([#16657](https://github.com/Lightning-AI/lightning/pull/16657))


## [1.9.4] - 2023-03-01

### Added
3 changes: 1 addition & 2 deletions src/lightning/pytorch/utilities/imports.py
@@ -21,8 +21,7 @@
_PYTHON_GREATER_EQUAL_3_8_0 = (sys.version_info.major, sys.version_info.minor) >= (3, 8)
_PYTHON_GREATER_EQUAL_3_10_0 = (sys.version_info.major, sys.version_info.minor) >= (3, 10)
_PYTHON_GREATER_EQUAL_3_11_0 = (sys.version_info.major, sys.version_info.minor) >= (3, 11)
-# duplicated from fabric because HPU is patching it below
-_TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0")
+_TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0", use_base_version=True)
_TORCHMETRICS_GREATER_EQUAL_0_9_1 = RequirementCache("torchmetrics>=0.9.1")
_TORCHMETRICS_GREATER_EQUAL_0_11 = RequirementCache("torchmetrics>=0.11.0") # using new API with task
