-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use base version when comparing torch versions #16657
Conversation
_TORCH_GREATER_EQUAL_1_12 = compare_version("torch", operator.ge, "1.12.0") | ||
_TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0") | ||
_TORCH_GREATER_EQUAL_1_12 = compare_version("torch", operator.ge, "1.12.0", use_base_version=True) | ||
_TORCH_GREATER_EQUAL_1_13 = compare_version("torch", operator.ge, "1.13.0", use_base_version=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change allows a different bug:
If we use an api added in 1.13.0
(final release)
But the user has 1.13.0+a
Where 1.13.0+a
is an earlier version that doesnt include this api
There will be an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I suggest that we don't do this and we just recommend upgrading torch instead. Meaning we don't support old nightly or pre-release versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user in the linked issue is using a standard docker image from nvidia: nvcr.io/nvidia/pytorch:22.10-py3
This means we won't support any of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess. I wonder why they use these PyTorch installations. One improvement we could do would be to warn the user about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using these images for the sake of repeatability in my experiments and because all the packages are
working out of the box (no need to manage conda/pip requirements, just run docker run nvcr.io/nvidia/pytorch:22.10-py3 python my_script.py
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is related to officila Nvidia/PyTorch images, I would roll this change with use_base_version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment above the changed lines with an explanation for the issue with a reference to this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@awaelchli This introduced a failing workflow in master (build-NGC) exactly because of this issue. The NGC 1.13 image installs a 1.13 release (1.13.0a0+d0d6b1f
) that doesn't include a feature included in the true 1.13 release:
This will fail for anybody installing this specific image. I don't have any better suggestion than reverting this PR or skipping the workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carmocca Can I implement a fix that changes this condition here:
https://github.com/Lightning-AI/lightning/blob/accd2b9e61063ba3c683764043030545ed87c71f/src/lightning/pytorch/core/module.py#L1635-L1641
to use the base version check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. This method will go away in future releases anyways
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #16657 +/- ##
=========================================
- Coverage 82% 59% -22%
=========================================
Files 439 414 -25
Lines 31763 31463 -300
=========================================
- Hits 25968 18670 -7298
- Misses 5795 12793 +6998 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> (cherry picked from commit 63b9034)
What does this PR do?
Fixes #16644
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
I made sure I had fun coding 🙃
cc @Borda @tchaton @carmocca @justusschock @awaelchli