
PyTorch and Torchvision version update #1216

Open · wants to merge 25 commits into base: develop
Conversation

tanwarsh (Collaborator) commented Dec 16, 2024

Update PyTorch to version 2.4.1 and Torchvision to version 0.19.1.
setuptools is pinned to 59.6.0 because with the latest version we are facing pytorch/pytorch#143288.

Fix for #1075.

Signed-off-by: yes <shailesh.tanwar@intel.com>
tanwarsh and others added 14 commits December 16, 2024 05:03
tanwarsh (Collaborator, Author) commented Dec 18, 2024

Pipelines are facing Issue pytorch/pytorch#140914.

@tanwarsh tanwarsh changed the title WIP: PyTorch and Torchvision version update PyTorch and Torchvision version update Dec 18, 2024
kta-intel (Collaborator)

> Pipelines are facing Issue pytorch/pytorch#140914.

Thanks for raising the issue. Can you look into the possibility of adding the monkey patch to openfl.interface.cli?
I didn't do extensive testing of the e2e functionality, but it got past the TypeError.

pytorch/pytorch#140914 (comment)

If it works, we could implement it while we wait for a more robust upstream solution, or investigate where and how pip._vendor.typing_extensions is being forcibly imported, for a more robust fix in our own code base.

tanwarsh (Collaborator, Author)

> Pipelines are facing Issue pytorch/pytorch#140914.
>
> Thanks for raising the issue. Can you look into the possibility of adding the monkey patch to openfl.interface.cli? I didn't do any extensive testing on the e2e functionality, but it got past the TypeError
>
> pytorch/pytorch#140914 (comment)
>
> If it works, we can maybe implement it while we wait for a more robust upstream solution, or we can investigate where/how pip._vendor.typing_extensions is being forcibly called for a more robust solution in our code base

Thank you @kta-intel, the workaround patch worked.
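For reference, the workaround discussed in the upstream thread is roughly of the following shape. This is a hedged sketch, not the exact patch that landed in this PR; placing it in openfl.interface.cli before torch is imported is an assumption based on the discussion above.

```python
import sys

# Sketch of the sys.modules monkey patch (assumed form of the upstream
# workaround): alias pip's vendored typing_extensions to the real
# typing_extensions module, so that ParamSpec resolves to a single class
# regardless of which import path torch ends up using.
# This must run before `import torch`.
import typing_extensions

sys.modules["pip._vendor.typing_extensions"] = typing_extensions

# After this, any `import pip._vendor.typing_extensions` resolves to the
# same module object as plain `import typing_extensions`.
```

As noted later in the thread, forcing module identity via sys.modules is a stopgap rather than a long-term fix.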

MasterSkepticista (Collaborator)

Why are we insisting on pinning torch and torchvision, and to the latest versions at that? Is there anything we lose by using stable, yet recent, torch versions?

I am not in favor of instrumenting and applying hotfixes that become a future maintenance burden, especially if torch~=2.3.0 works without any bugs mentioned in this PR.

WDYT? @psfoley @kta-intel @teoparvanov @theakshaypant

tanwarsh (Collaborator, Author)

> Why are we insisting on pinning torch and torchvision, that too to latest versions? Is there anything we lose by using stable, yet recent torch versions?
>
> I am not in favor of instrumenting and applying hotfixes that become a future maintenance burden, especially if torch~=2.3.0 works without any bugs mentioned in this PR.
>
> WDYT? @psfoley @kta-intel @teoparvanov @theakshaypant

With torch~=2.3.0 we are facing an issue on Windows; please refer to #1075.

MasterSkepticista (Collaborator)

> Why are we insisting on pinning torch and torchvision, that too to latest versions? Is there anything we lose by using stable, yet recent torch versions?
> I am not in favor of instrumenting and applying hotfixes that become a future maintenance burden, especially if torch~=2.3.0 works without any bugs mentioned in this PR.
> WDYT? @psfoley @kta-intel @teoparvanov @theakshaypant
>
> with torch~=2.3.0 we are facing issue in windows. Please refer this Issue #1075

Did we try 2.4.x?
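For context on the pinning question: `~=2.3.0` is a compatible-release specifier, so it admits 2.3.x patch releases but excludes 2.4.x and later; moving to 2.4.1 therefore requires changing the requirement. A quick check using the `packaging` library (assumed available, since pip itself depends on it):

```python
from packaging.specifiers import SpecifierSet

# "~=2.3.0" is shorthand for ">=2.3.0, ==2.3.*" (PEP 440 compatible release).
spec = SpecifierSet("~=2.3.0")

# 2.3.x patch releases satisfy the specifier...
print(spec.contains("2.3.1"))  # True
# ...but 2.4.x and 2.5.x do not, so they would never be picked up.
print(spec.contains("2.4.1"))  # False
print(spec.contains("2.5.1"))  # False
```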

kta-intel (Collaborator)

> Why are we insisting on pinning torch and torchvision, that too to latest versions? Is there anything we lose by using stable, yet recent torch versions?
> I am not in favor of instrumenting and applying hotfixes that become a future maintenance burden, especially if torch~=2.3.0 works without any bugs mentioned in this PR.
> WDYT? @psfoley @kta-intel @teoparvanov @theakshaypant
>
> with torch~=2.3.0 we are facing issue in windows. Please refer this Issue #1075
>
> Did we try 2.4.x?

I believe 2.4.x was initially triggering the AssertionError: /python3.11/distutils/core.py, but it seems that @tanwarsh managed to get past that error. Can you apply your resolution for the distutils issue and then check whether the TypeError is also present out of the box? If not, I think we can safely stick to 2.4.x for now.

Though I think it is a practical solution, @MasterSkepticista brings up a good point that using sys.modules to force consistent modules may not be a good long-term solution. The issue seems directly related to torch (under torch==2.3.1, pip._vendor.typing_extensions does not get imported when checking ParamSpec, but it does under torch==2.5.1), so we may want to revive that upstream thread and help with the investigation, since there doesn't seem to be much activity in it.

tanwarsh (Collaborator, Author) commented Jan 8, 2025

> Why are we insisting on pinning torch and torchvision, that too to latest versions? Is there anything we lose by using stable, yet recent torch versions?
> I am not in favor of instrumenting and applying hotfixes that become a future maintenance burden, especially if torch~=2.3.0 works without any bugs mentioned in this PR.
> WDYT? @psfoley @kta-intel @teoparvanov @theakshaypant
>
> with torch~=2.3.0 we are facing issue in windows. Please refer this Issue #1075
>
> Did we try 2.4.x?
>
> I believe 2.4.x was initially triggering the AssertionError: /python3.11/distutils/core.py, but it seems that @tanwarsh managed to get past that error. Can you apply your resolution for the distutils issue and then check if this type error is also present out of the box? If not, I think we can safely stick to 2.4.x for now.
>
> Though I think it is a practical solution, @MasterSkepticista brings up a good point that using sys.modules to force consistent modules may not be a good long-term solution. It seems like the issue is directly related to torch (under torch==2.3.1, pip._vendor.typing_extensions does not get imported when checking ParamSpec, but it does in torch==2.5.1), so we may want to revive that thread and help the investigation since there doesn't seem to be a ton of action in it

Thank you @kta-intel and @MasterSkepticista for your inputs. I will downgrade the version to 2.4.x and check both issues. If neither the AssertionError nor the TypeError is encountered, I will proceed with 2.4.x.

tanwarsh (Collaborator, Author) commented Jan 8, 2025

@MasterSkepticista @kta-intel, we are now using version 2.4.1 without the patch. Please review again.

@@ -1,7 +1,7 @@
 numpy>=1.22.2 # not directly required, pinned by Snyk to avoid a vulnerability
 rsa>=4.7 # not directly required, pinned by Snyk to avoid a vulnerability
-setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
+setuptools==59.6.0
Collaborator

Seems like pinning to 59.6.0 will trigger Snyk vulnerability concerns. Will this cause us issues down the line?

kta-intel (Collaborator) commented Jan 8, 2025

> @MasterSkepticista @kta-intel, We are using version 2.4.1 now without patch. Please review again.

Thanks for checking into that, @tanwarsh. Glad 2.4.1 is not raising the TypeError. My open question now is whether downgrading setuptools is the path we want to take. Even if we can justify it for a release, being stuck at 59.6.0 going forward with torch workloads could be unsustainable long term.
