Work around aarch64 conda-installed numpy 2.x version. #1984
Background:
The PyTorch Nightly Binary Validation workflow and the PyTorch 2.5.0 RC1 Binary Validation workflow both failed on aarch64. The failures seem to correlate with the CUDA bump from 12.4.0 to 12.4.1 (see this ).
Example failed GitHub Actions job: https://github.com/pytorch/builder/actions/runs/10794919545/job/29940441536, and for 2.5.0 RC1: https://github.com/pytorch/builder/actions/runs/10794919545/job/29944860153
Reproduced locally with the critical step below:
```
/opt/conda/bin/conda create -y -n conda-env-10794919545 python=3.10 numpy ffmpeg
```
Running `pip3 install torch --index-url https://download.pytorch.org/whl/test/cu124` in that environment then easily reproduces the following error (also shown in the GitHub Actions failure links above):
```
2024-09-10T16:08:19.4727026Z ++ python3 ./test/smoke_test/smoke_test.py --package torchonly
2024-09-10T16:08:19.4727531Z Traceback (most recent call last):
2024-09-10T16:08:19.4728089Z File "/pytorch/builder/./test/smoke_test/smoke_test.py", line 9, in <module>
2024-09-10T16:08:19.4728654Z import torch._dynamo
2024-09-10T16:08:19.4729527Z File "/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/__init__.py", line 3, in <module>
2024-09-10T16:08:19.4730459Z from . import convert_frame, eval_frame, resume_execution
2024-09-10T16:08:19.4731531Z File "/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 53, in <module>
2024-09-10T16:08:19.4732512Z from . import config, exc, trace_rules
2024-09-10T16:08:19.4733556Z File "/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 45, in <module>
2024-09-10T16:08:19.4734616Z from .utils import getfile, hashable, NP_SUPPORTED_MODULES, unwrap_if_wrapper
2024-09-10T16:08:19.4736024Z ImportError: cannot import name 'NP_SUPPORTED_MODULES' from 'torch._dynamo.utils' (/opt/conda/envs/conda-env-10794919545/lib/python3.10/site-packages/torch/_dynamo/utils.py)
```
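A short Python script like the one below can be run inside a given environment to check whether it is affected. This is a sketch and not part of this PR. It reports the machine architecture and the numpy and torch versions. It then retries the exact import that fails in the traceback above (`from torch._dynamo.utils import NP_SUPPORTED_MODULES`). If torch is not installed, the import check is skipped and reported as `None`. The function name `diagnose` is hypothetical.

```python
import platform


def diagnose() -> dict:
    """Report arch, numpy/torch versions, and whether the failing dynamo import works.

    Hypothetical triage helper; it only re-runs the import seen in the traceback.
    """
    report = {
        "machine": platform.machine(),
        "numpy": None,
        "torch": None,
        "dynamo_import_ok": None,  # None means "not checked" (torch missing)
    }
    try:
        import numpy
        report["numpy"] = numpy.__version__
    except ImportError as exc:
        report["numpy_error"] = repr(exc)

    try:
        import torch
        report["torch"] = str(torch.__version__)
    except ImportError:
        return report  # torch not installed: nothing more to check

    try:
        # The same name that fails to import in the smoke-test traceback.
        from torch._dynamo.utils import NP_SUPPORTED_MODULES  # noqa: F401
        report["dynamo_import_ok"] = True
    except ImportError as exc:
        report["dynamo_import_ok"] = False
        report["dynamo_error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

On an affected aarch64 environment, one would expect `dynamo_import_ok: False` together with a numpy 2.x version.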
Two possible workarounds identified:
I don't yet know why the Anaconda numpy package appears to be incompatible with our generated PyTorch wheel on ARM64. As a follow-up, we could check whether the CUDA 12.4.0 ARM nightly wheel is compatible with this numpy version.
Update: the CUDA 12.4.0 aarch64 wheel appears to work fine with conda's numpy 2.1.1, so the CUDA bump likely introduced the incompatibility with conda's numpy.
Since we cannot prevent users from installing numpy 2.x from conda, the proper fix should ideally be made in the PyTorch aarch64 CUDA wheel itself.
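Until a wheel-side fix exists, a CI environment could avoid conda's numpy 2.x on aarch64 by pinning the conda spec. The sketch below is hypothetical. The helper name `conda_create_cmd` and the `numpy<2` pin are assumptions, not necessarily either of the two workarounds mentioned above. It only builds the `conda create` argument list from the reproduction step and adds the pin when the machine is `aarch64`.

```python
import platform
import shlex
from typing import List, Optional


def conda_create_cmd(env_name: str, python: str = "3.10",
                     machine: Optional[str] = None) -> List[str]:
    """Build the repro's `conda create` argv, pinning numpy<2 on aarch64.

    Hypothetical helper: the pin is an assumed workaround, not a confirmed fix.
    """
    machine = machine or platform.machine()
    # Assumption: only aarch64 needs the pin; other architectures keep plain "numpy".
    numpy_spec = "numpy<2" if machine == "aarch64" else "numpy"
    return ["/opt/conda/bin/conda", "create", "-y", "-n", env_name,
            f"python={python}", numpy_spec, "ffmpeg"]


if __name__ == "__main__":
    # shlex.join quotes "numpy<2" so the printed command is safe to paste into a shell.
    print(shlex.join(conda_create_cmd("conda-env-10794919545", machine="aarch64")))
```

The helper returns an argument list rather than a single string, so it can be passed directly to `subprocess.run` without shell quoting issues around `<`.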
cc @atalman @malfet @ptrblck @tinglvv