Remove NumPy <2 pin #6031
Conversation
Updating branch to pull in the latest upstream changes and restarting CI now that cuDF is done: rapidsai/cudf#16300
One GHA job failed with an unrelated CUDA initialization error. Unfortunately this seems to be showing up more in CI; will raise this offline for discussion:

```
E   UserWarning: Error getting driver and runtime versions:
E
E   stdout:
E
E
E
E   stderr:
E
E   Traceback (most recent call last):
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 254, in ensure_initialized
E       self.cuInit(0)
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 327, in safe_cuda_api_call
E       self._check_ctypes_error(fname, retcode)
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 395, in _check_ctypes_error
E       raise CudaAPIError(retcode, msg)
E   numba.cuda.cudadrv.driver.CudaAPIError: [999] Call to cuInit results in CUDA_ERROR_UNKNOWN
E
E   During handling of the above exception, another exception occurred:
E
E   Traceback (most recent call last):
E     File "<string>", line 4, in <module>
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 292, in __getattr__
E       self.ensure_initialized()
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
E       raise CudaSupportError(f"Error at driver init: {description}")
E   numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_UNKNOWN (999)
```

Not patching Numba. For now will just restart the failed jobs after the others complete.

Edit: Another job had the same sort of error.

Edit 2: And again in this job after restarting.
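As a side note, the failing call can be probed without numba at all. Here is a minimal sketch (my own illustration, assuming Linux driver-library names; not code from this PR) that issues the same `cuInit(0)` call via ctypes:

```python
import ctypes


def cuda_driver_ok() -> bool:
    """Attempt the same cuInit(0) call numba makes at driver init.

    Returns True only if a CUDA driver library loads and cuInit succeeds
    (CUDA_SUCCESS == 0). A broken node like the one in the CI logs above
    would surface CUDA_ERROR_UNKNOWN (999) as a nonzero return code here.
    """
    for name in ("libcuda.so", "libcuda.so.1"):  # assumed Linux names
        try:
            libcuda = ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        return False  # no driver library found at all
    return libcuda.cuInit(0) == 0


print(cuda_driver_ok())
```

On a CPU-only machine this simply reports `False` (no `libcuda` to load), which is the same failure mode the warning is papering over.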
```diff
@@ -509,7 +509,7 @@ dependencies:
   - *scikit_learn
   - statsmodels
   - umap-learn==0.5.3
-  - pynndescent==0.5.8
+  - pynndescent
```
@dantegd, is it alright if we relax this pin?
Given passing tests, it should be fine to relax now
/merge
Am seeing the
Still seeing this issue. Going to test CI separately from this change in PR: #6047
So that PR's CI builds fail because
Was searching around in the code for clues. Just came across this, which was unexpected (`cuml/python/cuml/cuml/neighbors/CMakeLists.txt`, lines 38 to 40 at 973a65f):

```cmake
foreach(target IN LISTS targets_using_numpy)
  target_include_directories(${target} PRIVATE "${Python_NumPy_INCLUDE_DIRS}")
endforeach()
```

Does cuML need NumPy at build time? If so, would have expected to see it listed among the build requirements.
Trying dropping this. AFAICT the Cython modules above don't `cimport numpy`, so they wouldn't need this.

Not sure whether it would cause the tests to hang. At a minimum, it is unused, so it is worth cleaning up.
Yeah not sure what happened there. I traced back in the blame to where this was added. Looks like @vyasr recommended removing it back at the source: #4818 (comment)
But there wasn't any additional discussion on that PR (maybe it happened somewhere else), and the change was merged in.
I agree with you that it seems to be unused.
Originally was looking for clues on fixing the hanging test (#6031 (comment)). Tried this just in case it helped, but it didn't matter.
Read through the history here yesterday. It seems like NumPy was a build dependency a while back (though even then it wasn't clear whether it was being used). Think since then every update has assumed NumPy was a build dependency. However, as we don't require it during the build, it isn't actually satisfied.

Further, we would have needed `find_package(Python REQUIRED COMPONENTS Development NumPy)` to find NumPy and set `Python_NumPy_INCLUDE_DIRS`, but we don't do that either.

Think this hasn't presented much of an issue as we don't actually set `targets_using_numpy`.

In any event, this time seems as good as any for cleaning this up.
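For reference, if some Cython module ever did need NumPy headers at build time, a minimal sketch of the wiring described above might look like the following (hypothetical target name; this PR removes the block rather than completing it):

```cmake
# Hypothetical sketch: only needed if a module actually includes NumPy headers.
# The NumPy component is what populates Python_NumPy_INCLUDE_DIRS.
find_package(Python REQUIRED COMPONENTS Development NumPy)

set(targets_using_numpy my_numpy_using_module)  # hypothetical target name

foreach(target IN LISTS targets_using_numpy)
  target_include_directories(${target} PRIVATE "${Python_NumPy_INCLUDE_DIRS}")
endforeach()
```

Without the `find_package` call, `Python_NumPy_INCLUDE_DIRS` is simply empty, which is consistent with the loop above never having had an effect.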
```diff
@@ -229,6 +229,7 @@ dependencies:
   - dask-cuda==24.10.*,>=0.0.0a0
   - joblib>=0.11
   - numba>=0.57
+  - numpy>=1.23,<3.0a0
```
It seems we have a NumPy dependency:

```python
import numpy as np
```

However it isn't getting declared as one, so explicitly added NumPy as a dependency.
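As an aside, one cheap way to catch this kind of undeclared import is to scan sources for top-level imports and diff them against declared dependencies. A stdlib-only sketch (my own hypothetical helper, not part of cuML's tooling):

```python
import ast


def imported_modules(source: str) -> set:
    """Return the top-level module names imported by a Python source string."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy as np" -> "numpy"; "import os.path" -> "os"
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from cuml.common import x" -> "cuml"
            mods.add(node.module.split(".")[0])
    return mods


# e.g. a module that uses NumPy without declaring it:
print(sorted(imported_modules("import numpy as np\nfrom cuml.common import x")))
# prints ['cuml', 'numpy']
```

Comparing that set against the declared requirements for each package would have flagged `numpy` here.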
Divye documented the CI hang occurring with the hanging test. Also he added a skip for that test in PR: #6051.

Unfortunately other CI jobs still fail due to NumPy 2 being unconstrained and an incompatible pin. Fortunately the latter fix is already here.

In the hopes of getting CI to pass, have merged Divye's PR into this one. That way all the fixes and skips for CI are in one place.
Left one suggestion to fix failing wheel tests; otherwise the packaging changes here look good to me. I searched around a bit and looked through CI logs, and don't see any other issues.
Looks like this CI job had one test failure:

```
______________________ test_weighted_kmeans[10-10-25-100] ______________________
[gw0] linux -- Python 3.11.9 /pyenv/versions/3.11.9/bin/python

nrows = 100, ncols = 25, nclusters = 10, max_weight = 10, random_state = 428096

    @pytest.mark.parametrize("nrows", [100, 500])
    @pytest.mark.parametrize("ncols", [25])
    @pytest.mark.parametrize("nclusters", [5, 10])
    @pytest.mark.parametrize("max_weight", [10])
    def test_weighted_kmeans(nrows, ncols, nclusters, max_weight, random_state):
        # Using fairly high variance between points in clusters
        cluster_std = 1.0
        np.random.seed(random_state)
        # set weight per sample to be from 1 to max_weight
        wt = np.random.randint(1, high=max_weight, size=nrows)
        X, y = make_blobs(
            nrows,
            ncols,
            nclusters,
            cluster_std=cluster_std,
            shuffle=False,
            random_state=0,
        )
        cuml_kmeans = cuml.KMeans(
            init="k-means++",
            n_clusters=nclusters,
            n_init=10,
            random_state=random_state,
            output_type="numpy",
        )
        cuml_kmeans.fit(X, sample_weight=wt)
        cu_score = cuml_kmeans.score(X)
        sk_kmeans = cluster.KMeans(random_state=random_state, n_clusters=nclusters)
        sk_kmeans.fit(cp.asnumpy(X), sample_weight=wt)
        sk_score = sk_kmeans.score(cp.asnumpy(X))
>       assert abs(cu_score - sk_score) <= cluster_std * 1.5
E       assert 6151.191162109375 <= (1.0 * 1.5)
E        +  where 6151.191162109375 = abs((-2365.749267578125 - -8516.9404296875))

test_kmeans.py:174: AssertionError
---------------------------- Captured stdout setup -----------------------------
[D] [20:29:31.325625] /__w/cuml/cuml/python/cuml/build/cp311-cp311-linux_aarch64/cuml/internals/logger.cxx:5269 Random seed: 428096
```

Not entirely sure why that happened (or why this only happens now). Given we don't see this test failure anywhere else, am going to assume this was a flaky test and try restarting. Though documenting here in case it comes up again (in the event it needs follow-up).
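For context on the magnitudes in that assertion: `score` is (as I understand it) the negative inertia, i.e. the negated sum of squared distances from each sample to its nearest center, so a gap of ~6151 means the two fits converged to very different clusterings rather than a small numerical drift. A NumPy-only sketch with made-up data (my illustration, not the test's code):

```python
import numpy as np


def neg_inertia(X: np.ndarray, centers: np.ndarray) -> float:
    """Negated sum of squared distances to each point's nearest center,
    which is what KMeans.score(X) reports (as I understand it)."""
    # pairwise squared distances, shape (n_samples, n_centers)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return -float(d2.min(axis=1).sum())


# Two fits of the same data can land in different local minima and
# report very different scores:
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
good = np.array([[0.05, 0.0], [5.05, 5.0]])  # one center per blob
bad = np.array([[2.55, 2.5], [2.55, 2.5]])   # both centers collapsed
print(neg_inertia(X, good), neg_inertia(X, bad))
```

A bad local minimum yields a far more negative score, which is the kind of spread the assertion's tight `cluster_std * 1.5` tolerance would reject.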
Thanks all for your help here! 🙏 Looks like it passed and the old merge comment (#6031 (comment)) took effect.

Let's follow up on the hanging test in issue: #6050. Happy to discuss anything else in new issues 🙂
This PR removes the NumPy<2 pin, which is expected to work for RAPIDS projects once CuPy 13.3.0 is released (CuPy 13.2.0 had some issues preventing use with NumPy 2).