
Remove NumPy <2 pin #6031

Merged: 12 commits merged into rapidsai:branch-24.10 on Aug 28, 2024

Conversation

@seberg (Contributor) commented Aug 19, 2024

This PR removes the NumPy<2 pin. The removal is expected to work for RAPIDS projects once CuPy 13.3.0 is released (CuPy 13.2.0 had some issues preventing its use with NumPy 2).
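
As a rough illustration of the compatibility expectation described above, here is a minimal sketch (not part of this PR; it assumes numpy, cupy, and packaging are importable in the environment) that checks an environment against the CuPy 13.3.0 expectation when NumPy 2 is installed:

# Hypothetical environment check, not taken from the cuML codebase.
from packaging.version import Version

import cupy
import numpy

if Version(numpy.__version__).major >= 2:
    # Per the note above, CuPy 13.2.0 had issues with NumPy 2; 13.3.0 is
    # expected to fix them.
    assert Version(cupy.__version__) >= Version("13.3.0"), (
        f"CuPy {cupy.__version__} predates the NumPy 2 fixes; "
        "upgrade to CuPy >= 13.3.0"
    )
print(f"NumPy {numpy.__version__} with CuPy {cupy.__version__}")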

@seberg seberg added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Aug 19, 2024
@github-actions github-actions bot added conda conda issue Cython / Python Cython or Python issue labels Aug 19, 2024
@jakirkham (Member) commented:

Updating branch to pull in the latest upstream changes and restart CI now that cuDF is done: rapidsai/cudf#16300

@jakirkham jakirkham marked this pull request as ready for review August 24, 2024 05:50
@jakirkham jakirkham requested a review from a team as a code owner August 24, 2024 05:50
@jakirkham (Member) commented Aug 24, 2024

One GHA job failed with an unrelated CUDA initialization error

Unfortunately this seems to be showing up more in CI:

Will raise this offline for discussion

E   UserWarning: Error getting driver and runtime versions:
E
E   stdout:
E
E
E
E   stderr:
E
E   Traceback (most recent call last):
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 254, in ensure_initialized
E       self.cuInit(0)
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 327, in safe_cuda_api_call
E       self._check_ctypes_error(fname, retcode)
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 395, in _check_ctypes_error
E       raise CudaAPIError(retcode, msg)
E   numba.cuda.cudadrv.driver.CudaAPIError: [999] Call to cuInit results in CUDA_ERROR_UNKNOWN
E
E   During handling of the above exception, another exception occurred:
E
E   Traceback (most recent call last):
E     File "<string>", line 4, in <module>
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 292, in __getattr__
E       self.ensure_initialized()
E     File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
E       raise CudaSupportError(f"Error at driver init: {description}")
E   numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_UNKNOWN (999)
E
E
E   Not patching Numba

For now will just restart the failed jobs after the others complete

Edit: Another job had the same sort of error

Edit 2: And again in this job after restarting
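
For reference, a minimal sketch of how one could probe CUDA driver initialization on a node using Numba's public API (this is an illustration on my part, not taken from the CI scripts; CudaSupportError is the same exception type raised in the traceback above):

# Hypothetical standalone probe for flaky CUDA driver initialization.
from numba import cuda
from numba.cuda.cudadrv.error import CudaSupportError

try:
    if cuda.is_available():   # driver init (cuInit) happens lazily under the hood
        cuda.detect()         # prints the devices Numba can see
    else:
        print("No usable CUDA driver/device detected on this node")
except CudaSupportError as exc:   # e.g. cuInit -> CUDA_ERROR_UNKNOWN (999)
    print(f"Driver init failed: {exc}")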

@@ -509,7 +509,7 @@ dependencies:
   - *scikit_learn
   - statsmodels
   - umap-learn==0.5.3
-  - pynndescent==0.5.8
+  - pynndescent
Review comment (Member):

@dantegd, is it alright if we relax this pin?

Review comment (Member):

Given passing tests, it should be fine to relax now

@jakirkham (Member) commented:

/merge

@jakirkham (Member) commented:

Am seeing the wheel-tests-cuml CUDA 12.5 job getting stuck in the Dask tests (though don't see this with the CUDA 11.8 job). Not sure why that is given both would be using the same NumPy versions. Going to try merging in the upstream branch in case there is some fix we are missing. If there is an issue with this CI node, maybe that will give us a new one as well

@jakirkham (Member) commented:

Am seeing the wheel-tests-cuml CUDA 12.5 job getting stuck in the Dask tests (though don't see this with the CUDA 11.8 job).

Still seeing this issue. Going to test CI separately from this change in PR: #6047

@jakirkham (Member) commented:

So that PR's CI builds fail because pynndescent is pinned to the old version (and thus doesn't have this fix: lmcinnes/pynndescent#242)

@jakirkham (Member) commented:

Was searching around in the code for clues. Just came across this, which was unexpected

foreach(target IN LISTS targets_using_numpy)
  target_include_directories(${target} PRIVATE "${Python_NumPy_INCLUDE_DIRS}")
endforeach()

Does cuML need NumPy at build time?

If so, would have expected to see cimport numpy or similar in those Cython files, but am not seeing that
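
To double-check that, here is a small sketch one could run from the repository root (a hypothetical helper, not part of this PR; the python/cuml path is an assumption about the repo layout):

# Hypothetical helper: list Cython sources that actually cimport numpy.
from pathlib import Path

cython_sources = list(Path("python/cuml").rglob("*.pyx")) + list(
    Path("python/cuml").rglob("*.pxd")
)
hits = [
    path
    for path in cython_sources
    if "cimport numpy" in path.read_text(encoding="utf-8", errors="ignore")
]
print(hits or "no Cython source cimports numpy")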

@jakirkham jakirkham requested a review from a team as a code owner August 28, 2024 01:13
@github-actions github-actions bot added the CMake label Aug 28, 2024
Comment on lines -37 to -40

foreach(target IN LISTS targets_using_numpy)
  target_include_directories(${target} PRIVATE "${Python_NumPy_INCLUDE_DIRS}")
endforeach()
Review comment (Member):

Trying to drop this. AFAICT the Cython modules above don't cimport numpy, so they wouldn't need this.

Not sure whether it would cause the tests to hang. At a minimum, it is unused, so it is worth cleaning up.

Review comment (Member):

Yeah not sure what happened there. I traced back in the blame to where this was added. Looks like @vyasr recommended removing it back at the source: #4818 (comment)

But there wasn't any additional discussion on that PR (maybe it happened somewhere else), and the change was merged in.

I agree with you that it seems to be unused.

Review comment (Member):

Originally I was looking for clues on fixing the hanging test (#6031 (comment)). Tried this just in case it helped, but it didn't matter.

Read through the history here yesterday. It seems like NumPy was a build dependency a while back (though even then it wasn't clear whether it was being used). Since then, every update seems to have assumed NumPy was a build dependency. However, as we don't require it during the build, that assumption isn't actually satisfied.

Further, we would have needed find_package(Python REQUIRED COMPONENTS Development NumPy) to find NumPy and set Python_NumPy_INCLUDE_DIRS, but we don't do that either.

Think this hasn't presented much of an issue as we don't actually set targets_using_numpy.

In any event, this seems as good a time as any to clean this up.

@@ -229,6 +229,7 @@ dependencies:
   - dask-cuda==24.10.*,>=0.0.0a0
   - joblib>=0.11
   - numba>=0.57
+  - numpy>=1.23,<3.0a0
Review comment (Member):

It seems we have a NumPy dependency.

However, it isn't getting declared as one, so I've explicitly added NumPy as a dependency.

@jakirkham (Member) commented Aug 28, 2024

Divye documented the CI hang occurring with the pytest cuml-dask CUDA 12.5 wheel job in issue: #6050

He also added a skip for that test in PR: #6051

Unfortunately, other CI jobs still fail due to NumPy 2 being unconstrained and an incompatible pynndescent being installed, as observed in a no-change PR: #6047 (comment)

Fortunately the latter fix is already here

In the hopes of getting CI to pass, have merged Divye's PR into this one. That way all the fixes and skips for CI are in one place

@jameslamb (Member) left a comment:

Left one suggestion to fix the failing wheel tests; otherwise, the packaging changes here look good to me.

I searched around a bit and looked through the CI logs, and I don't see any other issues.

ci/test_wheel.sh (suggestion outdated, resolved)
@jakirkham (Member) commented:

Looks like this CI job had one test failure

______________________ test_weighted_kmeans[10-10-25-100] ______________________
[gw0] linux -- Python 3.11.9 /pyenv/versions/3.11.9/bin/python

nrows = 100, ncols = 25, nclusters = 10, max_weight = 10, random_state = 428096

    @pytest.mark.parametrize("nrows", [100, 500])
    @pytest.mark.parametrize("ncols", [25])
    @pytest.mark.parametrize("nclusters", [5, 10])
    @pytest.mark.parametrize("max_weight", [10])
    def test_weighted_kmeans(nrows, ncols, nclusters, max_weight, random_state):
    
        # Using fairly high variance between points in clusters
        cluster_std = 1.0
        np.random.seed(random_state)
    
        # set weight per sample to be from 1 to max_weight
        wt = np.random.randint(1, high=max_weight, size=nrows)
    
        X, y = make_blobs(
            nrows,
            ncols,
            nclusters,
            cluster_std=cluster_std,
            shuffle=False,
            random_state=0,
        )
    
        cuml_kmeans = cuml.KMeans(
            init="k-means++",
            n_clusters=nclusters,
            n_init=10,
            random_state=random_state,
            output_type="numpy",
        )
    
        cuml_kmeans.fit(X, sample_weight=wt)
        cu_score = cuml_kmeans.score(X)
    
        sk_kmeans = cluster.KMeans(random_state=random_state, n_clusters=nclusters)
        sk_kmeans.fit(cp.asnumpy(X), sample_weight=wt)
        sk_score = sk_kmeans.score(cp.asnumpy(X))
    
>       assert abs(cu_score - sk_score) <= cluster_std * 1.5
E       assert 6151.191162109375 <= (1.0 * 1.5)
E        +  where 6151.191162109375 = abs((-2365.749267578125 - -8516.9404296875))

test_kmeans.py:174: AssertionError
---------------------------- Captured stdout setup -----------------------------
[D] [20:29:31.325625] /__w/cuml/cuml/python/cuml/build/cp311-cp311-linux_aarch64/cuml/internals/logger.cxx:5269 Random seed: 428096

Not entirely sure why that happened (or why this only happens now)

Given we don't see this test failure anywhere else, am going to assume this was a flaky test and try restarting

Though documenting it here in case it comes up again (in the event it needs follow-up)

@rapids-bot rapids-bot bot merged commit e371e53 into rapidsai:branch-24.10 Aug 28, 2024
55 checks passed
@jakirkham (Member) commented Aug 28, 2024

Thanks all for your help here! 🙏

Looks like it passed and the old merge comment (#6031 (comment)) took effect

Let's follow up on the hanging test in issue: #6050

Happy to discuss anything else in new issues 🙂

Labels
ci, CMake, conda (conda issue), Cython / Python (Cython or Python issue), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
6 participants