[BUG] Diagnose pytest cuml-dask hang in CUDA 12.5 wheel CI tests #6050

Closed
divyegala opened this issue Aug 28, 2024 · 3 comments
Labels
bug (Something isn't working), Cython / Python (Cython or Python issue), Dask / cuml.dask (Issue/PR related to Python-level dask or cuml.dask features)

Comments

@divyegala
Member

divyegala commented Aug 28, 2024

As first reported by @jakirkham in PR #6031, the cuml-dask pytest suite now hangs in CUDA 12.5 wheel CI jobs. Until we find the root cause, we will temporarily disable that test suite.

Reference to the hang: CI job link

cc @dantegd

@divyegala added the bug, Cython / Python, and Dask / cuml.dask labels on Aug 28, 2024
@jakirkham
Member

Note that this hang also shows up in a no-change PR: #6047 (comment)

This was referenced Aug 28, 2024
@viclafargue
Contributor

I can no longer reproduce the issue on an L40 machine with the latest commit hash. For anyone who wants to try reproducing it elsewhere, here are the commands:

docker run --gpus all --pull always --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -u 0 --entrypoint bash rapidsai/citestwheel:cuda12.5.1-ubuntu22.04-py3.10

export RAPIDS_BUILD_TYPE=nightly
export RAPIDS_REPOSITORY=rapidsai/cuml
export RAPIDS_REF_NAME=branch-24.10
export RAPIDS_SHA=d87b0ce
export RAPIDS_NIGHTLY_DATE=2024-09-05

git clone https://github.com/rapidsai/gha-tools.git
(git clone https://github.com/rapidsai/cuml.git && cd cuml && git checkout $RAPIDS_SHA)

mkdir -p ./dist
GHA_TOOLS_DIR=gha-tools/tools
RAPIDS_PY_CUDA_SUFFIX="$($GHA_TOOLS_DIR/rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" $GHA_TOOLS_DIR/rapids-download-wheels-from-s3 ./dist

python -m pip install $(echo ./dist/cuml*.whl)[test]

bash cuml/ci/run_cuml_dask_pytests.sh
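
Since the symptom here is a hang rather than a failure, it may help to run the last command under a time limit so a stuck run ends with a recognizable exit status instead of blocking the shell. The following is a minimal sketch, not part of the cuml CI scripts: `run_with_timeout` is a hypothetical helper name, and it assumes GNU coreutils `timeout` is available in the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not taken from the cuml repository):
# run a command under coreutils `timeout` so that a hang becomes a
# distinguishable exit status rather than an indefinitely blocked shell.
run_with_timeout() {
    local limit="$1"
    shift
    local status=0
    # `timeout` sends SIGTERM to the command once the limit expires and
    # then exits with status 124.
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}: $*" >&2
    fi
    return "$status"
}
```

For example, `run_with_timeout 30m bash cuml/ci/run_cuml_dask_pytests.sh` would stop the test script after 30 minutes. The 30-minute limit is an arbitrary assumption and should be set above the suite's normal runtime. Any other exit status is passed through unchanged, so ordinary test failures stay distinguishable from a timeout.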

@divyegala
Member Author

Closed by #6051
