Merged
8 changes: 7 additions & 1 deletion context/notebooks.sh
@@ -9,7 +9,13 @@

set -euo pipefail

NOTEBOOK_REPOS=(cudf cuml cugraph)
# TODO: restore cuml notebook testing on CUDA 13 once there are CUDA 13 xgboost packages and 'rapids' depends on them
# ref: https://github.com/rapidsai/integration/issues/798
if [[ "${CUDA_VER%%.*}" == "12" ]]; then
NOTEBOOK_REPOS=(cudf cuml cugraph)
else
NOTEBOOK_REPOS=(cudf cugraph)
fi
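The `"${CUDA_VER%%.*}"` test above relies on shell parameter expansion: `%%.*` deletes the longest suffix matching `.*`, leaving only the CUDA major version. A minimal sketch of the behavior (the variable values here are just examples):

```shell
# '%%.*' strips the longest suffix starting at the first '.',
# so only the major version remains.
CUDA_VER="13.0"
echo "${CUDA_VER%%.*}"   # prints: 13

# Works the same with a patch component present.
CUDA_VER="12.9.1"
echo "${CUDA_VER%%.*}"   # prints: 12
```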
Member Author:
I should have predicted this... there are some cuml notebooks that expect to be able to train an xgboost model using GPUs.

Depending on the CPU-only version (thanks to rapidsai/integration#795) leads to this:

```text
XGBoostError: [17:20:04] /home/conda/feedstock_root/build_artifacts/xgboost-split_1754002079811/work/src/c_api/../common/common.h:181: XGBoost version not compiled with GPU support.
Stack trace:
  [bt] (0) /opt/conda/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6e) [0x7522dbe3857e]
  [bt] (1) /opt/conda/lib/libxgboost.so(xgboost::common::AssertGPUSupport()+0x3b) [0x7522dbe3881b]
  [bt] (2) /opt/conda/lib/libxgboost.so(XGDMatrixCreateFromCudaArrayInterface+0xf) [0x7522dbda038f]
  [bt] (3) /opt/conda/lib/python3.13/lib-dynload/../../libffi.so.8(+0x6d8a) [0x75242f774d8a]
  [bt] (4) /opt/conda/lib/python3.13/lib-dynload/../../libffi.so.8(+0x61cd) [0x75242f7741cd]
  [bt] (5) /opt/conda/lib/python3.13/lib-dynload/../../libffi.so.8(ffi_call+0xcd) [0x75242f77491d]
  [bt] (6) /opt/conda/lib/python3.13/lib-dynload/_ctypes.cpython-313-x86_64-linux-gnu.so(+0x15f90) [0x75242f78ff90]
  [bt] (7) /opt/conda/lib/python3.13/lib-dynload/_ctypes.cpython-313-x86_64-linux-gnu.so(+0x13da6) [0x75242f78dda6]
  [bt] (8) /opt/conda/bin/python(_PyObject_MakeTpCall+0x27c) [0x6331f8b71ddc]
```

(build link)

This proposes just skipping cuml notebook testing here temporarily, to unblock publishing the first nightly container images with CUDA 13 packages.

If reviewers agree, I'll add an issue in this repo tracking the work of putting that testing back.

Contributor:

As long as we have the issue up I'm fine with this temporary patch.

Member Author:

Great, thank you. Put up an issue here: #784


mkdir -p /notebooks /dependencies
for REPO in "${NOTEBOOK_REPOS[@]}"; do
6 changes: 3 additions & 3 deletions cuvs-bench/README.md
@@ -36,7 +36,7 @@ export DATA_FOLDER=path/to/store/results/and/data
docker run --gpus all --rm -it \
-v $DATA_FOLDER:/home/rapids/benchmarks \
-u $(id -u) \
rapidsai/cuvs-bench:25.10a-cuda12.9-py3.13 \
rapidsai/cuvs-bench:25.10a-cuda13.0-py3.13 \
"--dataset deep-image-96-angular" \
"--normalize" \
"--algorithms cuvs_cagra" \
@@ -47,7 +47,7 @@ Where:

- `DATA_FOLDER=path/to/store/results/and/data`: Results and datasets will be written to this host folder.
- `-u $(id -u)`: Runs the container as the host user, so files written to the mounted folder get the correct permissions.
- `rapidsai/cuvs-bench:25.10a-cuda12.9-py3.13`: Image to use, either `cuvs-bench` or `cuvs-bench-datasets`, cuVS version, CUDA version, and Python version.
- `rapidsai/cuvs-bench:25.10a-cuda13.0-py3.13`: Image to use, either `cuvs-bench` or `cuvs-bench-datasets`, cuVS version, CUDA version, and Python version.
- `"--dataset deep-image-96-angular"`: Dataset name(s). See https://docs.rapids.ai/api/cuvs/nightly/cuvs_bench for more details.
- `"--normalize"`: Whether to normalize the dataset; pass an empty string (`""`) to skip normalization.
- `"--algorithms cuvs_cagra"`: Algorithm(s) to use as a `;`-separated list, plus any other arguments to pass to `cuvs_bench.run`.
@@ -74,7 +74,7 @@ export DATA_FOLDER=path/to/store/results/and/data
docker run --gpus all --rm -it \
-v $DATA_FOLDER:/home/rapids/benchmarks \
-u $(id -u) \
rapidsai/cuvs-bench:25.10a-cuda12.9-py3.13 \
rapidsai/cuvs-bench:25.10a-cuda13.0-py3.13 \
--entrypoint /bin/bash
```

8 changes: 4 additions & 4 deletions dockerhub-readme.md
@@ -38,7 +38,7 @@ There are two types:
The tag naming scheme for RAPIDS images incorporates key platform details into the tag as shown below:

```text
25.10-cuda12.9-py3.13
25.10-cuda13.0-py3.13
  ^      ^      ^
  |      |      Python version
  |      |
  |      CUDA version
  |
  RAPIDS version
```

@@ -47,7 +47,7 @@

**Note: Nightly builds of the images have the RAPIDS version appended with an `a` (i.e. `25.10a-cuda12.9-py3.13`)**
**Note: Nightly builds of the images have the RAPIDS version appended with an `a` (i.e. `25.10a-cuda13.0-py3.13`)**
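A tag in this scheme can be pulled apart with plain shell parameter expansion. A hedged sketch (the tag value below is just an example, not a guaranteed published image):

```shell
TAG="25.10a-cuda13.0-py3.13"

RAPIDS_VER="${TAG%%-*}"       # everything before the first '-': 25.10a
CUDA_VER="${TAG#*-cuda}"      # drop the '<rapids>-cuda' prefix
CUDA_VER="${CUDA_VER%%-*}"    # then drop the '-py...' suffix: 13.0
PYTHON_VER="${TAG##*-py}"     # everything after the last '-py': 3.13

echo "$RAPIDS_VER $CUDA_VER $PYTHON_VER"   # prints: 25.10a 13.0 3.13
```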

## Usage

@@ -80,7 +80,7 @@ $ docker run \
-e EXTRA_CONDA_PACKAGES="jq" \
-e EXTRA_PIP_PACKAGES="beautifulsoup4" \
-p 8888:8888 \
rapidsai/notebooks:25.10-cuda12.9-py3.13
rapidsai/notebooks:25.10-cuda13.0-py3.13
```

### Bind Mounts
@@ -105,7 +105,7 @@ $ docker run \
--gpus all \
--shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
-v $(pwd)/environment.yml:/home/rapids/environment.yml \
rapidsai/base:25.10-cuda12.9-py3.13
rapidsai/base:25.10-cuda13.0-py3.13
```

### Use JupyterLab to Explore the Notebooks
16 changes: 8 additions & 8 deletions matrix-test.yaml
@@ -1,14 +1,14 @@
# Copyright (c) 2023-2025, NVIDIA CORPORATION.
# CUDA_VER is `<major>.<minor>` (e.g. `12.0`)
# CUDA_VER is `<major>.<minor>` (e.g. `13.0`)

pull-request:
- { CUDA_VER: '12.0', ARCH: 'amd64', PYTHON_VER: '3.10', GPU: 'l4', DRIVER: 'earliest' }
- { CUDA_VER: '12.9', ARCH: 'arm64', PYTHON_VER: '3.11', GPU: 'a100', DRIVER: 'latest' }
- { CUDA_VER: '12.9', ARCH: 'amd64', PYTHON_VER: '3.13', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '12.9', ARCH: 'amd64', PYTHON_VER: '3.12', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '13.0', ARCH: 'arm64', PYTHON_VER: '3.13', GPU: 'a100', DRIVER: 'latest' }
- { CUDA_VER: '13.0', ARCH: 'amd64', PYTHON_VER: '3.13', GPU: 'h100', DRIVER: 'latest' }
branch:
- { CUDA_VER: '12.0', ARCH: 'amd64', PYTHON_VER: '3.10', GPU: 'l4', DRIVER: 'earliest' }
- { CUDA_VER: '12.0', ARCH: 'amd64', PYTHON_VER: '3.10', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '12.0', ARCH: 'arm64', PYTHON_VER: '3.11', GPU: 'a100', DRIVER: 'latest' }
- { CUDA_VER: '12.0', ARCH: 'amd64', PYTHON_VER: '3.12', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '12.9', ARCH: 'amd64', PYTHON_VER: '3.13', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '12.0', ARCH: 'arm64', PYTHON_VER: '3.11', GPU: 'a100', DRIVER: 'earliest' }
- { CUDA_VER: '12.9', ARCH: 'amd64', PYTHON_VER: '3.11', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '12.9', ARCH: 'arm64', PYTHON_VER: '3.13', GPU: 'a100', DRIVER: 'latest' }
- { CUDA_VER: '13.0', ARCH: 'amd64', PYTHON_VER: '3.11', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '13.0', ARCH: 'arm64', PYTHON_VER: '3.12', GPU: 'a100', DRIVER: 'latest' }
Contributor:

Following on from our shared-workflows discussion, should we run at least one of these jobs on an h100? This is a low traffic repo so it shouldn't add too much load and it seems like it would be a good test.

Member Author:

Great point, I agree!

I just pushed f003cb3, switching one of these PR jobs to H100s.

3 changes: 2 additions & 1 deletion matrix.yaml
@@ -1,8 +1,9 @@
# Copyright (c) 2023-2025, NVIDIA CORPORATION.

CUDA_VER: # Should be `<major>.<minor>.<patch>` (e.g. `12.9.0`)
CUDA_VER: # Should be `<major>.<minor>.<patch>` (e.g. `13.0.0`)
- "12.0.1"
- "12.9.1"
- "13.0.0"
PYTHON_VER:
- "3.10"
- "3.11"
2 changes: 1 addition & 1 deletion tests/container-canary/README.md
@@ -9,7 +9,7 @@ Install `container-canary` following the instructions in that project's repo.
Run the tests against a built image, the same way they're run in CI.

```shell
IMAGE_URI="rapidsai/notebooks:25.10a-cuda12.9-py3.13"
IMAGE_URI="rapidsai/notebooks:25.10a-cuda13.0-py3.13"

ci/run-validation-checks.sh \
--dask-scheduler \