
Conversation

@jameslamb (Member) commented on Aug 26, 2025:

Contributes to rapidsai/build-planning#208

  • uses CUDA 13.0.0 to build and test

@jameslamb (Member, Author) commented:

Took a couple of re-runs to get through network errors, but most build jobs are passing.

All of the arm64 CUDA 12.0.1 jobs are failing though, like this:

error    libmamba Could not solve for environment specs
    The following packages are incompatible
    ├─ cuda-version =12.0 * is requested and can be installed;
    └─ rapids =25.10 * is not installable because there are no viable options
       ├─ rapids [25.10.00a6|25.10.00a7] would require
       │  └─ cucim =25.10 * but there are no viable options
       │     ├─ cucim 25.10.00a19 would require
       │     │  └─ libcucim =25.10.0a19 * but there are no viable options
       │     │     ├─ libcucim 25.10.00a19 would require
       │     │     │  └─ libcufile >=1.14.1.1,<2.0a0 * but there are no viable options
       │     │     │     ├─ libcufile 1.14.1.1 would require
       │     │     │     │  └─ cuda-version >=12.9,<12.10.0a0 *, which conflicts with any installable versions previously reported;
       │     │     │     └─ libcufile [1.15.0.42|1.15.1.6] would require
       │     │     │        └─ cuda-version >=13.0,<13.1.0a0 *, which conflicts with any installable versions previously reported;
       │     │     └─ libcucim 25.10.00a19 would require
       │     │        └─ cuda-version >=13,<14.0a0 *, which conflicts with any installable versions previously reported;
       │     └─ cucim 25.10.00a19 would require
       │        └─ cuda-version >=13,<14.0a0 *, which conflicts with any installable versions previously reported;
       └─ rapids 25.10.00a7 would require
          └─ cuda-version >=13,<14.0a0 *, which conflicts with any installable versions previously reported.
critical libmamba Could not solve for environment specs

(build link)
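
For anyone who wants to poke at this locally, here's a rough reproduction of that solve (a hedged sketch; the channel list and --dry-run approach are my assumptions, not copied from the CI scripts):

# Fake the __cuda virtual package so the solve can run on any machine,
# then request the same rapids + cuda-version combination as the failing job.
CONDA_OVERRIDE_CUDA=12.0 mamba create --dry-run -n rapids-probe \
  -c rapidsai-nightly -c conda-forge \
  'rapids=25.10' 'cuda-version=12.0'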

libcucim does not directly constrain its libcufile-dev version:

https://github.com/rapidsai/cucim/blob/7d990eb17d23550f6c919bcb40042e7298328671/conda/recipes/libcucim/meta.yaml#L58

But it does allow libcufile-dev's run exports on arm64:

https://github.com/rapidsai/cucim/blob/7d990eb17d23550f6c919bcb40042e7298328671/conda/recipes/libcucim/meta.yaml#L24

It looks like libcufile-dev has a major-version run export on itself + a {major}.{minor} run dependency on cuda-version. That's probably the issue here:

  - name: libcufile-dev
    build:
      run_exports:
        - {{ pin_subpackage("libcufile", max_pin="x") }}
    # ... omitted ...
    requirements:
      build:
        # ... omitted ...
        - arm-variant * {{ arm_variant_type }}  # [aarch64]
      host:
        - cuda-version {{ cuda_version }}
      run:
        - {{ pin_compatible("cuda-version", max_pin="x.x") }}
        - {{ pin_subpackage("libcufile", exact=True) }}

(conda-forge/libcufile-feedstock - recipe/meta.yaml)
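
To see just that interaction in isolation, the run-exported libcufile constraint from the solver output can be paired with cuda-version=12.0 directly (a hedged sketch; only the libcufile spec comes from the log above, the rest is my guess at a minimal reproduction):

# Every published libcufile build pins cuda-version to the minor series it was
# built against (>=12.9 or >=13.0 per the solver output), so this dry-run solve
# should fail the same way.
mamba create --dry-run -n libcufile-probe -c conda-forge \
  'libcufile>=1.14.1.1,<2.0a0' 'cuda-version=12.0'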

It looks like @jakirkham anticipated this (https://github.com/rapidsai/cucim/pull/905/files#r2292656134) and started the work to avoid it (rapidsai/cucim#930). I think we'll need that change before cucim can be included in these arm64 CUDA 12.0.1 images.

rapids-bot bot pushed a commit to rapidsai/integration that referenced this pull request Sep 5, 2025
`libcucim` is not installable at the moment on arm64 systems with CUDA < 12.2:

* https://github.com/rapidsai/cucim/pull/905/files#r2292609197
* rapidsai/cucim#930

This is blocking builds of RAPIDS docker images:

* rapidsai/docker#782 (comment)

This proposes **temporarily** excluding `cucim` from the dependencies of arm64 CUDA 12 `rapids` packages, to unblock `docker` CI (and therefore publication of the first CUDA 13 nightlies of those images) until that `cucim` issue is resolved.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #796
@jameslamb jameslamb changed the title WIP: Build and test with CUDA 13.0.0 Build and test with CUDA 13.0.0 Sep 5, 2025
@jameslamb jameslamb marked this pull request as ready for review September 5, 2025 16:46
@jameslamb jameslamb requested a review from a team as a code owner September 5, 2025 16:46
NOTEBOOK_REPOS=(cudf cuml cugraph)
else
NOTEBOOK_REPOS=(cudf cugraph)
fi
@jameslamb (Member, Author) commented:

I should have predicted this... there are some cuml notebooks that expect to be able to train an xgboost model using GPUs.

Depending on the CPU-only xgboost build (via rapidsai/integration#795) leads to this:

XGBoostError: [17:20:04] /home/conda/feedstock_root/build_artifacts/xgboost-split_1754002079811/work/src/c_api/../common/common.h:181: XGBoost version not compiled with GPU support.
Stack trace:
  [bt] (0) /opt/conda/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6e) [0x7522dbe3857e]
  [bt] (1) /opt/conda/lib/libxgboost.so(xgboost::common::AssertGPUSupport()+0x3b) [0x7522dbe3881b]
  [bt] (2) /opt/conda/lib/libxgboost.so(XGDMatrixCreateFromCudaArrayInterface+0xf) [0x7522dbda038f]
  [bt] (3) /opt/conda/lib/python3.13/lib-dynload/../../libffi.so.8(+0x6d8a) [0x75242f774d8a]
  [bt] (4) /opt/conda/lib/python3.13/lib-dynload/../../libffi.so.8(+0x61cd) [0x75242f7741cd]
  [bt] (5) /opt/conda/lib/python3.13/lib-dynload/../../libffi.so.8(ffi_call+0xcd) [0x75242f77491d]
  [bt] (6) /opt/conda/lib/python3.13/lib-dynload/_ctypes.cpython-313-x86_64-linux-gnu.so(+0x15f90) [0x75242f78ff90]
  [bt] (7) /opt/conda/lib/python3.13/lib-dynload/_ctypes.cpython-313-x86_64-linux-gnu.so(+0x13da6) [0x75242f78dda6]
  [bt] (8) /opt/conda/bin/python(_PyObject_MakeTpCall+0x27c) [0x6331f8b71ddc]

(build link)
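
As a quick sanity check before running the notebooks, something like this can confirm whether the installed xgboost package was compiled with CUDA support (a hedged sketch; xgboost.build_info() reports compile-time flags, and the USE_CUDA key is my assumption):

# Prints True for GPU-enabled builds, False for the CPU-only package.
python -c "import xgboost; print(xgboost.build_info().get('USE_CUDA'))"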

This proposes just skipping cuml notebook testing here temporarily, to unblock publishing the first nightly container images with CUDA 13 packages.

If reviewers agree, I'll add an issue in this repo tracking the work of putting that testing back.
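
For context, the gating in the hunk above ends up looking roughly like this (a hypothetical reconstruction; the actual condition and variable name aren't visible in the truncated diff):

# Hypothetical sketch: only include the cuml notebooks on CUDA 12 images,
# where GPU-enabled xgboost packages are available. CUDA_VER is an assumed name.
if [[ "${CUDA_VER%%.*}" -lt 13 ]]; then
  NOTEBOOK_REPOS=(cudf cuml cugraph)
else
  NOTEBOOK_REPOS=(cudf cugraph)
fi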

A contributor replied:
As long as we have the issue up I'm fine with this temporary patch.

@jameslamb (Member, Author) replied:
Great, thank you. Put up an issue here: #784

@vyasr (Contributor) left a review:

Couple notes but LGTM.

- { CUDA_VER: '12.9', ARCH: 'amd64', PYTHON_VER: '3.11', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '12.9', ARCH: 'arm64', PYTHON_VER: '3.13', GPU: 'a100', DRIVER: 'latest' }
- { CUDA_VER: '13.0', ARCH: 'amd64', PYTHON_VER: '3.11', GPU: 'l4', DRIVER: 'latest' }
- { CUDA_VER: '13.0', ARCH: 'arm64', PYTHON_VER: '3.12', GPU: 'a100', DRIVER: 'latest' }
A contributor commented:

Following on from our shared-workflows discussion, should we run at least one of these jobs on an H100? This is a low-traffic repo, so it shouldn't add too much load, and it seems like it would be a good test.

@jameslamb (Member, Author) replied:

Great point, I agree!

I just pushed f003cb3, switching one of these PR jobs to H100s.

@jameslamb jameslamb removed the request for review from KyleFromNVIDIA September 5, 2025 20:53
@jameslamb jameslamb mentioned this pull request Sep 5, 2025
@jameslamb (Member, Author) commented:

/merge

@rapids-bot rapids-bot bot merged commit f9ed55b into rapidsai:branch-25.10 Sep 5, 2025
86 checks passed
@jameslamb jameslamb deleted the cuda-13.0.0 branch September 5, 2025 21:23
rapids-bot bot pushed a commit that referenced this pull request Sep 9, 2025
Closes #784 

In #782, we skipped cuML notebook testing on CUDA 13 because there weren't yet CUDA 13 `xgboost` conda packages. Those exist now:

* rapidsai/xgboost-feedstock#100
* rapidsai/integration#800

This reverts a workaround from #782, so all notebooks will be tested on CUDA 12 and CUDA 13. It also ensures that the CUDA 13 images include GPU-accelerated builds of `xgboost`.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #785