Fix pre-commit config for codespell and remaining typos #3182

shwina · 2024-12-16T16:57:51Z

Description

As a follow-up to #3168, this PR fixes the pre-commit config so that the codespell options specified in pyproject.toml are actually picked up. It also fixes a couple of remaining typos.
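
For context on what "picked up" can mean here: codespell reads its options from the `[tool.codespell]` table of pyproject.toml, and on Python versions before 3.11 it needs the third-party `tomli` package to parse that file. Because pre-commit runs each hook in its own isolated environment, a missing TOML parser there is one plausible way for the options to be ignored; whether that is exactly what this PR changed is an assumption, not something stated above. The sketch below only illustrates that lookup. The file content, option values, and the helper name `read_codespell_options` are hypothetical.

```python
# Sketch: check whether codespell-style options can be read from pyproject.toml.
# tomllib is in the standard library from Python 3.11; older interpreters need
# the third-party tomli package instead.
try:
    import tomllib as toml_parser
except ModuleNotFoundError:
    try:
        import tomli as toml_parser
    except ModuleNotFoundError:
        toml_parser = None

# Hypothetical pyproject.toml content; the option values are illustrative only.
EXAMPLE_PYPROJECT = """
[tool.codespell]
skip = "./build,./.git"
ignore-words-list = "inout,thre"
"""


def read_codespell_options(pyproject_text):
    """Return the [tool.codespell] table, or {} if it cannot be read."""
    if toml_parser is None:
        # No TOML parser in this environment: the options cannot be read.
        return {}
    data = toml_parser.loads(pyproject_text)
    return data.get("tool", {}).get("codespell", {})


options = read_codespell_options(EXAMPLE_PYPROJECT)
if options:
    print("options picked up:", sorted(options))
else:
    print("no options picked up (no TOML parser, or no [tool.codespell] table)")
```

Running the same check inside and outside the hook environment is a quick way to see whether both see the same configuration.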

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

github-actions · 2024-12-16T19:22:13Z

🟩 CI finished in 2h 19m: Pass: 100%/176 | Total: 2d 23h | Avg: 24m 23s | Max: 1h 21m | Hits: 36%/22502

🟩 libcudacxx: Pass: 100%/48 | Total: 17h 02m | Avg: 21m 18s | Max: 1h 05m | Hits: 31%/9806

🟩 cpu
  🟩 amd64              Pass: 100%/46  | Total: 16h 39m | Avg: 21m 43s | Max:  1h 05m | Hits:  31%/9806  
  🟩 arm64              Pass: 100%/2   | Total: 23m 14s | Avg: 11m 37s | Max: 19m 57s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  1h 38m | Avg: 14m 02s | Max: 29m 46s | Hits:  34%/2237  
  🟩 12.5               Pass: 100%/2   | Total: 58m 44s | Avg: 29m 22s | Max: 30m 27s
  🟩 12.6               Pass: 100%/39  | Total: 14h 25m | Avg: 22m 11s | Max:  1h 05m | Hits:  30%/7569  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 01m | Avg: 15m 24s | Max: 18m 52s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 38m | Avg: 14m 02s | Max: 29m 46s | Hits:  34%/2237  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 58m 44s | Avg: 29m 22s | Max: 30m 27s
  🟩 nvcc12.6           Pass: 100%/35  | Total: 13h 24m | Avg: 22m 58s | Max:  1h 05m | Hits:  30%/7569  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 01m | Avg: 15m 24s | Max: 18m 52s
  🟩 nvcc               Pass: 100%/44  | Total: 16h 01m | Avg: 21m 50s | Max:  1h 05m | Hits:  31%/9806  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 40m 03s | Avg: 10m 00s | Max: 15m 14s
  🟩 Clang10            Pass: 100%/1   | Total: 24m 25s | Avg: 24m 25s | Max: 24m 25s
  🟩 Clang11            Pass: 100%/1   | Total: 14m 20s | Avg: 14m 20s | Max: 14m 20s
  🟩 Clang12            Pass: 100%/1   | Total: 20m 31s | Avg: 20m 31s | Max: 20m 31s
  🟩 Clang13            Pass: 100%/1   | Total: 22m 33s | Avg: 22m 33s | Max: 22m 33s
  🟩 Clang14            Pass: 100%/1   | Total: 22m 05s | Avg: 22m 05s | Max: 22m 05s
  🟩 Clang15            Pass: 100%/1   | Total: 22m 53s | Avg: 22m 53s | Max: 22m 53s
  🟩 Clang16            Pass: 100%/1   | Total: 21m 16s | Avg: 21m 16s | Max: 21m 16s
  🟩 Clang17            Pass: 100%/1   | Total: 21m 59s | Avg: 21m 59s | Max: 21m 59s
  🟩 Clang18            Pass: 100%/8   | Total:  3h 02m | Avg: 22m 47s | Max:  1h 05m
  🟩 GCC6               Pass: 100%/2   | Total: 32m 37s | Avg: 16m 18s | Max: 22m 07s
  🟩 GCC7               Pass: 100%/2   | Total: 30m 30s | Avg: 15m 15s | Max: 15m 30s
  🟩 GCC8               Pass: 100%/1   | Total: 20m 53s | Avg: 20m 53s | Max: 20m 53s
  🟩 GCC9               Pass: 100%/3   | Total: 42m 35s | Avg: 14m 11s | Max: 20m 10s
  🟩 GCC10              Pass: 100%/1   | Total: 23m 20s | Avg: 23m 20s | Max: 23m 20s
  🟩 GCC11              Pass: 100%/1   | Total: 13m 59s | Avg: 13m 59s | Max: 13m 59s
  🟩 GCC12              Pass: 100%/1   | Total: 22m 30s | Avg: 22m 30s | Max: 22m 30s
  🟩 GCC13              Pass: 100%/10  | Total:  3h 44m | Avg: 22m 27s | Max:  1h 03m
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 22m 04s | Avg: 22m 04s | Max: 22m 04s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 29m 46s | Avg: 29m 46s | Max: 29m 46s | Hits:  34%/2237  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 36m 23s | Avg: 36m 23s | Max: 36m 23s | Hits:  31%/2474  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 12m | Avg: 36m 09s | Max: 40m 01s | Hits:  30%/5095  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 58m 44s | Avg: 29m 22s | Max: 30m 27s
🟩 cxx_family
  🟩 Clang              Pass: 100%/20  | Total:  6h 32m | Avg: 19m 37s | Max:  1h 05m
  🟩 GCC                Pass: 100%/21  | Total:  6h 50m | Avg: 19m 34s | Max:  1h 03m
  🟩 Intel              Pass: 100%/1   | Total: 22m 04s | Avg: 22m 04s | Max: 22m 04s
  🟩 MSVC               Pass: 100%/4   | Total:  2h 18m | Avg: 34m 37s | Max: 40m 01s | Hits:  31%/9806  
  🟩 NVHPC              Pass: 100%/2   | Total: 58m 44s | Avg: 29m 22s | Max: 30m 27s
🟩 gpu
  🟩 v100               Pass: 100%/48  | Total: 17h 02m | Avg: 21m 18s | Max:  1h 05m | Hits:  31%/9806  
🟩 jobs
  🟩 Build              Pass: 100%/41  | Total: 13h 02m | Avg: 19m 04s | Max: 40m 01s | Hits:  31%/9806  
  🟩 NVRTC              Pass: 100%/4   | Total:  1h 49m | Avg: 27m 23s | Max: 34m 15s
  🟩 Test               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 59s | Avg:  1m 59s | Max:  1m 59s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 13m 36s | Avg: 13m 36s | Max: 13m 36s
  🟩 90a                Pass: 100%/2   | Total: 15m 51s | Avg:  7m 55s | Max: 11m 44s
🟩 std
  🟩 11                 Pass: 100%/6   | Total:  1h 33m | Avg: 15m 30s | Max: 22m 07s
  🟩 14                 Pass: 100%/5   | Total:  1h 39m | Avg: 19m 59s | Max: 31m 58s | Hits:  34%/2237  
  🟩 17                 Pass: 100%/13  | Total:  4h 40m | Avg: 21m 34s | Max: 36m 23s | Hits:  30%/4948  
  🟩 20                 Pass: 100%/23  | Total:  9h 07m | Avg: 23m 47s | Max:  1h 05m | Hits:  30%/2621

🟩 cub: Pass: 100%/47 | Total: 1d 08h | Avg: 40m 51s | Max: 1h 12m | Hits: 30%/3124

🟩 cpu
  🟩 amd64              Pass: 100%/45  | Total:  1d 06h | Avg: 40m 40s | Max:  1h 12m | Hits:  30%/3124  
  🟩 arm64              Pass: 100%/2   | Total:  1h 30m | Avg: 45m 12s | Max: 48m 00s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  4h 41m | Avg: 40m 13s | Max:  1h 02m | Hits:  30%/781   
  🟩 12.5               Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
  🟩 12.6               Pass: 100%/38  | Total:  1d 00h | Avg: 39m 19s | Max:  1h 10m | Hits:  30%/2343  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 39m | Avg: 49m 36s | Max: 50m 27s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  4h 41m | Avg: 40m 13s | Max:  1h 02m | Hits:  30%/781   
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
  🟩 nvcc12.6           Pass: 100%/36  | Total: 23h 15m | Avg: 38m 45s | Max:  1h 10m | Hits:  30%/2343  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 39m | Avg: 49m 36s | Max: 50m 27s
  🟩 nvcc               Pass: 100%/45  | Total:  1d 06h | Avg: 40m 28s | Max:  1h 12m | Hits:  30%/3124  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  3h 22m | Avg: 50m 31s | Max: 58m 37s
  🟩 Clang10            Pass: 100%/1   | Total: 56m 04s | Avg: 56m 04s | Max: 56m 04s
  🟩 Clang11            Pass: 100%/1   | Total: 54m 37s | Avg: 54m 37s | Max: 54m 37s
  🟩 Clang12            Pass: 100%/1   | Total: 56m 12s | Avg: 56m 12s | Max: 56m 12s
  🟩 Clang13            Pass: 100%/1   | Total: 52m 11s | Avg: 52m 11s | Max: 52m 11s
  🟩 Clang14            Pass: 100%/1   | Total: 39m 42s | Avg: 39m 42s | Max: 39m 42s
  🟩 Clang15            Pass: 100%/1   | Total: 37m 44s | Avg: 37m 44s | Max: 37m 44s
  🟩 Clang16            Pass: 100%/1   | Total: 39m 11s | Avg: 39m 11s | Max: 39m 11s
  🟩 Clang17            Pass: 100%/1   | Total: 36m 53s | Avg: 36m 53s | Max: 36m 53s
  🟩 Clang18            Pass: 100%/7   | Total:  4h 25m | Avg: 37m 56s | Max: 50m 27s
  🟩 GCC6               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 13s | Max: 33m 29s
  🟩 GCC7               Pass: 100%/2   | Total:  1h 12m | Avg: 36m 15s | Max: 36m 48s
  🟩 GCC8               Pass: 100%/1   | Total: 39m 09s | Avg: 39m 09s | Max: 39m 09s
  🟩 GCC9               Pass: 100%/3   | Total:  1h 40m | Avg: 33m 32s | Max: 37m 47s
  🟩 GCC10              Pass: 100%/1   | Total: 37m 45s | Avg: 37m 45s | Max: 37m 45s
  🟩 GCC11              Pass: 100%/1   | Total: 37m 22s | Avg: 37m 22s | Max: 37m 22s
  🟩 GCC12              Pass: 100%/3   | Total: 58m 21s | Avg: 19m 27s | Max: 38m 13s
  🟩 GCC13              Pass: 100%/8   | Total:  3h 25m | Avg: 25m 39s | Max: 42m 25s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 55m 58s | Avg: 55m 58s | Max: 55m 58s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  30%/781   
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  30%/781   
  🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 08m | Hits:  30%/1562  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total: 14h 00m | Avg: 44m 13s | Max: 58m 37s
  🟩 GCC                Pass: 100%/21  | Total: 10h 15m | Avg: 29m 18s | Max: 42m 25s
  🟩 Intel              Pass: 100%/1   | Total: 55m 58s | Avg: 55m 58s | Max: 55m 58s
  🟩 MSVC               Pass: 100%/4   | Total:  4h 24m | Avg:  1h 06m | Max:  1h 10m | Hits:  30%/3124  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 12m
🟩 gpu
  🟩 h100               Pass: 100%/2   | Total: 20m 08s | Avg: 10m 04s | Max: 16m 01s
  🟩 v100               Pass: 100%/45  | Total:  1d 07h | Avg: 42m 14s | Max:  1h 12m | Hits:  30%/3124  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total:  1d 05h | Avg: 44m 31s | Max:  1h 12m | Hits:  30%/3124  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 33s | Avg: 21m 33s | Max: 21m 33s
  🟩 GraphCapture       Pass: 100%/1   | Total: 16m 56s | Avg: 16m 56s | Max: 16m 56s
  🟩 HostLaunch         Pass: 100%/3   | Total: 53m 05s | Avg: 17m 41s | Max: 18m 40s
  🟩 TestGPU            Pass: 100%/2   | Total: 47m 55s | Avg: 23m 57s | Max: 26m 23s
🟩 sm
  🟩 90                 Pass: 100%/2   | Total: 20m 08s | Avg: 10m 04s | Max: 16m 01s
  🟩 90a                Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
🟩 std
  🟩 11                 Pass: 100%/5   | Total:  3h 15m | Avg: 39m 09s | Max: 51m 58s
  🟩 14                 Pass: 100%/4   | Total:  3h 10m | Avg: 47m 39s | Max:  1h 02m | Hits:  30%/781   
  🟩 17                 Pass: 100%/12  | Total: 10h 01m | Avg: 50m 07s | Max:  1h 12m | Hits:  30%/1562  
  🟩 20                 Pass: 100%/26  | Total: 15h 32m | Avg: 35m 52s | Max:  1h 11m | Hits:  30%/781

🟩 thrust: Pass: 100%/46 | Total: 19h 08m | Avg: 24m 58s | Max: 1h 21m | Hits: 41%/9260

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 29m 47s | Avg: 14m 53s | Max: 17m 59s
🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total: 18h 41m | Avg: 25m 29s | Max:  1h 21m | Hits:  41%/9260  
  🟩 arm64              Pass: 100%/2   | Total: 27m 21s | Avg: 13m 40s | Max: 15m 17s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  2h 54m | Avg: 24m 58s | Max:  1h 06m | Hits:  27%/1852  
  🟩 12.5               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
  🟩 12.6               Pass: 100%/37  | Total: 13h 57m | Avg: 22m 37s | Max:  1h 21m | Hits:  45%/7408  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total: 25m 17s | Avg: 12m 38s | Max: 14m 06s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  2h 54m | Avg: 24m 58s | Max:  1h 06m | Hits:  27%/1852  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
  🟩 nvcc12.6           Pass: 100%/35  | Total: 13h 32m | Avg: 23m 12s | Max:  1h 21m | Hits:  45%/7408  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total: 25m 17s | Avg: 12m 38s | Max: 14m 06s
  🟩 nvcc               Pass: 100%/44  | Total: 18h 43m | Avg: 25m 31s | Max:  1h 21m | Hits:  41%/9260  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  2h 04m | Avg: 31m 01s | Max: 37m 21s
  🟩 Clang10            Pass: 100%/1   | Total: 38m 36s | Avg: 38m 36s | Max: 38m 36s
  🟩 Clang11            Pass: 100%/1   | Total: 36m 16s | Avg: 36m 16s | Max: 36m 16s
  🟩 Clang12            Pass: 100%/1   | Total: 35m 39s | Avg: 35m 39s | Max: 35m 39s
  🟩 Clang13            Pass: 100%/1   | Total: 34m 37s | Avg: 34m 37s | Max: 34m 37s
  🟩 Clang14            Pass: 100%/1   | Total: 13m 20s | Avg: 13m 20s | Max: 13m 20s
  🟩 Clang15            Pass: 100%/1   | Total: 16m 40s | Avg: 16m 40s | Max: 16m 40s
  🟩 Clang16            Pass: 100%/1   | Total: 12m 37s | Avg: 12m 37s | Max: 12m 37s
  🟩 Clang17            Pass: 100%/1   | Total: 16m 06s | Avg: 16m 06s | Max: 16m 06s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 27m | Avg: 12m 31s | Max: 15m 20s
  🟩 GCC6               Pass: 100%/2   | Total: 13m 58s | Avg:  6m 59s | Max:  9m 48s
  🟩 GCC7               Pass: 100%/2   | Total: 15m 37s | Avg:  7m 48s | Max: 10m 54s
  🟩 GCC8               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
  🟩 GCC9               Pass: 100%/3   | Total: 50m 21s | Avg: 16m 47s | Max: 25m 54s
  🟩 GCC10              Pass: 100%/1   | Total: 14m 05s | Avg: 14m 05s | Max: 14m 05s
  🟩 GCC11              Pass: 100%/1   | Total: 14m 13s | Avg: 14m 13s | Max: 14m 13s
  🟩 GCC12              Pass: 100%/1   | Total: 16m 07s | Avg: 16m 07s | Max: 16m 07s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 48m | Avg: 13m 31s | Max: 20m 03s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 50m 50s | Avg: 50m 50s | Max: 50m 50s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m | Hits:  27%/1852  
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m | Hits:  27%/1852  
  🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 47m | Avg: 55m 59s | Max:  1h 21m | Hits:  51%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  6h 55m | Avg: 21m 52s | Max: 38m 36s
  🟩 GCC                Pass: 100%/19  | Total:  4h 04m | Avg: 12m 51s | Max: 25m 54s
  🟩 Intel              Pass: 100%/1   | Total: 50m 50s | Avg: 50m 50s | Max: 50m 50s
  🟩 MSVC               Pass: 100%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 21m | Hits:  41%/9260  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total: 19h 08m | Avg: 24m 58s | Max:  1h 21m | Hits:  41%/9260  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total: 17h 45m | Avg: 26m 38s | Max:  1h 21m | Hits:  27%/7408  
  🟩 TestCPU            Pass: 100%/3   | Total: 37m 18s | Avg: 12m 26s | Max: 21m 33s | Hits:  99%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total: 45m 39s | Avg: 15m 13s | Max: 17m 59s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
🟩 std
  🟩 11                 Pass: 100%/5   | Total:  1h 30m | Avg: 18m 00s | Max: 29m 16s
  🟩 14                 Pass: 100%/4   | Total:  2h 04m | Avg: 31m 14s | Max:  1h 06m | Hits:  27%/1852  
  🟩 17                 Pass: 100%/12  | Total:  6h 39m | Avg: 33m 16s | Max:  1h 07m | Hits:  27%/3704  
  🟩 20                 Pass: 100%/23  | Total:  8h 24m | Avg: 21m 55s | Max:  1h 21m | Hits:  63%/3704

🟩 cudax: Pass: 100%/26 | Total: 2h 13m | Avg: 5m 08s | Max: 20m 40s | Hits: 92%/312

🟩 cpu
  🟩 amd64              Pass: 100%/22  | Total:  2h 03m | Avg:  5m 36s | Max: 20m 40s | Hits:  92%/312   
  🟩 arm64              Pass: 100%/4   | Total: 10m 27s | Avg:  2m 36s | Max:  2m 44s
🟩 ctk
  🟩 12.0               Pass: 100%/3   | Total: 16m 28s | Avg:  5m 29s | Max:  9m 46s | Hits:  92%/156   
  🟩 12.5               Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 56s
  🟩 12.6               Pass: 100%/21  | Total:  1h 45m | Avg:  5m 02s | Max: 20m 40s | Hits:  92%/156   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/3   | Total: 16m 28s | Avg:  5m 29s | Max:  9m 46s | Hits:  92%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 56s
  🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 45m | Avg:  5m 02s | Max: 20m 40s | Hits:  92%/156   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/26  | Total:  2h 13m | Avg:  5m 08s | Max: 20m 40s | Hits:  92%/312   
🟩 cxx
  🟩 Clang9             Pass: 100%/1   | Total:  3m 45s | Avg:  3m 45s | Max:  3m 45s
  🟩 Clang10            Pass: 100%/1   | Total:  4m 15s | Avg:  4m 15s | Max:  4m 15s
  🟩 Clang11            Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s
  🟩 Clang12            Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s
  🟩 Clang13            Pass: 100%/1   | Total:  3m 33s | Avg:  3m 33s | Max:  3m 33s
  🟩 Clang14            Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 10s | Avg:  3m 10s | Max:  3m 10s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 26s | Avg:  3m 26s | Max:  3m 26s
  🟩 Clang18            Pass: 100%/4   | Total: 27m 25s | Avg:  6m 51s | Max: 18m 50s
  🟩 GCC9               Pass: 100%/1   | Total:  2m 57s | Avg:  2m 57s | Max:  2m 57s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 09s | Avg:  3m 09s | Max:  3m 09s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 02s | Avg:  3m 02s | Max:  3m 02s
  🟩 GCC12              Pass: 100%/2   | Total: 23m 55s | Avg: 11m 57s | Max: 20m 40s
  🟩 GCC13              Pass: 100%/4   | Total: 10m 51s | Avg:  2m 42s | Max:  2m 57s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 46s | Avg:  9m 46s | Max:  9m 46s | Hits:  92%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 49s | Avg:  8m 49s | Max:  8m 49s | Hits:  92%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 56s
🟩 cxx_family
  🟩 Clang              Pass: 100%/13  | Total: 59m 42s | Avg:  4m 35s | Max: 18m 50s
  🟩 GCC                Pass: 100%/9   | Total: 43m 54s | Avg:  4m 52s | Max: 20m 40s
  🟩 MSVC               Pass: 100%/2   | Total: 18m 35s | Avg:  9m 17s | Max:  9m 46s | Hits:  92%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 56s
🟩 gpu
  🟩 v100               Pass: 100%/26  | Total:  2h 13m | Avg:  5m 08s | Max: 20m 40s | Hits:  92%/312   
🟩 jobs
  🟩 Build              Pass: 100%/24  | Total:  1h 34m | Avg:  3m 55s | Max:  9m 46s | Hits:  92%/312   
  🟩 Test               Pass: 100%/2   | Total: 39m 30s | Avg: 19m 45s | Max: 20m 40s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 43s | Avg:  2m 43s | Max:  2m 43s
  🟩 90a                Pass: 100%/1   | Total:  2m 57s | Avg:  2m 57s | Max:  2m 57s
🟩 std
  🟩 17                 Pass: 100%/6   | Total: 20m 29s | Avg:  3m 24s | Max:  5m 56s
  🟩 20                 Pass: 100%/20  | Total:  1h 53m | Avg:  5m 39s | Max: 20m 40s | Hits:  92%/312

🟩 cccl: Pass: 100%/6 | Total: 28m 32s | Avg: 4m 45s | Max: 5m 24s

🟩 cpu
  🟩 amd64              Pass: 100%/6   | Total: 28m 32s | Avg:  4m 45s | Max:  5m 24s
🟩 ctk
  🟩 11.1               Pass: 100%/2   | Total:  7m 56s | Avg:  3m 58s | Max:  4m 13s
  🟩 12.0               Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  5m 16s
  🟩 12.6               Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 24s
🟩 cudacxx
  🟩 nvcc11.1           Pass: 100%/2   | Total:  7m 56s | Avg:  3m 58s | Max:  4m 13s
  🟩 nvcc12.0           Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  5m 16s
  🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 24s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/6   | Total: 28m 32s | Avg:  4m 45s | Max:  5m 24s
🟩 cxx
  🟩 Clang9             Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
  🟩 Clang18            Pass: 100%/1   | Total:  5m 24s | Avg:  5m 24s | Max:  5m 24s
  🟩 GCC6               Pass: 100%/1   | Total:  3m 43s | Avg:  3m 43s | Max:  3m 43s
  🟩 GCC12              Pass: 100%/1   | Total:  4m 34s | Avg:  4m 34s | Max:  4m 34s
  🟩 GCC13              Pass: 100%/1   | Total:  5m 22s | Avg:  5m 22s | Max:  5m 22s
🟩 cxx_family
  🟩 Clang              Pass: 100%/3   | Total: 14m 53s | Avg:  4m 57s | Max:  5m 24s
  🟩 GCC                Pass: 100%/3   | Total: 13m 39s | Avg:  4m 33s | Max:  5m 22s
🟩 gpu
  🟩 v100               Pass: 100%/6   | Total: 28m 32s | Avg:  4m 45s | Max:  5m 24s
🟩 jobs
  🟩 Infra              Pass: 100%/6   | Total: 28m 32s | Avg:  4m 45s | Max:  5m 24s

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 35s | Avg: 5m 17s | Max: 8m 30s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 30s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 30s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 30s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 30s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 30s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 30s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 30s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 05s | Avg:  2m 05s | Max:  2m 05s
  🟩 Test               Pass: 100%/1   | Total:  8m 30s | Avg:  8m 30s | Max:  8m 30s

🟩 python: Pass: 100%/1 | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s

👃 Inspect Changes

Modifications in project?

	Project
+/-	CCCL Infrastructure
+/-	libcu++
+/-	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
+/-	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 176)

#	Runner
125	`linux-amd64-cpu16`
25	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`
1	`linux-amd64-gpu-h100-latest-1-testing`

@shwina

Recently, I added support for `codespell` in CCCL (NVIDIA/cccl#3168). @shwina noticed some issues in my PR that were fixed in NVIDIA/cccl#3182. This PR ports similar fixes to RMM, to make `codespell` work better when run both inside and outside of `pre-commit`.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Mike Sarahan (https://github.com/msarahan)

URL: #1769

implement `add_sat`
split `signed`/`unsigned` implementation, improve implementation for MSVC
improve device `add_sat` implementation
add `add_sat` test
improve generic `add_sat` implementation for signed types
implement `sub_sat`
allow more msvc intrinsics on x86
add op tests
partially implement `mul_sat`
implement `div_sat` and `saturate_cast`
add `saturate_cast` test
simplify `div_sat` test

Deprecate C++11 and C++14 for libcu++ (#3173)
* Deprecate C++11 and C++14 for libcu++
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

Implement `abs` and `div` from `cstdlib` (#3153)
* implement integer abs functions
* improve tests, fix constexpr support
* just use our implementation
* implement `cuda::std::div`
* prefer host's `div_t` like types
* provide `cuda::std::abs` overloads for floats
* allow fp abs for NVRTC
* silence msvc's warning about conversion from floating point to integral

Fix missing radix sort policies (#3174)
Fixes NVBug 5009941

Introduces new `DeviceReduce::Arg{Min,Max}` interface with two output iterators (#3148)
* introduces new arg{min,max} interface with two output iterators
* adds fp inf tests
* fixes docs
* improves code example
* fixes exec space specifier
* trying to fix deprecation warning for more compilers
* inlines unzip operator
* trying to fix deprecation warning for nvhpc
* integrates suppression fixes in diagnostics
* pre-ctk 11.5 deprecation suppression
* fixes icc
* fix for pre-ctk11.5
* cleans up deprecation suppression
* cleanup

Extend tuning documentation (#3179)

Add codespell pre-commit hook, fix typos in CCCL (#3168)
* Add codespell pre-commit hook
* Automatic changes from codespell.
* Manual changes.

Fix parameter space for TUNE_LOAD in scan benchmark (#3176)

fix various old compiler checks (#3178)

implement C++26 `std::projected` (#3175)

Fix pre-commit config for codespell and remaining typos (#3182)

Massive cleanup of our config (#3155)

Fix UB in atomics with automatic storage (#2586)
* Adds specialized local cuda atomics and injects them into most atomics paths.
Co-authored-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>
Co-authored-by: gonzalobg <65027571+gonzalobg@users.noreply.github.com>
* Allow CUDA 12.2 to keep perf, this addresses earlier comments in #478
* Remove extraneous double brackets in unformatted code.
* Merge unsafe atomic logic into `__cuda_is_local`.
* Use `const_cast` for type conversions in cuda_local.h
* Fix build issues from interface changes
* Fix missing __nanosleep on sm70-
* Guard __isLocal from NVHPC
* Use PTX instead of running nothing from NVHPC
* fixup /s/nvrtc/nvhpc
* Fixup missing CUDA ifdef surrounding device code
* Fix codegen
* Bypass some sort of compiler bug on GCC7
* Apply suggestions from code review
* Use unsafe automatic storage atomics in codegen tests

---------

Co-authored-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>
Co-authored-by: gonzalobg <65027571+gonzalobg@users.noreply.github.com>
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>

Refactor the source code layout for `cuda.parallel` (#3177)
* Refactor the source layout for cuda.parallel
* Add copyright
* Address review feedback
* Don't import anything into `experimental` namespace
* fix import

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>

new type-erased memory resources (#2824)

s/_LIBCUDACXX_DECLSPEC_EMPTY_BASES/_CCCL_DECLSPEC_EMPTY_BASES/g (#3186)

Document address stability of `thrust::transform` (#3181)
* Do not document _LIBCUDACXX_MARK_CAN_COPY_ARGUMENTS
* Reformat and fix UnaryFunction/BinaryFunction in transform docs
* Mention transform can use proclaim_copyable_arguments
* Document cuda::proclaims_copyable_arguments better
* Deprecate depending on transform functor argument addresses
Fixes: #3053

turn off cuda version check for clangd (#3194)

[STF] jacobi example based on parallel_for (#3187)
* Simple jacobi example with parallel for and reductions
* clang-format
* remove useless capture list

fixes pre-nv_diag suppression issues (#3189)

Prefer c2h::type_name over c2h::demangle (#3195)

Fix memcpy_async* tests (#3197)
* memcpy_async_tx: Fix bug in test
Two bugs, one of which occurs in practice:
1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice)
2. The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. (Has not yet manifested itself)
* cp_async_bulk_tensor: Pre-emptively fence more in test

Add type annotations and mypy checks for `cuda.parallel` (#3180)
* Refactor the source layout for cuda.parallel
* Add initial type annotations
* Update pre-commit config
* More typing
* Fix bad merge
* Fix TYPE_CHECKING and numpy annotations
* typing bindings.py correctly
* Address review feedback

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>

Fix rendering of cuda.parallel docs (#3192)
* Fix pre-commit config for codespell and remaining typos
* Fix rendering of docs for cuda.parallel

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>

Enable PDL for DeviceMergeSortBlockSortKernel (#3199)
The kernel already contains a call to _CCCL_PDL_GRID_DEPENDENCY_SYNC. This commit enables PDL when launching the kernel.

Adds support for large `num_items` to `DeviceReduce::{ArgMin,ArgMax}` (#2647) * adds benchmarks for reduce::arg{min,max} * preliminary streaming arg-extremum reduction * fixes implicit conversion * uses streaming dispatch class * changes arg benches to use new streaming reduce * streaming arg-extrema reduction * fixes style * fixes compilation failures * cleanups * adds rst style comments * declare vars const and use clamp * consolidates argmin argmax benchmarks * fixes thrust usage * drops offset type in arg-extrema benchmarks * fixes clang cuda * exec space macros * switch to signed global offset type for slightly better perf * clarifies documentation * applies minor benchmark style changes from review comments * fixes interface documentation and comments * list-init accumulating output op * improves style, comments, and tests * cleans up aggregate init * renames dispatch class usage in benchmarks * fixes merge conflicts * addresses review comments * addresses review comments * fixes assertion * removes superseded implementation * changes large problem tests to use new interface * removes obsolete tests for deprecated interface Fixes for Python 3.7 docs environment (#3206) Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Adds support for large number of items to `DeviceTransform` (#3172) * moves large problem test helper to common file * adds support for large num items to device transform * adds tests for large number of items to device interface * fixes format * addresses review comments cp_async_bulk: Fix test (#3198) * memcpy_async_tx: Fix bug in test Two bugs, one of which occurs in practice: 1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice) 2. The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. 
(Has not yet manifested itself) * cp_async_bulk_tensor: Pre-emptively fence more in test * cp_async_bulk: Fix test The global memory pointer could be misaligned. cudax fixes for msvc 14.41 (#3200) avoid instantiating class templates in `is_same` implementation when possible (#3203) Fix: make launchers a CUB detail; make kernel source functions hidden. (#3209) * Fix: make launchers a CUB detail; make kernel source functions hidden. * [pre-commit.ci] auto code formatting * Address review comments, fix which macro gets fixed. help the ranges concepts recognize standard contiguous iterators in c++14/17 (#3202) unify macros and cmake options that control the suppression of deprecation warnings (#3220) * unify macros and cmake options that control the suppression of deprecation warnings * suppress nvcc warning #186 in thrust header tests * suppress c++ dialect deprecation warnings in libcudacxx header tests Fx thread-reduce performance regression (#3225) cuda.parallel: In-memory caching of build objects (#3216) * Define __eq__ and __hash__ for Iterators * Define cache_with_key utility and use it to cache Reduce objects * Add tests for caching Reduce objects * Tighten up types * Updates to support 3.7 * Address review feedback * Introduce IteratorKind to hold iterator type information * Use the .kind to generate an abi_name * Remove __eq__ and __hash__ methods from IteratorBase * Move helper function * Formatting * Don't unpack tuple in cache key --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Just enough ranges for c++14 `span` (#3211) use generalized concepts portability macros to simplify the `range` concept (#3217) fixes some issues in the concepts portability macros and then re-implements the `range` concept with `_CCCL_REQUIRES_EXPR` Use Ruff to sort imports (#3230) * Update pyproject.tomls for import sorting * Update files after running pre-commit * Move ruff config to pyproject.toml --------- Co-authored-by: Ashwin Srinath 
<shwina@users.noreply.github.com> fix tuning_scan sm90 config issue (#3236) Co-authored-by: Shijie Chen <shijiec@nvidia.com> [STF] Logical token (#3196) * Split the implementation of the void interface into the definition of the interface, and its implementations on streams and graphs. * Add missing files * Check if a task implementation can match a prototype where the void_interface arguments are ignored * Implement ctx.abstract_logical_data() which relies on a void data interface * Illustrate how to use abstract handles in local contexts * Introduce an is_void_interface() virtual method in the data interface to potentially optimize some stages * Small improvements in the examples * Do not try to allocate or move void data * Do not use I as a variable * fix linkage error * rename abtract_logical_data into logical_token * Document logical token * fix spelling error * fix sphinx error * reflect name changes * use meaningful variable names * simplify logical_token implementation because writeback is already disabled * add a unit test for token elision * implement token elision in host_launch * Remove unused type * Implement helpers to check if a function can be invoked from a tuple, or from a tuple where we removed tokens * Much simpler is_tuple_invocable_with_filtered implementation * Fix buggy test * Factorize code * Document that we can ignore tokens for task and host_launch * Documentation for logical data freeze Fix ReduceByKey tuning (#3240) Fix RLE tuning (#3239) cuda.parallel: Forbid non-contiguous arrays as inputs (or outputs) (#3233) * Forbid non-contiguous arrays as inputs (or outputs) * Implement a more robust way to check for contiguity * Don't bother if cublas unavailable * Fix how we check for zero-element arrays * sort imports --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> expands support for more offset types in segmented benchmark (#3231) Add escape hatches to the cmake configuration of the header tests so that we can 
tests deprecated compilers / dialects (#3253) * Add escape hatches to the cmake configuration of the header tests so that we can tests deprecated compilers / dialects * Do not add option twice ptx: Add add_instruction.py (#3190) This file helps create the necessary structure for new PTX instructions. Co-authored-by: Allard Hendriksen <ahendriksen@nvidia.com> Bump main to 2.9.0. (#3247) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Drop cub::Mutex (#3251) Fixes: #3250 Remove legacy macros from CUB util_arch.cuh (#3257) Fixes: #3256 Remove thrust::[unary|binary]_traits (#3260) Fixes: #3259 Architecture and OS identification macros (#3237) Bump main to 3.0.0. (#3265) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Drop thrust not1 and not2 (#3264) Fixes: #3263 CCCL Internal macro documentation (#3238) Deprecate GridBarrier and GridBarrierLifetime (#3258) Fixes: #1389 Require at least gcc7 (#3268) Fixes: #3267 Drop thrust::[unary|binary]_function (#3274) Fixes: #3273 Drop ICC from CI (#3277) [STF] Corruption of the capture list of an extended lambda with a parallel_for construct on a host execution place (#3270) * Add a test to reproduce a bug observed with parallel_for on a host place * clang-format * use _CCCL_ASSERT * Attempt to debug * do not create a tuple with a universal reference that is out of scope when we use it, use an lvalue instead * fix lambda expression * clang-format Enable thrust::identity test for non-MSVC (#3281) This seems to be an oversight when the test was added Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Enable PDL in triple chevron launch (#3282) It seems PDL was disabled by accident when _THRUST_HAS_PDL was renamed to _CCCL_HAS_PDL during the review introducing the feature. 
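The PDL entry above describes a feature that was switched off by a macro rename. A minimal sketch of how this class of accident can go unnoticed is shown here. The macro names `OLD_HAS_FEATURE` and `NEW_HAS_FEATURE` are hypothetical stand-ins, not the CCCL macros, and the code is not taken from the PR.

```cpp
#include <cassert>

// Hypothetical names for illustration only; these are not CCCL's macros.
// After the rename, only the new spelling is ever defined.
#define NEW_HAS_FEATURE 1

// A use site that the rename missed still tests the old spelling.
// `defined(...)` of an unknown macro is simply false, so the compiler
// emits no diagnostic and the fallback branch is taken silently.
#if defined(OLD_HAS_FEATURE)
constexpr bool stale_check_enabled = true;
#else
constexpr bool stale_check_enabled = false;
#endif

// A use site that was updated to the new spelling behaves as intended.
#if defined(NEW_HAS_FEATURE)
constexpr bool updated_check_enabled = true;
#else
constexpr bool updated_check_enabled = false;
#endif

// True when the two use sites disagree, i.e. the rename missed one of them.
constexpr bool rename_missed_a_use_site()
{
  return stale_check_enabled != updated_check_enabled;
}

// Runtime check of the state this sketch is meant to demonstrate.
inline void check_sketch()
{
  assert(rename_missed_a_use_site());
}
```

Because both branches compile cleanly, only a behavioral or performance test would reveal that the feature path is no longer taken.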
Disambiguate line continuations and macro continuations in <nv/target> (#3244) Drop VS 2017 from CI (#3287) Fixes: #3286 Drop ICC support in code (#3279) * Drop ICC from code Fixes: #3278 Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Make CUB NVRTC commandline arguments come from a cmake template (#3292) Propose the same components (thrust, cub, libc++, cudax, cuda.parallel,...) in the bug report template as in the feature request template (#3295) Use process isolation instead of default hyper-v for Windows. (#3294) Try improving build times by using process isolation instead of hyper-v Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> [pre-commit.ci] pre-commit autoupdate (#3248) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.6](https://github.com/pre-commit/mirrors-clang-format/compare/v18.1.8...v19.1.6) - [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.8.3...v0.8.6) - [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.13.0...v1.14.1) Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Drop Thrust legacy arch macros (#3298) Which were disabled and could be re-enabled using THRUST_PROVIDE_LEGACY_ARCH_MACROS Drop Thrust's compiler_fence.h (#3300) Drop CTK 11.x from CI (#3275) * Add cuda12.0-gcc7 devcontainer * Move MSVC2017 jobs to CTK 12.6 This is the only combination where rapidsai has devcontainers * Add /Zc:__cplusplus for the libcudacxx tests * Only add escape hatch for affected CTKs * Workaround missing cudaLaunchKernelEx on MSVC cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro, which is reported wrongly for MSVC. CTK 12.3 fixed this by additionally detecting _MSC_VER.
As a workaround, we provide our own copy of cudaLaunchKernelEx when it is not available from the CTK. * Workaround nvcc+MSVC issue * Regenerate devcontainers Fixes: #3249 Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Drop CUB's util_compiler.cuh (#3302) All contained macros were deprecated Update packman and repo_docs versions (#3293) Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Drop Thrust's deprecated compiler macros (#3301) Drop CUB_RUNTIME_ENABLED and __THRUST_HAS_CUDART__ (#3305) Adds support for large number of items to `DevicePartition::If` with the `ThreeWayPartition` overload (#2506) * adds support for large number of items to three-way partition * adapts interface to use choose_signed_offset_t * integrates applicable feedback from device-select pr * changes behavior for empty problems * unifies grid constant macro * fixes kernel template specialization mismatch * integrates _CCCL_GRID_CONSTANT changes * resolve merge conflicts * fixes checks in test * fixes test verification * improves tests * makes few improvements to streaming dispatch * improves code comment on test * fixes unrelated compiler error * minor style improvements Refactor scan tunings (#3262) Require C++17 for compiling Thrust and CUB (#3255) * Issue an unsuppressable warning when compiling with < C++17 * Remove C++11/14 presets * Remove CCCL_IGNORE_DEPRECATED_CPP_DIALECT from headers * Remove [CUB|THRUST|TCT]_IGNORE_DEPRECATED_CPP_[11|14] * Remove CUB_ENABLE_DIALECT_CPP[11|14] * Update CI runs * Remove C++11/14 CI runs for CUB and Thrust * Raise compiler minimum versions for C++17 * Update ReadMe * Drop Thrust's cpp14_required.h * Add escape hatch for C++17 removal Fixes: #3252 Implement `views::empty` (#3254) * Disable pair conversion of subrange with clang in C++17 * Fix namespace views * Implement `views::empty` This implements `std::ranges::views::empty`, see https://en.cppreference.com/w/cpp/ranges/empty_view Refactor `limits` and `climits` 
(#3221) * implement builtins for huge val, nan and nans * change `INFINITY` and `NAN` implementation for NVRTC cuda.parallel: Add documentation for the current iterators along with examples and tests (#3311) * Add tests demonstrating usage of different iterators * Update documentation of reduce_into by merging import code snippet with the rest of the example * Add documentation for current iterators * Run pre-commit checks and update accordingly * Fix comments to refer to the proper lines in the code snippets in the docs Drop clang<14 from CI, update devcontainers. (#3309) Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> [STF] Cleanup task dependencies object constructors (#3291) * Define tag types for access modes * - Rework how we build task_dep objects based on access mode tags - pack_state is now responsible for using a const_cast for read only data * Greatly simplify the previous attempt : do not define new types, but use integral constants based on the enums * It seems the const_cast was not necessarily so we can simplify it and not even do some dispatch based on access modes Disable test with a gcc-14 regression (#3297) Deprecate Thrust's cpp_compatibility.h macros (#3299) Remove dropped function objects from docs (#3319) Document `NV_TARGET` macros (#3313) [STF] Define ctx.pick_stream() which was missing for the unified context (#3326) * Define ctx.pick_stream() which was missing for the unified context * clang-format Deprecate cub::IterateThreadStore (#3337) Drop CUB's BinaryFlip operator (#3332) Deprecate cub::Swap (#3333) Clarify transform output can overlap input (#3323) Drop CUB APIs with a debug_synchronous parameter (#3330) Fixes: #3329 Drop CUB's util_compiler.cuh for real (#3340) PR #3302 planned to drop the file, but only dropped its content. This was an oversight. So let's drop the entire file. 
Drop cub::ValueCache (#3346) limits offset types for merge sort (#3328) Drop CDPv1 (#3344) Fixes: #3341 Drop thrust::void_t (#3362) Use cuda::std::addressof in Thrust (#3363) Fix all_of documentation for empty ranges (#3358) all_of always returns true on an empty range. [STF] Do not keep track of dangling events in a CUDA graph backend (#3327) * Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when the CUDA graph completes. Therefore keeping track of "dangling events" is a waste of time and resources. * replace can_ignore_dangling_events by track_dangling_events which leads to more readable code * When not storing the dangling events, we must still perform the deinit operations that were producing these events ! Extract scan kernels into NVRTC-compilable header (#3334) * Extract scan kernels into NVRTC-compilable header * Update cub/cub/device/dispatch/dispatch_scan.cuh Co-authored-by: Georgii Evtushenko <evtushenko.georgy@gmail.com> --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Co-authored-by: Georgii Evtushenko <evtushenko.georgy@gmail.com> Drop deprecated aliases in Thrust functional (#3272) Fixes: #3271 Drop cub::DivideAndRoundUp (#3347) Use cuda::std::min/max in Thrust (#3364) Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (#3361) * implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` Cleanup util_arch (#2773) Deprecate thrust::null_type (#3367) Deprecate cub::DeviceSpmv (#3320) Fixes: #896 Improves `DeviceSegmentedSort` test run time for large number of items and segments (#3246) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * fixes spelling * adds tests for large number of segments * fixes narrowing conversion in tests * addresses review comments * fixes includes Compile basic infra test with C++17 (#3377) Adds support for large number of items and large number of segments to 
`DeviceSegmentedSort` (#3308) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * addresses review comments * introduces segment offset type * adds tests for large number of segments * adds support for large number of segments * drops segment offset type * fixes thrust namespace * removes about-to-be-deprecated cub iterators * no exec specifier on defaulted ctor * fixes gcc7 linker error * uses local_segment_index_t throughout * determine offset type based on type returned by segment iterator begin/end iterators * minor style improvements Exit with error when RAPIDS CI fails. (#3385) cuda.parallel: Support structured types as algorithm inputs (#3218) * Introduce gpu_struct decorator and typing * Enable `reduce` to accept arrays of structs as inputs * Add test for reducing arrays-of-struct * Update documentation * Use a numpy array rather than ctypes object * Change zeros -> empty for output array and temp storage * Add a TODO for typing GpuStruct * Documentation updates * Remove test_reduce_struct_type from test_reduce.py * Revert to `to_cccl_value()` accepting ndarray + GpuStruct * Bump copyrights --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Deprecate thrust::async (#3324) Fixes: #100 Review/Deprecate CUB `util.ptx` for CCCL 2.x (#3342) Fix broken `_CCCL_BUILTIN_ASSUME` macro (#3314) * add compiler-specific path * fix device code path * add _CCC_ASSUME Deprecate thrust::numeric_limits (#3366) Replace `typedef` with `using` in libcu++ (#3368) Deprecate thrust::optional (#3307) Fixes: #3306 Upgrade to Catch2 3.8 (#3310) Fixes: #1724 refactor `<cuda/std/cstdint>` (#3325) Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> Update CODEOWNERS (#3331) * Update CODEOWNERS * Update CODEOWNERS * Update CODEOWNERS * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix
sign-compare warning (#3408) Implement more cmath functions to be usable on host and device (#3382) * Implement more cmath functions to be usable on host and device * Implement math roots functions * Implement exponential functions Redefine and deprecate thrust::remove_cvref (#3394) * Redefine and deprecate thrust::remove_cvref Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Fix assert definition for NVHPC due to constexpr issues (#3418) NVHPC cannot decide at compile time where the code would run so _CCCL_ASSERT within a constexpr function breaks it. Fix this by always using the host definition which should also work on device. Fixes #3411 Extend CUB reduce benchmarks (#3401) * Rename max.cu to custom.cu, since it uses a custom operator * Extend types covered by min.cu to all fundamental types * Add some notes on how to collect tuning parameters Fixes: #3283 Update upload-pages-artifact to v3 (#3423) * Update upload-pages-artifact to v3 * Empty commit --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Replace and deprecate thrust::cuda_cub::terminate (#3421) `std::linalg` accessors and `transposed_layout` (#2962) Add round up/down to multiple (#3234) [FEA]: Introduce Python module with CCCL headers (#3201) * Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative * Run `copy_cccl_headers_to_aude_include()` before `setup()` * Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path. * Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel * Bug fix: cuda/_include only exists after shutil.copytree() ran.
* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py * Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions) * Replace := operator (needs Python 3.8+) * Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md * Restore original README.md: `pip3 install -e` now works on first pass. * cuda_cccl/README.md: FOR INTERNAL USE ONLY * Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894035917) Command used: ci/update_version.sh 2 8 0 * Modernize pyproject.toml, setup.py Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894043178 * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894044996 * Install CCCL headers under cuda.cccl.include Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894048562 Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely. * Factor out cuda_cccl/cuda/cccl/include_paths.py * Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative * Add missing Copyright notice. * Add missing __init__.py (cuda.cccl) * Add `"cuda.cccl"` to `autodoc.mock_imports` * Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.) * Add # TODO: move this to a module-level import * Modernize cuda_cooperative/pyproject.toml, setup.py * Convert cuda_cooperative to use hatchling as build backend. * Revert "Convert cuda_cooperative to use hatchling as build backend." This reverts commit 61637d608da06fcf6851ef6197f88b5e7dbc3bbe. * Move numpy from [build-system] requires -> [project] dependencies * Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH * Remove copy_license() and use license_files=["../../LICENSE"] instead. 
* Further modernize cuda_cccl/setup.py to use pathlib * Trivial simplifications in cuda_cccl/pyproject.toml * Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code * Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml * Add taplo-pre-commit to .pre-commit-config.yaml * taplo-pre-commit auto-fixes * Use pathlib in cuda_cooperative/setup.py * CCCL_PYTHON_PATH in cuda_cooperative/setup.py * Modernize cuda_parallel/pyproject.toml, setup.py * Use pathlib in cuda_parallel/setup.py * Add `# TOML lint & format` comment. * Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml * Use pathlib in cuda/cccl/include_paths.py * pre-commit autoupdate (EXCEPT clang-format, which was manually restored) * Fixes after git merge main * Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result' ``` =========================================================================== warnings summary =========================================================================== tests/test_reduce.py::test_reduce_non_contiguous /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080> Traceback (most recent call last): File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__ bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result)) ^^^^^^^^^^^^^^^^^ AttributeError: '_Reduce' object has no attribute 'build_result' warnings.warn(pytest.PytestUnraisableExceptionWarning(msg)) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ============================================================== ``` * Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class 
CustomBuildPy` * Introduce cuda_cooperative/constraints.txt * Also add cuda_parallel/constraints.txt * Add `--constraint constraints.txt` in ci/test_python.sh * Update Copyright dates * Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024) For completeness: The other repo took a long time to install into the pre-commit cache; so long it lead to timeouts in the CCCL CI. * Remove unused cuda_parallel jinja2 dependency (noticed by chance). * Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead. * Make cuda_cooperative, cuda_parallel testing completely independent. * Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]" This reverts commit ea33a218ed77a075156cd1b332047202adb25aa2. Error message: https://github.com/NVIDIA/cccl/pull/3201#issuecomment-2594012971 * Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Restore original ci/matrix.yaml [skip-rapids] * Use for loop in test_python.sh to avoid code duplication. 
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci] * Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]" This reverts commit ec206fd8b50a6a293e00a5825b579e125010b13d. * Implement suggestion by @shwina (https://github.com/NVIDIA/cccl/pull/3201#pullrequestreview-2556918460) * Address feedback by @leofang --------- Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> cuda.parallel: Add optional stream argument to reduce_into() (#3348) * Add optional stream argument to reduce_into() * Add tests to check for reduce_into() stream behavior * Move protocol related utils to separate file and rework __cuda_stream__ error messages * Fix synchronization issue in stream test and add one more invalid stream test case * Rename cuda stream validation function after removing leading underscore * Unpack values from __cuda_stream__ instead of indexing * Fix linting errors * Handle TypeError when unpacking invalid __cuda_stream__ return * Use stream to allocate cupy memory in new stream test Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (#3434) Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (#3419) * Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ Fixes #3404 move to c++17, finalize device optimization fix msvc compilation, update tests Deprectate C++11 and C++14 for libcu++ (#3173) * Deprectate C++11 and C++14 for libcu++ Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> Implement `abs` and `div` from `cstdlib` (#3153) * implement integer abs functions * improve tests, fix constexpr support * just use the our implementation * implement `cuda::std::div` * prefer host's `div_t` like types * provide `cuda::std::abs` overloads for floats * allow fp abs for NVRTC * silence msvc's warning about 
conversion from floating point to integral Fix missing radix sort policies (#3174) Fixes NVBug 5009941 Introduces new `DeviceReduce::Arg{Min,Max}` interface with two output iterators (#3148) * introduces new arg{min,max} interface with two output iterators * adds fp inf tests * fixes docs * improves code example * fixes exec space specifier * trying to fix deprecation warning for more compilers * inlines unzip operator * trying to fix deprecation warning for nvhpc * integrates suppression fixes in diagnostics * pre-ctk 11.5 deprecation suppression * fixes icc * fix for pre-ctk11.5 * cleans up deprecation suppression * cleanup Extend tuning documentation (#3179) Add codespell pre-commit hook, fix typos in CCCL (#3168) * Add codespell pre-commit hook * Automatic changes from codespell. * Manual changes. Fix parameter space for TUNE_LOAD in scan benchmark (#3176) fix various old compiler checks (#3178) implement C++26 `std::projected` (#3175) Fix pre-commit config for codespell and remaining typos (#3182) Massive cleanup of our config (#3155) Fix UB in atomics with automatic storage (#2586) * Adds specialized local cuda atomics and injects them into most atomics paths. Co-authored-by: Georgy Evtushenko <evtushenko.georgy@gmail.com> Co-authored-by: gonzalobg <65027571+gonzalobg@users.noreply.github.com> * Allow CUDA 12.2 to keep perf, this addresses earlier comments in #478 * Remove extraneous double brackets in unformatted code. * Merge unsafe atomic logic into `__cuda_is_local`.
* Use `const_cast` for type conversions in cuda_local.h * Fix build issues from interface changes * Fix missing __nanosleep on sm70- * Guard __isLocal from NVHPC * Use PTX instead of running nothing from NVHPC * fixup /s/nvrtc/nvhpc * Fixup missing CUDA ifdef surrounding device code * Fix codegen * Bypass some sort of compiler bug on GCC7 * Apply suggestions from code review * Use unsafe automatic storage atomics in codegen tests --------- Co-authored-by: Georgy Evtushenko <evtushenko.georgy@gmail.com> Co-authored-by: gonzalobg <65027571+gonzalobg@users.noreply.github.com> Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Refactor the source code layout for `cuda.parallel` (#3177) * Refactor the source layout for cuda.parallel * Add copyright * Address review feedback * Don't import anything into `experimental` namespace * fix import --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> new type-erased memory resources (#2824) s/_LIBCUDACXX_DECLSPEC_EMPTY_BASES/_CCCL_DECLSPEC_EMPTY_BASES/g (#3186) Document address stability of `thrust::transform` (#3181) * Do not document _LIBCUDACXX_MARK_CAN_COPY_ARGUMENTS * Reformat and fix UnaryFunction/BinaryFunction in transform docs * Mention transform can use proclaim_copyable_arguments * Document cuda::proclaims_copyable_arguments better * Deprecate depending on transform functor argument addresses Fixes: #3053 turn off cuda version check for clangd (#3194) [STF] jacobi example based on parallel_for (#3187) * Simple jacobi example with parallel for and reductions * clang-format * remove useless capture list fixes pre-nv_diag suppression issues (#3189) Prefer c2h::type_name over c2h::demangle (#3195) Fix memcpy_async* tests (#3197) * memcpy_async_tx: Fix bug in test Two bugs, one of which occurs in practice: 1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice) 2. 
The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. (Has not yet manifested itself) * cp_async_bulk_tensor: Pre-emptively fence more in test Add type annotations and mypy checks for `cuda.parallel` (#3180) * Refactor the source layout for cuda.parallel * Add initial type annotations * Update pre-commit config * More typing * Fix bad merge * Fix TYPE_CHECKING and numpy annotations * typing bindings.py correctly * Address review feedback --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Fix rendering of cuda.parallel docs (#3192) * Fix pre-commit config for codespell and remaining typos * Fix rendering of docs for cuda.parallel --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Enable PDL for DeviceMergeSortBlockSortKernel (#3199) The kernel already contains a call to _CCCL_PDL_GRID_DEPENDENCY_SYNC. This commit enables PDL when launching the kernel. 
Adds support for large `num_items` to `DeviceReduce::{ArgMin,ArgMax}` (#2647) * adds benchmarks for reduce::arg{min,max} * preliminary streaming arg-extremum reduction * fixes implicit conversion * uses streaming dispatch class * changes arg benches to use new streaming reduce * streaming arg-extrema reduction * fixes style * fixes compilation failures * cleanups * adds rst style comments * declare vars const and use clamp * consolidates argmin argmax benchmarks * fixes thrust usage * drops offset type in arg-extrema benchmarks * fixes clang cuda * exec space macros * switch to signed global offset type for slightly better perf * clarifies documentation * applies minor benchmark style changes from review comments * fixes interface documentation and comments * list-init accumulating output op * improves style, comments, and tests * cleans up aggregate init * renames dispatch class usage in benchmarks * fixes merge conflicts * addresses review comments * addresses review comments * fixes assertion * removes superseded implementation * changes large problem tests to use new interface * removes obsolete tests for deprecated interface Fixes for Python 3.7 docs environment (#3206) Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Adds support for large number of items to `DeviceTransform` (#3172) * moves large problem test helper to common file * adds support for large num items to device transform * adds tests for large number of items to device interface * fixes format * addresses review comments cp_async_bulk: Fix test (#3198) * memcpy_async_tx: Fix bug in test Two bugs, one of which occurs in practice: 1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice) 2. The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. 
(Has not yet manifested itself) * cp_async_bulk_tensor: Pre-emptively fence more in test * cp_async_bulk: Fix test The global memory pointer could be misaligned. cudax fixes for msvc 14.41 (#3200) avoid instantiating class templates in `is_same` implementation when possible (#3203) Fix: make launchers a CUB detail; make kernel source functions hidden. (#3209) * Fix: make launchers a CUB detail; make kernel source functions hidden. * [pre-commit.ci] auto code formatting * Address review comments, fix which macro gets fixed. help the ranges concepts recognize standard contiguous iterators in c++14/17 (#3202) unify macros and cmake options that control the suppression of deprecation warnings (#3220) * unify macros and cmake options that control the suppression of deprecation warnings * suppress nvcc warning #186 in thrust header tests * suppress c++ dialect deprecation warnings in libcudacxx header tests Fx thread-reduce performance regression (#3225) cuda.parallel: In-memory caching of build objects (#3216) * Define __eq__ and __hash__ for Iterators * Define cache_with_key utility and use it to cache Reduce objects * Add tests for caching Reduce objects * Tighten up types * Updates to support 3.7 * Address review feedback * Introduce IteratorKind to hold iterator type information * Use the .kind to generate an abi_name * Remove __eq__ and __hash__ methods from IteratorBase * Move helper function * Formatting * Don't unpack tuple in cache key --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Just enough ranges for c++14 `span` (#3211) use generalized concepts portability macros to simplify the `range` concept (#3217) fixes some issues in the concepts portability macros and then re-implements the `range` concept with `_CCCL_REQUIRES_EXPR` Use Ruff to sort imports (#3230) * Update pyproject.tomls for import sorting * Update files after running pre-commit * Move ruff config to pyproject.toml --------- Co-authored-by: Ashwin Srinath 
<shwina@users.noreply.github.com> fix tuning_scan sm90 config issue (#3236) Co-authored-by: Shijie Chen <shijiec@nvidia.com> [STF] Logical token (#3196) * Split the implementation of the void interface into the definition of the interface, and its implementations on streams and graphs. * Add missing files * Check if a task implementation can match a prototype where the void_interface arguments are ignored * Implement ctx.abstract_logical_data() which relies on a void data interface * Illustrate how to use abstract handles in local contexts * Introduce an is_void_interface() virtual method in the data interface to potentially optimize some stages * Small improvements in the examples * Do not try to allocate or move void data * Do not use I as a variable * fix linkage error * rename abtract_logical_data into logical_token * Document logical token * fix spelling error * fix sphinx error * reflect name changes * use meaningful variable names * simplify logical_token implementation because writeback is already disabled * add a unit test for token elision * implement token elision in host_launch * Remove unused type * Implement helpers to check if a function can be invoked from a tuple, or from a tuple where we removed tokens * Much simpler is_tuple_invocable_with_filtered implementation * Fix buggy test * Factorize code * Document that we can ignore tokens for task and host_launch * Documentation for logical data freeze Fix ReduceByKey tuning (#3240) Fix RLE tuning (#3239) cuda.parallel: Forbid non-contiguous arrays as inputs (or outputs) (#3233) * Forbid non-contiguous arrays as inputs (or outputs) * Implement a more robust way to check for contiguity * Don't bother if cublas unavailable * Fix how we check for zero-element arrays * sort imports --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> expands support for more offset types in segmented benchmark (#3231) Add escape hatches to the cmake configuration of the header tests so that we can 
tests deprecated compilers / dialects (#3253) * Add escape hatches to the cmake configuration of the header tests so that we can tests deprecated compilers / dialects * Do not add option twice ptx: Add add_instruction.py (#3190) This file helps create the necessary structure for new PTX instructions. Co-authored-by: Allard Hendriksen <ahendriksen@nvidia.com> Bump main to 2.9.0. (#3247) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Drop cub::Mutex (#3251) Fixes: #3250 Remove legacy macros from CUB util_arch.cuh (#3257) Fixes: #3256 Remove thrust::[unary|binary]_traits (#3260) Fixes: #3259 Architecture and OS identification macros (#3237) Bump main to 3.0.0. (#3265) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Drop thrust not1 and not2 (#3264) Fixes: #3263 CCCL Internal macro documentation (#3238) Deprecate GridBarrier and GridBarrierLifetime (#3258) Fixes: #1389 Require at least gcc7 (#3268) Fixes: #3267 Drop thrust::[unary|binary]_function (#3274) Fixes: #3273 Drop ICC from CI (#3277) [STF] Corruption of the capture list of an extended lambda with a parallel_for construct on a host execution place (#3270) * Add a test to reproduce a bug observed with parallel_for on a host place * clang-format * use _CCCL_ASSERT * Attempt to debug * do not create a tuple with a universal reference that is out of scope when we use it, use an lvalue instead * fix lambda expression * clang-format Enable thrust::identity test for non-MSVC (#3281) This seems to be an oversight when the test was added Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Enable PDL in triple chevron launch (#3282) It seems PDL was disabled by accident when _THRUST_HAS_PDL was renamed to _CCCL_HAS_PDL during the review introducing the feature. 
Disambiguate line continuations and macro continuations in <nv/target> (#3244) Drop VS 2017 from CI (#3287) Fixes: #3286 Drop ICC support in code (#3279) * Drop ICC from code Fixes: #3278 Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Make CUB NVRTC commandline arguments come from a cmake template (#3292) Propose the same components (thrust, cub, libc++, cudax, cuda.parallel,...) in the bug report template than in the feature request template (#3295) Use process isolation instead of default hyper-v for Windows. (#3294) Try improving build times by using process isolation instead of hyper-v Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> [pre-commit.ci] pre-commit autoupdate (#3248) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.6](https://github.com/pre-commit/mirrors-clang-format/compare/v18.1.8...v19.1.6) - [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.8.3...v0.8.6) - [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.13.0...v1.14.1) Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Drop Thrust legacy arch macros (#3298) Which were disabled and could be re-enabled using THRUST_PROVIDE_LEGACY_ARCH_MACROS Drop Thrust's compiler_fence.h (#3300) Drop CTK 11.x from CI (#3275) * Add cuda12.0-gcc7 devcontainer * Move MSVC2017 jobs to CTK 12.6 Those is the only combination where rapidsai has devcontainers * Add /Zc:__cplusplus for the libcudacxx tests * Only add excape hatch for affected CTKs * Workaround missing cudaLaunchKernelEx on MSVC cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro, which is reported wrongly for MSVC. CTK 12.3 fixed this by additionally detecting _MSV_VER. 
As a workaround, we provide our own copy of cudaLaunchKernelEx when it is not available from the CTK. * Workaround nvcc+MSVC issue * Regenerate devcontainers Fixes: #3249 Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Update packman and repo_docs versions (#3293) Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Drop Thrust's deprecated compiler macros (#3301) Drop CUB_RUNTIME_ENABLED and __THRUST_HAS_CUDART__ (#3305) Adds support for large number of items to `DevicePartition::If` with the `ThreeWayPartition` overload (#2506) * adds support for large number of items to three-way partition * adapts interface to use choose_signed_offset_t * integrates applicable feedback from device-select pr * changes behavior for empty problems * unifies grid constant macro * fixes kernel template specialization mismatch * integrates _CCCL_GRID_CONSTANT changes * resolve merge conflicts * fixes checks in test * fixes test verification * improves tests * makes few improvements to streaming dispatch * improves code comment on test * fixes unrelated compiler error * minor style improvements Refactor scan tunings (#3262) Require C++17 for compiling Thrust and CUB (#3255) * Issue an unsuppressable warning when compiling with < C++17 * Remove C++11/14 presets * Remove CCCL_IGNORE_DEPRECATED_CPP_DIALECT from headers * Remove [CUB|THRUST|TCT]_IGNORE_DEPRECATED_CPP_[11|14] * Remove CUB_ENABLE_DIALECT_CPP[11|14] * Update CI runs * Remove C++11/14 CI runs for CUB and Thrust * Raise compiler minimum versions for C++17 * Update ReadMe * Drop Thrust's cpp14_required.h * Add escape hatch for C++17 removal Fixes: #3252 Implement `views::empty` (#3254) * Disable pair conversion of subrange with clang in C++17 * Fix namespace views * Implement `views::empty` This implements `std::ranges::views::empty`, see https://en.cppreference.com/w/cpp/ranges/empty_view Refactor `limits` and `climits` (#3221) * implement builtins for huge val, nan and nans * change `INFINITY` and 
`NAN` implementation for NVRTC cuda.parallel: Add documentation for the current iterators along with examples and tests (#3311) * Add tests demonstrating usage of different iterators * Update documentation of reduce_into by merging import code snippet with the rest of the example * Add documentation for current iterators * Run pre-commit checks and update accordingly * Fix comments to refer to the proper lines in the code snippets in the docs Drop clang<14 from CI, update devcontainers. (#3309) Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> [STF] Cleanup task dependencies object constructors (#3291) * Define tag types for access modes * - Rework how we build task_dep objects based on access mode tags - pack_state is now responsible for using a const_cast for read only data * Greatly simplify the previous attempt : do not define new types, but use integral constants based on the enums * It seems the const_cast was not necessarily so we can simplify it and not even do some dispatch based on access modes Disable test with a gcc-14 regression (#3297) Deprecate Thrust's cpp_compatibility.h macros (#3299) Remove dropped function objects from docs (#3319) Document `NV_TARGET` macros (#3313) [STF] Define ctx.pick_stream() which was missing for the unified context (#3326) * Define ctx.pick_stream() which was missing for the unified context * clang-format Deprecate cub::IterateThreadStore (#3337) Drop CUB's BinaryFlip operator (#3332) Deprecate cub::Swap (#3333) Clarify transform output can overlap input (#3323) Drop CUB APIs with a debug_synchronous parameter (#3330) Fixes: #3329 Drop CUB's util_compiler.cuh for real (#3340) PR #3302 planned to drop the file, but only dropped its content. This was an oversight. So let's drop the entire file. 
Drop cub::ValueCache (#3346) limits offset types for merge sort (#3328) Drop CDPv1 (#3344) Fixes: #3341 Drop thrust::void_t (#3362) Use cuda::std::addressof in Thrust (#3363) Fix all_of documentation for empty ranges (#3358) all_of always returns true on an empty range. [STF] Do not keep track of dangling events in a CUDA graph backend (#3327) * Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when the CUDA graph completes. Therefore keeping track of "dangling events" is a waste of time and resources. * replace can_ignore_dangling_events by track_dangling_events which leads to more readable code * When not storing the dangling events, we must still perform the deinit operations that were producing these events ! Extract scan kernels into NVRTC-compilable header (#3334) * Extract scan kernels into NVRTC-compilable header * Update cub/cub/device/dispatch/dispatch_scan.cuh Co-authored-by: Georgii Evtushenko <evtushenko.georgy@gmail.com> --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Co-authored-by: Georgii Evtushenko <evtushenko.georgy@gmail.com> Drop deprecated aliases in Thrust functional (#3272) Fixes: #3271 Drop cub::DivideAndRoundUp (#3347) Use cuda::std::min/max in Thrust (#3364) Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (#3361) * implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` Cleanup util_arch (#2773) Deprecate thrust::null_type (#3367) Deprecate cub::DeviceSpmv (#3320) Fixes: #896 Improves `DeviceSegmentedSort` test run time for large number of items and segments (#3246) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * fixes spelling * adds tests for large number of segments * fixes narrowing conversion in tests * addresses review comments * fixes includes Compile basic infra test with C++17 (#3377) Adds support for large number of items and large number of segments to 
`DeviceSegmentedSort` (#3308) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * addresses review comments * introduces segment offset type * adds tests for large number of segments * adds support for large number of segments * drops segment offset type * fixes thrust namespace * removes about-to-be-deprecated cub iterators * no exec specifier on defaulted ctor * fixes gcc7 linker error * uses local_segment_index_t throughout * determine offset type based on type returned by segment iterator begin/end iterators * minor style improvements Exit with error when RAPIDS CI fails. (#3385) cuda.parallel: Support structured types as algorithm inputs (#3218) * Introduce gpu_struct decorator and typing * Enable `reduce` to accept arrays of structs as inputs * Add test for reducing arrays-of-struct * Update documentation * Use a numpy array rather than ctypes object * Change zeros -> empty for output array and temp storage * Add a TODO for typing GpuStruct * Documentation udpates * Remove test_reduce_struct_type from test_reduce.py * Revert to `to_cccl_value()` accepting ndarray + GpuStruct * Bump copyrights --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Deprecate thrust::async (#3324) Fixes: #100 Review/Deprecate CUB `util.ptx` for CCCL 2.x (#3342) Fix broken `_CCCL_BUILTIN_ASSUME` macro (#3314) * add compiler-specific path * fix device code path * add _CCC_ASSUME Deprecate thrust::numeric_limits (#3366) Replace `typedef` with `using` in libcu++ (#3368) Deprecate thrust::optional (#3307) Fixes: #3306 Upgrade to Catch2 3.8 (#3310) Fixes: #1724 refactor `<cuda/std/cstdint>` (#3325) Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> Update CODEOWNERS (#3331) * Update CODEOWNERS * Update CODEOWNERS * Update CODEOWNERS * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix 
sign-compare warning (#3408) Implement more cmath functions to be usable on host and device (#3382) * Implement more cmath functions to be usable on host and device * Implement math roots functions * Implement exponential functions Redefine and deprecate thrust::remove_cvref (#3394) * Redefine and deprecate thrust::remove_cvref Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com> Fix assert definition for NVHPC due to constexpr issues (#3418) NVHPC cannot decide at compile time where the code would run so _CCCL_ASSERT within a constexpr function breaks it. Fix this by always using the host definition which should also work on device. Fixes #3411 Extend CUB reduce benchmarks (#3401) * Rename max.cu to custom.cu, since it uses a custom operator * Extend types covered my min.cu to all fundamental types * Add some notes on how to collect tuning parameters Fixes: #3283 Update upload-pages-artifact to v3 (#3423) * Update upload-pages-artifact to v3 * Empty commit --------- Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Replace and deprecate thrust::cuda_cub::terminate (#3421) `std::linalg` accessors and `transposed_layout` (#2962) Add round up/down to multiple (#3234) [FEA]: Introduce Python module with CCCL headers (#3201) * Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative * Run `copy_cccl_headers_to_aude_include()` before `setup()` * Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path. * Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel * Bug fix: cuda/_include only exists after shutil.copytree() ran. 
* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py * Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions) * Replace := operator (needs Python 3.8+) * Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md * Restore original README.md: `pip3 install -e` now works on first pass. * cuda_cccl/README.md: FOR INTERNAL USE ONLY * Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894035917) Command used: ci/update_version.sh 2 8 0 * Modernize pyproject.toml, setup.py Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894043178 * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894044996 * Install CCCL headers under cuda.cccl.include Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894048562 Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely. * Factor out cuda_cccl/cuda/cccl/include_paths.py * Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative * Add missing Copyright notice. * Add missing __init__.py (cuda.cccl) * Add `"cuda.cccl"` to `autodoc.mock_imports` * Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.) * Add # TODO: move this to a module-level import * Modernize cuda_cooperative/pyproject.toml, setup.py * Convert cuda_cooperative to use hatchling as build backend. * Revert "Convert cuda_cooperative to use hatchling as build backend." This reverts commit 61637d608da06fcf6851ef6197f88b5e7dbc3bbe. * Move numpy from [build-system] requires -> [project] dependencies * Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH * Remove copy_license() and use license_files=["../../LICENSE"] instead. 
* Further modernize cuda_cccl/setup.py to use pathlib * Trivial simplifications in cuda_cccl/pyproject.toml * Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code * Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml * Add taplo-pre-commit to .pre-commit-config.yaml * taplo-pre-commit auto-fixes * Use pathlib in cuda_cooperative/setup.py * CCCL_PYTHON_PATH in cuda_cooperative/setup.py * Modernize cuda_parallel/pyproject.toml, setup.py * Use pathlib in cuda_parallel/setup.py * Add `# TOML lint & format` comment. * Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml * Use pathlib in cuda/cccl/include_paths.py * pre-commit autoupdate (EXCEPT clang-format, which was manually restored) * Fixes after git merge main * Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result' ``` =========================================================================== warnings summary =========================================================================== tests/test_reduce.py::test_reduce_non_contiguous /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080> Traceback (most recent call last): File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__ bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result)) ^^^^^^^^^^^^^^^^^ AttributeError: '_Reduce' object has no attribute 'build_result' warnings.warn(pytest.PytestUnraisableExceptionWarning(msg)) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ============================================================== ``` * Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class 
CustomBuildPy` * Introduce cuda_cooperative/constraints.txt * Also add cuda_parallel/constraints.txt * Add `--constraint constraints.txt` in ci/test_python.sh * Update Copyright dates * Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024) For completeness: The other repo took a long time to install into the pre-commit cache; so long it lead to timeouts in the CCCL CI. * Remove unused cuda_parallel jinja2 dependency (noticed by chance). * Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead. * Make cuda_cooperative, cuda_parallel testing completely independent. * Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]" This reverts commit ea33a218ed77a075156cd1b332047202adb25aa2. Error message: https://github.com/NVIDIA/cccl/pull/3201#issuecomment-2594012971 * Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Restore original ci/matrix.yaml [skip-rapids] * Use for loop in test_python.sh to avoid code duplication. 
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci] * Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]" This reverts commit ec206fd8b50a6a293e00a5825b579e125010b13d. * Implement suggestion by @shwina (https://github.com/NVIDIA/cccl/pull/3201#pullrequestreview-2556918460) * Address feedback by @leofang --------- Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com> cuda.parallel: Add optional stream argument to reduce_into() (#3348) * Add optional stream argument to reduce_into() * Add tests to check for reduce_into() stream behavior * Move protocol related utils to separate file and rework __cuda_stream__ error messages * Fix synchronization issue in stream test and add one more invalid stream test case * Rename cuda stream validation function after removing leading underscore * Unpack values from __cuda_stream__ instead of indexing * Fix linting errors * Handle TypeError when unpacking invalid __cuda_stream__ return * Use stream to allocate cupy memory in new stream test Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (#3434) Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (#3419) * Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ Fixes #3404 Fix CI issues (#3443) update docs fix review restrict allowed types replace constexpr implementations with generic optimize `__is_arithmetic_integral`

Fix pre-commit config for codespell and remaining typos

f422bbb
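
For context, here is a minimal sketch of what a codespell hook that honors `pyproject.toml` can look like. It is not taken from this PR's diff. It assumes the common cause of the problem: pre-commit runs the hook in an isolated environment, and on Python older than 3.11 codespell needs the `tomli` package to parse `[tool.codespell]`. The `rev` value is illustrative only.

```yaml
# .pre-commit-config.yaml (illustrative sketch, not the actual change in this PR)
repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.3.0  # assumed version; pin to whatever the project already uses
    hooks:
      - id: codespell
        # codespell needs tomli to read pyproject.toml on Python < 3.11
        additional_dependencies: [tomli]
        # point codespell explicitly at the file that holds [tool.codespell]
        args: ["--toml", "pyproject.toml"]
```

With a setup along these lines, options such as `skip` or `ignore-words-list` under `[tool.codespell]` should apply both to `pre-commit run codespell` and to a plain `codespell` invocation.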

shwina requested review from a team as code owners December 16, 2024 16:57

shwina requested review from griwes, miscco and bernhardmgruber December 16, 2024 16:57

miscco approved these changes Dec 16, 2024

Don't actually need to include OffsetT

4fd6204

miscco enabled auto-merge (squash) December 16, 2024 17:06

bernhardmgruber approved these changes Dec 16, 2024

bdice approved these changes Dec 16, 2024

miscco merged commit eaa8edc into NVIDIA:main Dec 16, 2024
192 checks passed

bdice mentioned this pull request Dec 16, 2024

Fix codespell behavior. rapidsai/rmm#1769 (Merged, 3 tasks)

davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 18, 2025

Fix pre-commit config for codespell and remaining typos (NVIDIA#3182)

a781d25

github-actions bot commented Dec 16, 2024

🟩 libcudacxx: Pass: 100%/48 | Total: 17h 02m | Avg: 21m 18s | Max: 1h 05m | Hits: 31%/9806

🟩 cub: Pass: 100%/47 | Total: 1d 08h | Avg: 40m 51s | Max: 1h 12m | Hits: 30%/3124

🟩 thrust: Pass: 100%/46 | Total: 19h 08m | Avg: 24m 58s | Max: 1h 21m | Hits: 41%/9260

🟩 cudax: Pass: 100%/26 | Total: 2h 13m | Avg: 5m 08s | Max: 20m 40s | Hits: 92%/312

🟩 cccl: Pass: 100%/6 | Total: 28m 32s | Avg: 4m 45s | Max: 5m 24s

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 35s | Avg: 5m 17s | Max: 8m 30s

🟩 python: Pass: 100%/1 | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s

👃 Inspect Changes

Modifications in project?

Modifications in project or dependencies?

🏃‍ Runner counts (total jobs: 176)
