Modernize Thrust examples #5670
Conversation
- Remove legacy include/host_device.h headers from 40 example files
- Replace manual element assignment with std::initializer_list
- Use range-based for loops where appropriate
- Apply STL algorithms (std::generate) with lambdas
- Use .size() instead of hardcoded array sizes
- Improve semantic naming and inline usage
- Maintain compatibility with current CUDA/Thrust version
- Avoid thrust::enumerate (not available in current version)

Files modernized: arbitrary_transformation.cu, basic_vector.cu, bounding_box.cu, bucket_sort2d.cu, constant_iterator.cu, counting_iterator.cu, device_ptr.cu, discrete_voronoi.cu, dot_products_with_zip.cu, expand.cu, histogram.cu, lambda.cu, lexicographical_sort.cu, max_abs_diff.cu, minmax.cu, mode.cu, monte_carlo.cu, monte_carlo_disjoint_sequences.cu, norm.cu, padded_grid_reduction.cu, permutation_iterator.cu, raw_reference_cast.cu, remove_points2d.cu, repeated_range.cu, saxpy.cu, scan_matrix_by_rows.cu, simple_moving_average.cu, sort.cu, sorting_aos_vs_soa.cu, stream_compaction.cu, sum_rows.cu, summary_statistics.cu, summed_area_table.cu, tiled_range.cu, transform_input_output_iterator.cu, transform_iterator.cu, transform_output_iterator.cu, uninitialized_vector.cu, weld_vertices.cu, word_count.cu
/ok to test 3b62cc1
miscco
left a comment
Thanks a lot for improving the examples. This is already much better. I have some concerns about using std:: algorithms, because they only run on the host and will segfault with device memory.
We need to use the equivalent Thrust algorithms.
🟨 CI finished in 2h 13m: Pass: 64%/140 | Total: 1d 01h | Avg: 10m 58s | Max: 2h 10m | Hits: 99%/69320
| Project | |
|---|---|
| CCCL Infrastructure | |
| CCCL Packaging | |
| libcu++ | |
| CUB | |
| +/- | Thrust |
| CUDA Experimental | |
| stdpar | |
| python | |
| CCCL C Parallel Library | |
| Catch2Helper |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| +/- | CCCL Packaging |
| libcu++ | |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | stdpar |
| python | |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 140)
| # | Runner |
|---|---|
| 91 | linux-amd64-cpu16 |
| 17 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 7 | linux-amd64-gpu-rtx2080-latest-1 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 5 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 1 | linux-amd64-gpu-l4-latest-1 |
Thank you so much for the review. Sure, let me work on that.
thrust/examples/bounding_box.cu
Outdated
```cpp
mutable unsigned int seed;

random_point_generator()
    : seed(0)
{}

__host__ __device__ point2d operator()() const
{
  thrust::default_random_engine rng(seed++);
  thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
  return point2d(u01(rng), u01(rng));
}
```
You really do not want to do this.
Initializing a random number generator is expensive!
You should hold the RNG as a member, initialize it on construction, and then in the call operator only invoke it.
Sorry for that.
I get it now, let me change that.
thrust/examples/bounding_box.cu
Outdated
```diff
 bbox init = bbox(points[0], points[0]);

 // compute the bounding box for the point set
-bbox result = thrust::reduce(points.begin(), points.end(), first_point, bbox_union{});
+bbox result = thrust::reduce(points.begin(), points.end(), init, bbox_union{});
```
Why are those changes necessary?
The variable rename isn't functionally necessary, just following naming conventions from the modernization patterns.
happy to keep first_point if preferred.
Oh, I meant more: why is it not bbox init{points[0], points[0]};?
You're right, the direct constructor syntax bbox init(points[0], points[0]); is cleaner.
Let me quickly update that.
miscco
left a comment
Thanks a lot, this is looking so much better than the original examples 🎉
One final nitpick:
thrust/examples/sum_rows.cu
Outdated
```cpp
#include <thrust/random.h>
#include <thrust/reduce.h>

#include <algorithm>
```
I believe those are no longer needed.
True, let me remove them.
Thanks for updating those files! I was looking into it but got busy with university, and when I tried to recreate some of those changes locally I wasn't able to get them working properly. I noticed you handled some of the backend-abstraction issues and compilation-context details that I hadn't considered in my modernization work. The discussion about when certain patterns are appropriate was really insightful. Is there a good way for me to learn more about these CUDA-specific design decisions so I can tackle similar issues better next time?
I don't think there is a definitive place to look. Most of it comes down to which backend Thrust uses. If it uses the CUDA backend, then we need to consider device memory and also annotate the respective functions appropriately for the CUDA compiler to generate device code.
Custom maximum functor for compatibility between different compilation environments: this custom functor works with both compilation methods.
/ok to test 6e58d56
bernhardmgruber
left a comment
Almost good to merge. A few more comments:
```diff
 thrust::host_vector<int> host_data(N);
 for (size_t i = 0; i < host_data.size(); i++)
 {
-  data[i] = dist(rng);
+  host_data[i] = dist(rng);
 }
```
This should use a range for:

```cpp
for (auto& e : host_data)
  e = dist(rng);
```
thrust/examples/minmax.cu
Outdated
```cpp
for (size_t i = 0; i < data.size(); i++)
{
  std::cout << data[i] << " ";
```
Suggestion: Could also use a range for. We could do this as a follow up PR though. Probably applies to a lot more places.
thrust/examples/sort.cu
Outdated
```diff
 thrust::default_random_engine rng(123456);
 thrust::uniform_int_distribution<int> dist(0, 9);
-for (size_t i = 0; i < v.size(); i++)
+thrust::host_vector<thrust::pair<int, int>> host_data(v.size());
```
Suggestion: as a follow-up PR, we should replace thrust::pair by cuda::std::pair.
thrust/examples/sum_rows.cu
Outdated
```diff
 // print data
-for (int i = 0; i < R; i++)
+for (size_t i = 0; i < static_cast<size_t>(R); i++)
```
Again, why is the size_t beneficial here? Using int i would simplify this. Also on the next loop below.
```cpp
MyStruct s;
s.key = dist(rng);
h_structures[i] = s;
```
Question: The code before did the same. Why is the new version an improvement?
Sure, let me work on them.
bernhardmgruber
left a comment
Last comments, then we are good I think.
```cmake
endif()

# We do not want to explicitly include `host_device.h` if not needed, so force include the file for non CUDA targets
target_compile_options(${example_target} PRIVATE
```
@miscco device_vector allocates memory on the current Thrust device system. If that is CUDA, it's CUDA device memory. If the device system is TBB, OMP or CPP, then a device_vector just behaves like a host vector. This is so Thrust can switch backends with the preprocessor.
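For context, the device system is chosen at compile time via Thrust's documented THRUST_DEVICE_SYSTEM macro. A hedged sketch of building an example against a non-CUDA backend (the include paths and file name here are hypothetical; adjust to your checkout):

```sh
# Build a Thrust example with the OpenMP device backend instead of CUDA.
# With this setting, thrust::device_vector lives in host memory, so
# dereferencing its elements on the host is safe.
g++ -O2 -fopenmp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP \
    -I/path/to/cccl/thrust -I/path/to/cccl/libcudacxx/include \
    basic_vector.cpp -o basic_vector
```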
/ok to test d171ad8
🥳 CI Workflow Results: 🟩 Finished in 2h 28m: Pass: 100%/159 | Total: 1d 19h | Max: 2h 19m | Hits: 97%/186166. See results here.
Thanks a lot for the contribution. Great work!
Thanks a lot for the support and guidance :))
xtofl
left a comment
Thanks for taking this up! It's great to have exemplary code available in examples.
The term 'modern' evolves, of course; we have lots of C++ goodies we didn't have at the time of creating this issue. These goodies can make the code more succinct and telling: generic lambdas, std::format, ...
I barely started looking at this PR. Though I won't have time to go into deep detail, I want to add some suggestions showing what my take on 'modern' is - cf. further on.
```diff
 // print the output
 std::cout << "Tuple functor" << std::endl;
-for (int i = 0; i < 5; i++)
+for (size_t i = 0; i < A.size(); i++)
```
Isn't the preferred form `for (size_t i = 0; i != size(A); ++i)`?
Also,
- is it possible to use iterators? (My C++ has been rusting for 5 years now)
- let's not use `std::endl` unless needed (cf. here)
- can we use `std::format` to our advantage?
- free functions improve encapsulation (cf. here)

```cpp
for (auto it = make_zip_iterator(make_tuple(begin(A), begin(B), begin(C), begin(D)));
     it != make_zip_iterator(make_tuple(end(A), end(B), end(C), end(D)));
     ++it)
{
  std::cout << std::format("{} + {} * {} = {}\n", *it);
}
```

Maybe the `make_zip_iterator(make_tuple(begin(A), ...))` can be extracted into a generic somehow, along the lines of

```cpp
auto zip_begin(auto... containers) {
  return make_zip_iterator(make_tuple(begin(containers)...));
}
auto zip_end(auto... containers) {
  return make_zip_iterator(make_tuple(end(containers)...));
}
```

In which case the above simplifies further to

```cpp
for (auto it = zip_begin(A, B, C, D); it != zip_end(A, B, C, D); ++it)
{
  std::cout << std::format("{} + {} * {} = {}\n", *it);
}
```
Thank you for the feedback! You are always free to create a PR yourself or start a discussion.
Isn't the preferred form `for (size_t i = 0; i != size(A); ++i)`?
I have no preference here. The PR improved the situation by not using a magic number, which is good.
- is it possible to use iterators? (My C++ has been rusting for 5 years now)
Yes, but iterating 4 ranges at the same time using a zip may also be a bit over-engineered. Using an index is fine here IMO. Examples should be easy.
- let's not use `std::endl` unless needed (cf. here)
Correct. Feel free to propose a PR to replace them by '\n'.
- can we use `std::format` to our advantage?
CCCL still supports C++17, but I don't see a blocker with using C++20 in examples only. I will start a discussion internally.
- free functions improve encapsulation (cf. here)
Again, for example code I have no preference here. I agree with this when writing library code.
```cpp
for (auto it = make_zip_iterator(make_tuple(begin(A), begin(B), begin(C), begin(D)));
     it != make_zip_iterator(make_tuple(end(A), end(B), end(C), end(D)));
     ++it)
{
  std::cout << std::format("{} + {} * {} = {}\n", *it);
}
```
I think this does not increase readability or clarity of the example.
Maybe the `make_zip_iterator(make_tuple(begin(A), ...))` can be extracted into a generic somehow, along the lines of
We have that today, just construct the zip_iterator and let CTAD deduce the arguments:

```cpp
zip_iterator(begin(A), begin(B), begin(C));
```

This should deduce `zip_iterator<decltype(begin(A)), ...>`. That only works with cuda::zip_iterator. For thrust, you can at least skip the make_tuple; we fixed that some time ago.
* Modernize Thrust examples following PR NVIDIA#753 patterns
  - Remove legacy include/host_device.h headers from 40 example files
  - Replace manual element assignment with std::initializer_list
  - Use range-based for loops where appropriate
  - Apply STL algorithms (std::generate) with lambdas
  - Use .size() instead of hardcoded array sizes
  - Improve semantic naming and inline usage
  - Maintain compatibility with current CUDA/Thrust version
  - Avoid thrust::enumerate (not available in current version)
  - Modernize to thrust::generate, cuda::std::distance, thrust::sequence where necessary

  Files modernized: arbitrary_transformation.cu, basic_vector.cu, bounding_box.cu, bucket_sort2d.cu, constant_iterator.cu, counting_iterator.cu, device_ptr.cu, discrete_voronoi.cu, dot_products_with_zip.cu, expand.cu, histogram.cu, lambda.cu, lexicographical_sort.cu, max_abs_diff.cu, minmax.cu, mode.cu, monte_carlo.cu, monte_carlo_disjoint_sequences.cu, norm.cu, padded_grid_reduction.cu, permutation_iterator.cu, raw_reference_cast.cu, remove_points2d.cu, repeated_range.cu, saxpy.cu, scan_matrix_by_rows.cu, simple_moving_average.cu, sort.cu, sorting_aos_vs_soa.cu, stream_compaction.cu, sum_rows.cu, summary_statistics.cu, summed_area_table.cu, tiled_range.cu, transform_input_output_iterator.cu, transform_iterator.cu, transform_output_iterator.cu, uninitialized_vector.cu, weld_vertices.cu, word_count.cu

  Co-authored-by: Sai Charan <scharan@rostam1.rostam.cct.lsu.edu>
  Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
  Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
Following: NVIDIA/thrust#753