Conversation

@ErikQQY
Contributor

@ErikQQY ErikQQY commented May 22, 2025

Fix: #2783

Provides a more general way to define sparse diagonal matrices with CUDA.
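For context, the PR mirrors the `SparseArrays.spdiagm` API, accepting `Pair{<:Integer, <:CuVector}` arguments on the GPU. A minimal sketch of the semantics being extended, shown here with CPU vectors so it runs without a GPU (the CUDA version simply substitutes `CuVector` inputs):

```julia
using SparseArrays

# Build a matrix from two off-diagonals; the size is inferred as 3×3.
A = spdiagm(-1 => [1, 2], 1 => [3, 4])
# A == [0 3 0; 1 0 4; 0 2 0]

# Explicit size, zero-padding beyond the given diagonal.
B = spdiagm(4, 4, 0 => [5, 6, 7])

# With this PR and a GPU available, the analogous call would be e.g.:
#   using CUDA, CUDA.CUSPARSE
#   spdiagm(-1 => CuVector([1, 2]), 1 => CuVector([3, 4]))
```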

@github-actions
Contributor

github-actions bot commented May 22, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (`git runic master`) to apply these changes.

Suggested changes:
diff --git a/lib/cusparse/array.jl b/lib/cusparse/array.jl
index 4c11e0b5d..3f4df8f8d 100644
--- a/lib/cusparse/array.jl
+++ b/lib/cusparse/array.jl
@@ -336,20 +336,20 @@ function SparseArrays.sparsevec(I::CuArray{Ti}, V::CuArray{Tv}, n::Integer) wher
     CuSparseVector(I, V, n)
 end
 
-SparseArrays.spdiagm(kv::Pair{<:Integer,<:CuVector}...) = _cuda_spdiagm(nothing, kv...)
-SparseArrays.spdiagm(m::Integer, n::Integer, kv::Pair{<:Integer,<:CuVector}...) = _cuda_spdiagm((Int(m),Int(n)), kv...)
+SparseArrays.spdiagm(kv::Pair{<:Integer, <:CuVector}...) = _cuda_spdiagm(nothing, kv...)
+SparseArrays.spdiagm(m::Integer, n::Integer, kv::Pair{<:Integer, <:CuVector}...) = _cuda_spdiagm((Int(m), Int(n)), kv...)
 SparseArrays.spdiagm(v::CuVector) = _cuda_spdiagm(nothing, 0 => v)
 SparseArrays.spdiagm(m::Integer, n::Integer, v::CuVector) = _cuda_spdiagm((Int(m), Int(n)), 0 => v)
 
 function _cuda_spdiagm(size, kv::Pair{<:Integer, <:CuVector}...)
     I, J, V, mmax, nmax = _cuda_spdiagm_internal(kv...)
     mnmax = max(mmax, nmax)
-    m, n = something(size, (mnmax,mnmax))
+    m, n = something(size, (mnmax, mnmax))
     (m ≥ mmax && n ≥ nmax) || throw(DimensionMismatch("invalid size=$size"))
     return sparse(CuVector(I), CuVector(J), V, m, n)
 end
 
-function _cuda_spdiagm_internal(kv::Pair{T,<:CuVector}...) where {T<:Integer}
+function _cuda_spdiagm_internal(kv::Pair{T, <:CuVector}...) where {T <: Integer}
     ncoeffs = 0
     for p in kv
         ncoeffs += SparseArrays._nnz(p.second)
@@ -374,7 +374,7 @@ function _cuda_spdiagm_internal(kv::Pair{T,<:CuVector}...) where {T<:Integer}
             col = 0
         end
         numel = SparseArrays._nnz(v)
-        r = 1+i:numel+i
+        r = (1 + i):(numel + i)
         I_r, J_r = SparseArrays._indices(v, row, col)
         copyto!(view(I, r), I_r)
         copyto!(view(J, r), J_r)
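The helper above assembles COO triplets diagonal by diagonal: each offset `k` is turned into a row and column shift before the index ranges are copied into `I` and `J`. A CPU-only sketch of that index computation (the function name `diag_indices` is hypothetical, introduced here for illustration):

```julia
# For a diagonal at offset k holding n entries, compute the COO row and
# column indices, mirroring the row/col shift logic in the diff above.
function diag_indices(k::Integer, n::Integer)
    if k < 0
        row, col = -k, 0      # subdiagonal: rows shifted down
    elseif k > 0
        row, col = 0, k       # superdiagonal: columns shifted right
    else
        row, col = 0, 0       # main diagonal
    end
    I = (1:n) .+ row
    J = (1:n) .+ col
    return I, J
end
```

For example, `diag_indices(-1, 2)` yields rows `2:3` and columns `1:2`, i.e. entries at (2,1) and (3,2).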

@maleadt maleadt requested a review from amontoison May 22, 2025 15:34
@maleadt maleadt added enhancement New feature or request cuda array Stuff about CuArray. labels May 22, 2025
@codecov

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 4 lines in your changes missing coverage. Please review.

Project coverage is 89.71%. Comparing base (a4a7af4) to head (f3322ad).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/cusparse/array.jl | 90.90% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2784      +/-   ##
==========================================
- Coverage   89.72%   89.71%   -0.01%     
==========================================
  Files         153      153              
  Lines       13232    13274      +42     
==========================================
+ Hits        11872    11909      +37     
- Misses       1360     1365       +5     

Contributor

@github-actions github-actions bot left a comment


CUDA.jl Benchmarks

| Benchmark suite | Current: 2523b11 | Previous: a4a7af4 | Ratio |
|---|---|---|---|
| latency/precompile | 43078512320.5 ns | 43227936166.5 ns | 1.00 |
| latency/ttfp | 7235008141 ns | 7121275224 ns | 1.02 |
| latency/import | 3450098826 ns | 3420695643 ns | 1.01 |
| integration/volumerhs | 9609573 ns | 9610749.5 ns | 1.00 |
| integration/byval/slices=1 | 147170 ns | 146856 ns | 1.00 |
| integration/byval/slices=3 | 425740 ns | 425444 ns | 1.00 |
| integration/byval/reference | 145312 ns | 145104 ns | 1.00 |
| integration/byval/slices=2 | 286277.5 ns | 286260 ns | 1.00 |
| integration/cudadevrt | 103616 ns | 103521 ns | 1.00 |
| kernel/indexing | 14397.5 ns | 14402 ns | 1.00 |
| kernel/indexing_checked | 15277 ns | 14989.5 ns | 1.02 |
| kernel/occupancy | 755.1029411764706 ns | 722.1785714285714 ns | 1.05 |
| kernel/launch | 2430.6666666666665 ns | 2276.3333333333335 ns | 1.07 |
| kernel/rand | 17888 ns | 17130 ns | 1.04 |
| array/reverse/1d | 20188 ns | 20044.5 ns | 1.01 |
| array/reverse/2d | 25118.5 ns | 24062 ns | 1.04 |
| array/reverse/1d_inplace | 11233 ns | 10376 ns | 1.08 |
| array/reverse/2d_inplace | 13428 ns | 11904 ns | 1.13 |
| array/copy | 21288 ns | 21530 ns | 0.99 |
| array/iteration/findall/int | 160244 ns | 159043 ns | 1.01 |
| array/iteration/findall/bool | 140284 ns | 139211 ns | 1.01 |
| array/iteration/findfirst/int | 164195 ns | 160978 ns | 1.02 |
| array/iteration/findfirst/bool | 165257.5 ns | 162266 ns | 1.02 |
| array/iteration/scalar | 73406 ns | 71832 ns | 1.02 |
| array/iteration/logical | 219938 ns | 217950.5 ns | 1.01 |
| array/iteration/findmin/1d | 46937.5 ns | 47357.5 ns | 0.99 |
| array/iteration/findmin/2d | 97423 ns | 97468 ns | 1.00 |
| array/reductions/reduce/1d | 36616 ns | 36864 ns | 0.99 |
| array/reductions/reduce/2d | 42192.5 ns | 41394 ns | 1.02 |
| array/reductions/mapreduce/1d | 34847 ns | 34541 ns | 1.01 |
| array/reductions/mapreduce/2d | 41664.5 ns | 41294 ns | 1.01 |
| array/broadcast | 21469 ns | 21139 ns | 1.02 |
| array/copyto!/gpu_to_gpu | 12489 ns | 12850.5 ns | 0.97 |
| array/copyto!/cpu_to_gpu | 217160 ns | 217756 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 284338 ns | 285934 ns | 0.99 |
| array/accumulate/1d | 110000 ns | 108976.5 ns | 1.01 |
| array/accumulate/2d | 81141 ns | 80598 ns | 1.01 |
| array/construct | 1262.5 ns | 1307.6 ns | 0.97 |
| array/random/randn/Float32 | 44785.5 ns | 47928 ns | 0.93 |
| array/random/randn!/Float32 | 25168 ns | 25275 ns | 1.00 |
| array/random/rand!/Int64 | 27407 ns | 27404 ns | 1.00 |
| array/random/rand!/Float32 | 8859 ns | 8877.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 30532 ns | 30410 ns | 1.00 |
| array/random/rand/Float32 | 13334 ns | 13403 ns | 0.99 |
| array/permutedims/4d | 61801 ns | 61384.5 ns | 1.01 |
| array/permutedims/2d | 56099.5 ns | 55567 ns | 1.01 |
| array/permutedims/3d | 56622 ns | 56298 ns | 1.01 |
| array/sorting/1d | 2776621.5 ns | 2778057 ns | 1.00 |
| array/sorting/by | 3367591.5 ns | 3369635 ns | 1.00 |
| array/sorting/2d | 1087061.5 ns | 1086371 ns | 1.00 |
| cuda/synchronization/stream/auto | 1062.1 ns | 1016.0714285714286 ns | 1.05 |
| cuda/synchronization/stream/nonblocking | 7424.4 ns | 7897.7 ns | 0.94 |
| cuda/synchronization/stream/blocking | 851.4489795918367 ns | 805.7272727272727 ns | 1.06 |
| cuda/synchronization/context/auto | 1152.2 ns | 1176.3 ns | 0.98 |
| cuda/synchronization/context/nonblocking | 7721.1 ns | 7510.1 ns | 1.03 |
| cuda/synchronization/context/blocking | 914.1041666666666 ns | 915.85 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Member

@amontoison amontoison left a comment


Except for the .DS_Store file, which should be removed, everything looks correct!
Thanks for the PR.

@ErikQQY
Contributor Author

ErikQQY commented May 24, 2025

This PR seems ready to be merged.

@amontoison amontoison merged commit 22a875b into JuliaGPU:master May 24, 2025
3 checks passed
@ErikQQY ErikQQY deleted the qqy/spdiagm branch May 24, 2025 02:43
@ErikQQY
Contributor Author

ErikQQY commented May 24, 2025

@amontoison Thanks a lot! Can we cut a release with this patch? I need it for my package testing.

@amontoison
Member

amontoison commented May 24, 2025

I need a green light from @maleadt if you want a new release.
You probably also want #2786 for the new release?

@ErikQQY
Contributor Author

ErikQQY commented May 24, 2025

Yeah, thanks, a new release with #2786 would be great.

christiangnrd pushed a commit to christiangnrd/CUDA.jl that referenced this pull request Jun 27, 2025
* Fix spdiagm with specified pairs

* Should subtype keys

* Remove DS_store


Development

Successfully merging this pull request may close these issues.

spdiagm doesn't support specified diagonal elements