SparseMatricesCSR Dispatch #2720
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff (additional details and impacted files):

| Metric | master | #2720 | +/- |
|---|---|---|---|
| Coverage | 77.34% | 89.62% | +12.28% |
| Files | 153 | 153 | |
| Lines | 13108 | 13195 | +87 |
| Hits | 10138 | 11826 | +1688 |
| Misses | 2970 | 1369 | -1601 |
Your PR no longer requires formatting changes. Thank you for your contribution!
CUDA.jl Benchmarks
| Benchmark suite | Current: 0e80594 | Previous: 7a83380 | Ratio |
|---|---|---|---|
| latency/precompile | 42831148863 ns | 42926176632.5 ns | 1.00 |
| latency/ttfp | 7055807029 ns | 7106699468 ns | 0.99 |
| latency/import | 3375773699 ns | 3389948934 ns | 1.00 |
| integration/volumerhs | 9597397 ns | 9608052 ns | 1.00 |
| integration/byval/slices=1 | 147336 ns | 147162 ns | 1.00 |
| integration/byval/slices=3 | 426120 ns | 425761 ns | 1.00 |
| integration/byval/reference | 145426 ns | 145313 ns | 1.00 |
| integration/byval/slices=2 | 286891 ns | 286425 ns | 1.00 |
| integration/cudadevrt | 103861 ns | 103604 ns | 1.00 |
| kernel/indexing | 14551 ns | 14311 ns | 1.02 |
| kernel/indexing_checked | 15115.5 ns | 15150.5 ns | 1.00 |
| kernel/occupancy | 719.3142857142857 ns | 707.8943661971831 ns | 1.02 |
| kernel/launch | 2538 ns | 2395.4444444444443 ns | 1.06 |
| kernel/rand | 18953 ns | 18678 ns | 1.01 |
| array/reverse/1d | 20039 ns | 19617 ns | 1.02 |
| array/reverse/2d | 24584 ns | 24183.5 ns | 1.02 |
| array/reverse/1d_inplace | 11270 ns | 10894 ns | 1.03 |
| array/reverse/2d_inplace | 12872 ns | 12860 ns | 1.00 |
| array/copy | 21059 ns | 20975 ns | 1.00 |
| array/iteration/findall/int | 159756.5 ns | 157315 ns | 1.02 |
| array/iteration/findall/bool | 140032 ns | 138316 ns | 1.01 |
| array/iteration/findfirst/int | 154689 ns | 154273.5 ns | 1.00 |
| array/iteration/findfirst/bool | 156124 ns | 154976.5 ns | 1.01 |
| array/iteration/scalar | 72823 ns | 73010 ns | 1.00 |
| array/iteration/logical | 220835.5 ns | 215446 ns | 1.03 |
| array/iteration/findmin/1d | 42153 ns | 41421 ns | 1.02 |
| array/iteration/findmin/2d | 94512 ns | 94396 ns | 1.00 |
| array/reductions/reduce/1d | 45220 ns | 43605.5 ns | 1.04 |
| array/reductions/reduce/2d | 51768.5 ns | 45135 ns | 1.15 |
| array/reductions/mapreduce/1d | 40197.5 ns | 40394 ns | 1.00 |
| array/reductions/mapreduce/2d | 52388.5 ns | 51842 ns | 1.01 |
| array/broadcast | 21246 ns | 21077 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 12645 ns | 11034 ns | 1.15 |
| array/copyto!/cpu_to_gpu | 218110 ns | 216527 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 284544 ns | 283217 ns | 1.00 |
| array/accumulate/1d | 110400 ns | 108949 ns | 1.01 |
| array/accumulate/2d | 81360 ns | 80328 ns | 1.01 |
| array/construct | 1237.2 ns | 1243.7 ns | 0.99 |
| array/random/randn/Float32 | 48013 ns | 47595 ns | 1.01 |
| array/random/randn!/Float32 | 25166 ns | 25281 ns | 1.00 |
| array/random/rand!/Int64 | 27536 ns | 27337 ns | 1.01 |
| array/random/rand!/Float32 | 8832 ns | 8944.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 34423 ns | 33758 ns | 1.02 |
| array/random/rand/Float32 | 13124 ns | 13235 ns | 0.99 |
| array/permutedims/4d | 61784.5 ns | 61033.5 ns | 1.01 |
| array/permutedims/2d | 55629 ns | 54927 ns | 1.01 |
| array/permutedims/3d | 56340 ns | 55899 ns | 1.01 |
| array/sorting/1d | 2778494 ns | 2777029.5 ns | 1.00 |
| array/sorting/by | 3369759 ns | 3367967 ns | 1.00 |
| array/sorting/2d | 1087242 ns | 1085746 ns | 1.00 |
| cuda/synchronization/stream/auto | 1005 ns | 1030 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 7507.5 ns | 8027.8 ns | 0.94 |
| cuda/synchronization/stream/blocking | 803.6382978723404 ns | 803.8265306122449 ns | 1.00 |
| cuda/synchronization/context/auto | 1161.3 ns | 1173.1 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 8219.7 ns | 7706.2 ns | 1.07 |
| cuda/synchronization/context/blocking | 899.6078431372549 ns | 906.5853658536586 ns | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
This PR should be ready by now.
Extension packages need to be listed explicitly in the CI pipeline; see `CUDA.jl/.buildkite/pipeline.yml`, lines 144 to 251 (commit a3d6aa4).
Alternatively, since this is a simple interface package, we could add it to the

Tests in lines 115 to 117 (commit c113666)
You'll have to move it to e.g.
This PR adds a new extension module, `SparseMatricesCSRExt`, that enables dispatching `SparseMatrixCSR` from SparseMatricesCSR.jl to `CuSparseMatrixCSR`.
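Moving a CSR matrix to the GPU essentially means moving its three arrays (row pointers, column indices, nonzero values). One detail such a conversion likely has to handle: SparseMatricesCSR.jl parametrizes `SparseMatrixCSR` on an index base `Bi` that can be 0 or 1, whereas this sketch assumes the GPU side expects one-based indices. The CPU-only snippet below is a minimal illustration under that assumption, not the code from this PR; `to_one_based_csr` and `csr_to_dense` are hypothetical helper names.

```julia
# Hypothetical helper (illustration only): shift CSR index arrays that use
# index base `Bi` (0 or 1) to a one-based layout. Assumes the target expects
# one-based row pointers and column indices.
function to_one_based_csr(rowptr::Vector{Ti}, colval::Vector{Ti}, Bi::Integer) where {Ti<:Integer}
    Bi in (0, 1) || throw(ArgumentError("index base must be 0 or 1, got $Bi"))
    shift = Ti(1 - Bi)  # 1 for zero-based input, 0 for one-based input
    return rowptr .+ shift, colval .+ shift
end

# Hypothetical helper: densify a one-based CSR matrix to check the result.
function csr_to_dense(rowptr, colval, nzval, m, n)
    A = zeros(eltype(nzval), m, n)
    for i in 1:m
        # entries of row i live at positions rowptr[i] .. rowptr[i+1]-1
        for k in rowptr[i]:(rowptr[i+1] - 1)
            A[i, colval[k]] = nzval[k]
        end
    end
    return A
end

# The 2x3 matrix [10 0 20; 0 30 0] stored as zero-based CSR arrays
rowptr0 = [0, 2, 3]
colval0 = [0, 2, 1]
nzval   = [10.0, 20.0, 30.0]

rowptr1, colval1 = to_one_based_csr(rowptr0, colval0, 0)
println(rowptr1)  # [1, 3, 4]
println(colval1)  # [1, 3, 2]
println(csr_to_dense(rowptr1, colval1, nzval, 2, 3))
```

On a machine with CUDA.jl, the shifted arrays could then be copied to the device with `CuArray` and wrapped in a `CuSparseMatrixCSR` (assuming its `(rowPtr, colVal, nzVal, dims)` constructor); that step is left out here so the sketch runs without a GPU.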