Cuda kernels for the upper triangular solver #342

pratikvn · 2019-08-31T23:27:03Z

This PR adds the cuSPARSE-based CUDA kernels for the upper triangular solver.

TODO

+ Merge #336 (CUDA kernels for the lower triangular solve)
+ Merge #341 (upper triangular solver)
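For reference, the upper triangular solver computes X in U X = B for an upper triangular U, which is backward substitution. The kernels in this PR delegate that work to cuSPARSE. The following is only a plain C++ CPU sketch of the same computation, not code from the PR. It assumes a hypothetical `csr_matrix` type and `upper_trs` function, dense B and X stored row-major with `num_rhs` columns, and it ignores entries below the diagonal, similar to how a fill-mode setting selects one triangle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix (hypothetical helper type, not Ginkgo's matrix class).
struct csr_matrix {
    std::size_t num_rows;
    std::vector<std::size_t> row_ptrs;  // size num_rows + 1
    std::vector<std::size_t> col_idxs;  // column of each stored entry
    std::vector<double> values;         // value of each stored entry
};

// Solves U * X = B by backward substitution. B and X are dense,
// num_rows x num_rhs, row-major. Entries below the diagonal are ignored.
std::vector<double> upper_trs(const csr_matrix& U, const std::vector<double>& b,
                              std::size_t num_rhs)
{
    assert(b.size() == U.num_rows * num_rhs);
    std::vector<double> x(b.size(), 0.0);
    // Process rows bottom-up so every x[col, :] with col > row is final.
    for (std::size_t row = U.num_rows; row-- > 0;) {
        for (std::size_t k = 0; k < num_rhs; ++k) {
            x[row * num_rhs + k] = b[row * num_rhs + k];
        }
        double diag = 0.0;
        for (std::size_t nz = U.row_ptrs[row]; nz < U.row_ptrs[row + 1]; ++nz) {
            const std::size_t col = U.col_idxs[nz];
            if (col == row) {
                diag = U.values[nz];
            } else if (col > row) {
                // Subtract the already solved part for all right-hand sides.
                for (std::size_t k = 0; k < num_rhs; ++k) {
                    x[row * num_rhs + k] -= U.values[nz] * x[col * num_rhs + k];
                }
            }
        }
        assert(diag != 0.0);  // zero or missing pivot: U is singular
        for (std::size_t k = 0; k < num_rhs; ++k) {
            x[row * num_rhs + k] /= diag;
        }
    }
    return x;
}
```

With several right-hand sides, scanning each row's nonzeros once and updating all `num_rhs` columns together avoids re-reading the sparsity pattern for every column.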

codecov · 2019-09-01T10:44:06Z

Codecov Report

Merging #342 into develop will increase coverage by <.01%.
The diff coverage is 100%.

```diff
@@             Coverage Diff             @@
##           develop     #342      +/-   ##
===========================================
+ Coverage    98.25%   98.26%   +<.01%
===========================================
  Files          246      247       +1
  Lines        18414    18466      +52
===========================================
+ Hits         18093    18145      +52
  Misses         321      321
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| cuda/base/device_guard.hpp | `100% <ø> (ø)` | ⬆️ |
| cuda/test/solver/lower_trs_kernels.cpp | `100% <ø> (ø)` | ⬆️ |
| cuda/test/solver/upper_trs_kernels.cpp | `100% <100%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 1b9b919...3517023.

tcojean

I would like to see some changes to this PR.

cuda/solver/upper_trs_kernels.cu

yhmtsai

If the only difference between upper_triangular and lower_triangular is the FillMode, you could move the shared code into a new .cuh header file to avoid the duplicated lines flagged by Sonar.

cuda/matrix/csr_kernels.cu

cuda/solver/lower_trs_kernels.cu
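The suggestion above works because lower and upper triangular solves differ only in the row traversal direction and in which triangle is read. A minimal CPU sketch of that factoring follows, assuming a hypothetical `fill_mode` enum as a stand-in for cuSPARSE's `cusparseFillMode_t`: one shared `triangular_solve` holds the common code, and hypothetical `lower_trs`/`upper_trs` wrappers only choose the mode. It illustrates the idea and is not the contents of `cuda/solver/common_trs_kernels.cuh`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cusparseFillMode_t.
enum class fill_mode { lower, upper };

// Minimal CSR matrix (hypothetical helper type).
struct csr_matrix {
    std::size_t num_rows;
    std::vector<std::size_t> row_ptrs;
    std::vector<std::size_t> col_idxs;
    std::vector<double> values;
};

// Shared solve for T * x = b. The fill mode decides the row order (forward
// for lower, backward for upper) and which triangle is read; entries in the
// other triangle are ignored.
std::vector<double> triangular_solve(const csr_matrix& T,
                                     const std::vector<double>& b,
                                     fill_mode mode)
{
    const std::size_t n = T.num_rows;
    assert(b.size() == n);
    std::vector<double> x(n, 0.0);
    for (std::size_t step = 0; step < n; ++step) {
        const std::size_t row = mode == fill_mode::lower ? step : n - 1 - step;
        double sum = b[row];
        double diag = 0.0;
        for (std::size_t nz = T.row_ptrs[row]; nz < T.row_ptrs[row + 1]; ++nz) {
            const std::size_t col = T.col_idxs[nz];
            const bool solved = mode == fill_mode::lower ? col < row : col > row;
            if (col == row) {
                diag = T.values[nz];
            } else if (solved) {
                sum -= T.values[nz] * x[col];  // x[col] is already final
            }
        }
        assert(diag != 0.0);
        x[row] = sum / diag;
    }
    return x;
}

// Thin per-triangle entry points (hypothetical names).
std::vector<double> lower_trs(const csr_matrix& L, const std::vector<double>& b)
{
    return triangular_solve(L, b, fill_mode::lower);
}

std::vector<double> upper_trs(const csr_matrix& U, const std::vector<double>& b)
{
    return triangular_solve(U, b, fill_mode::upper);
}
```

In the cuSPARSE-based kernels the corresponding switch is presumably a single descriptor call such as `cusparseSetMatFillMode(descr, CUSPARSE_FILL_MODE_UPPER)`, which is what lets the remaining code be shared.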

yhmtsai

LGTM

cuda/solver/common_trs_kernels.cuh

tcojean

LGTM. Just some minor comments.

cuda/base/pointer_mode_guard.hpp

thoasm

Looks good, but I am missing some comments and documentation.

cuda/base/device_guard.hpp

cuda/base/pointer_mode_guard.hpp

cuda/solver/common_trs_kernels.cuh

cuda/test/solver/upper_trs_kernels.cpp

thoasm

LGTM!

cuda/base/pointer_mode_guard.hpp

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.1.0. This release brings several performance improvements, adds Windows support, adds support for factorizations inside Ginkgo and a new ILU preconditioner based on the ParILU algorithm, among other things. For detailed information, check the respective issue.

Supported systems and requirements:

+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

### Additions

+ Upper and lower triangular solvers ([#327](#327), [#336](#336), [#341](#341), [#342](#342))
+ New factorization support in Ginkgo, and addition of the ParILU algorithm ([#305](#305), [#315](#315), [#319](#319), [#324](#324))
+ New ILU preconditioner ([#348](#348), [#353](#353))
+ Windows MinGW and Cygwin support ([#347](#347))
+ Windows Visual Studio support ([#351](#351))
+ New example showing how to use ParILU as a preconditioner ([#358](#358))
+ New example on using loggers for debugging ([#360](#360))
+ Add two new 9pt and 27pt stencil examples ([#300](#300), [#306](#306))
+ Allow benchmarking cuSPARSE spmv formats through Ginkgo's benchmarks ([#303](#303))
+ New benchmark for sparse matrix format conversions ([#312](https://github.com/ginkgo-project/ginkgo/issues/312), [#317](https://github.com/ginkgo-project/ginkgo/issues/317))
+ Add conversions between CSR and Hybrid formats ([#302](#302), [#310](#310))
+ Support for sorting rows in the CSR format by column indices ([#322](#322))
+ Addition of a CUDA COO SpMM kernel for improved performance ([#345](#345))
+ Addition of a LinOp to handle perturbations of the form (identity + scalar * basis * projector) ([#334](#334))
+ New sparsity matrix representation format with Reference and OpenMP kernels ([#349](#349), [#350](#350))

### Fixes

+ Accelerate GMRES solver for CUDA executor ([#363](#363))
+ Fix BiCGSTAB solver convergence ([#359](#359))
+ Fix CGS logging by reporting the residual for every sub iteration ([#328](#328))
+ Fix CSR,Dense->Sellp conversion's memory access violation ([#295](#295))
+ Accelerate CSR->Ell,Hybrid conversions on CUDA ([#313](#313), [#318](#318))
+ Fixed slowdown of COO SpMV on OpenMP ([#340](#340))
+ Fix gcc 6.4.0 internal compiler error ([#316](#316))
+ Fix compilation issue on Apple clang++ 10 ([#322](#322))
+ Make Ginkgo able to compile on Intel 2017 and above ([#337](#337))
+ Make the benchmarks spmv/solver use the same matrix formats ([#366](#366))
+ Fix self-written isfinite function ([#348](#348))
+ Fix Jacobi issues shown by cuda-memcheck

### Tools and ecosystem improvements

+ Multiple improvements to the CI system and tools ([#296](#296), [#311](#311), [#365](#365))
+ Multiple improvements to the Ginkgo containers ([#328](#328), [#361](#361))
+ Add sonarqube analysis to Ginkgo ([#304](#304), [#308](#308), [#309](#309))
+ Add clang-tidy and iwyu support to Ginkgo ([#298](#298))
+ Improve Ginkgo's support of xSDK M12 policy by adding the `TPL_` arguments to CMake ([#300](#300))
+ Add support for the xSDK R7 policy ([#325](#325))
+ Fix examples in html documentation ([#367](#367))

Related PR: #370

pratikvn requested review from thoasm, yhmtsai, hartwiganzt and tcojean August 31, 2019 23:27

pratikvn self-assigned this Aug 31, 2019

pratikvn mentioned this pull request Aug 31, 2019

Upper triangular solver #341

Merged

pratikvn force-pushed the upper-trs-cuda-kernels branch from a34d3d3 to 058f37e September 1, 2019 09:36

pratikvn force-pushed the upper-trs-cuda-kernels branch 3 times, most recently from 94a7e84 to ff77fb1 September 6, 2019 22:30

pratikvn added 5 commits September 10, 2019 22:48

Add CUDA kernels for upper trs.

492a2ab

Add CUDA kernel with update for dummy rhs.

981e22e

Fix the multiple RHS issue and also some CUDA VERSION check.

a267844

Update with init struct and transpose checking.

f31d5d5

+ Move init struct to UpperTrs class.
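The init struct mentioned in these commits is not shown on this page. Assuming it plays the role of an analysis result (cuSPARSE's sparse triangular solves, for instance, are split into an analysis phase and a solve phase), the pattern can be sketched on the CPU: a hypothetical `upper_solve_struct` is built once by a hypothetical `analyze_upper` when the solver is generated, and every `solve_upper` call reuses it. The sketch also assumes column indices are sorted within each row, so every entry stored after the diagonal lies in the strict upper triangle.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal CSR matrix (hypothetical helper type); column indices are assumed
// to be sorted within each row.
struct csr_matrix {
    std::size_t num_rows;
    std::vector<std::size_t> row_ptrs;
    std::vector<std::size_t> col_idxs;
    std::vector<double> values;
};

// Hypothetical analysis result: built once, reused by every solve.
struct upper_solve_struct {
    std::vector<std::size_t> diag_pos;  // index of each row's diagonal entry
};

// Analysis phase: locate every diagonal entry and reject zero or missing pivots.
upper_solve_struct analyze_upper(const csr_matrix& U)
{
    upper_solve_struct info;
    info.diag_pos.resize(U.num_rows);
    for (std::size_t row = 0; row < U.num_rows; ++row) {
        bool found = false;
        for (std::size_t nz = U.row_ptrs[row]; nz < U.row_ptrs[row + 1]; ++nz) {
            if (U.col_idxs[nz] == row && U.values[nz] != 0.0) {
                info.diag_pos[row] = nz;
                found = true;
            }
        }
        if (!found) {
            throw std::runtime_error("zero pivot in row " + std::to_string(row));
        }
    }
    return info;
}

// Solve phase: backward substitution that starts right after the cached
// diagonal position, i.e. only visits the strict upper triangle.
std::vector<double> solve_upper(const csr_matrix& U,
                                const upper_solve_struct& info,
                                const std::vector<double>& b)
{
    assert(b.size() == U.num_rows);
    std::vector<double> x(b.size(), 0.0);
    for (std::size_t row = U.num_rows; row-- > 0;) {
        double sum = b[row];
        for (std::size_t nz = info.diag_pos[row] + 1; nz < U.row_ptrs[row + 1];
             ++nz) {
            sum -= U.values[nz] * x[U.col_idxs[nz]];
        }
        x[row] = sum / U.values[info.diag_pos[row]];
    }
    return x;
}
```

Running the zero-pivot check during analysis reports a singular matrix once, at generation time, instead of on every apply.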

Update with the new upstream changes.

abad1f3

pratikvn force-pushed the upper-trs-cuda-kernels branch from ff77fb1 to abad1f3 September 10, 2019 20:57

pratikvn removed the 1:ST:do-not-merge label Sep 10, 2019

tcojean requested changes Sep 13, 2019

cuda/solver/upper_trs_kernels.cu (4 review threads, outdated, resolved)

pratikvn force-pushed the upper-trs-cuda-kernels branch 3 times, most recently from 4a237d9 to bdf617a September 14, 2019 16:46

Add a pointer mode guard.

45594da

+ Adds automatic setting and resetting of the {CULIBS}_POINTER_MODE from HOST to DEVICE + Adds the pointer_mode_guards to dense kernels and cuda_linops in benchmarks as well.

pratikvn force-pushed the upper-trs-cuda-kernels branch from bdf617a to 45594da September 14, 2019 17:40

Satisfy sonarqube's rule of 5 for {cuda}X_guards.

a83c60c
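The pointer mode guard added in these commits acts as a scope guard: it switches a library handle to device pointer mode for the duration of a scope and switches it back afterwards, and the follow-up commit makes the guards satisfy SonarQube's rule of five. Below is a minimal, GPU-free sketch of that pattern. The `fake_handle` and `pointer_mode` types are hypothetical stand-ins; in CUDA code the constructor and destructor would instead call functions such as `cublasGetPointerMode`/`cublasSetPointerMode` with `CUBLAS_POINTER_MODE_DEVICE`, or their cuSPARSE counterparts. This sketch restores whichever mode was active before, which is HOST in the case described by the commit message, and deletes the copy and move operations, one common way to satisfy the rule of five for a guard.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for a CUDA library handle and its pointer mode
// (real code would use e.g. cublasHandle_t and cublasPointerMode_t).
enum class pointer_mode { host, device };

struct fake_handle {
    pointer_mode mode = pointer_mode::host;
};

// Scope guard: device pointer mode while the guard lives, previous mode after.
class pointer_mode_guard {
public:
    explicit pointer_mode_guard(fake_handle& handle)
        : handle_(handle), previous_(handle.mode)
    {
        // Real code: cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE)
        handle_.mode = pointer_mode::device;
    }

    // Rule of five: a copy or move would restore the mode more than once,
    // so all four operations are deleted and only the destructor remains.
    pointer_mode_guard(const pointer_mode_guard&) = delete;
    pointer_mode_guard(pointer_mode_guard&&) = delete;
    pointer_mode_guard& operator=(const pointer_mode_guard&) = delete;
    pointer_mode_guard& operator=(pointer_mode_guard&&) = delete;

    ~pointer_mode_guard() { handle_.mode = previous_; }

private:
    fake_handle& handle_;
    pointer_mode previous_;
};

static_assert(!std::is_copy_constructible<pointer_mode_guard>::value,
              "the guard must not be copyable");
static_assert(!std::is_move_constructible<pointer_mode_guard>::value,
              "the guard must not be movable");
```

A kernel wrapper would construct the guard first, e.g. `pointer_mode_guard guard{handle};`, so that scalars such as `alpha` and `beta` can be passed as device pointers; the previous mode is restored on every exit path, including exceptions.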

yhmtsai requested changes Sep 16, 2019

cuda/matrix/csr_kernels.cu (resolved)

cuda/solver/lower_trs_kernels.cu (outdated, resolved)

pratikvn force-pushed the upper-trs-cuda-kernels branch from b890a74 to 5b08ab0 September 16, 2019 22:19

yhmtsai approved these changes Sep 17, 2019

cuda/solver/common_trs_kernels.cuh (resolved)

pratikvn requested a review from tcojean September 17, 2019 14:51

tcojean approved these changes Sep 17, 2019

cuda/base/pointer_mode_guard.hpp (outdated, resolved)

pratikvn force-pushed the upper-trs-cuda-kernels branch from 5b08ab0 to acc1fe7 September 17, 2019 20:25

Review update: rem code duplication, update YML file.

01eadd0

+ Remove code duplication in cuda kernels by moving common code to a .cuh file. + Update the artifacts uploading in the YML file to circumvent the GITLAB limits.

pratikvn force-pushed the upper-trs-cuda-kernels branch from acc1fe7 to 01eadd0 September 17, 2019 20:32

thoasm requested changes Sep 18, 2019

thoasm approved these changes Sep 18, 2019

cuda/base/pointer_mode_guard.hpp (outdated, resolved)

Review update: update documentation and formatting.

3517023

pratikvn force-pushed the upper-trs-cuda-kernels branch from 0c8a98f to 3517023 September 18, 2019 12:24

pratikvn merged commit 87181c1 into develop Sep 18, 2019

pratikvn deleted the upper-trs-cuda-kernels branch September 18, 2019 14:28
