Parallel ILU factorization with a reference implementation #305
Is the CI failing because it exceeds the time limit?
No, it fails because the test I think adding a |
9537775
to
2b27fc7
Compare
@thoasm for some reason there are quite a few conflicts in the CSR files (with the Hybrid conversions?), so you will have to rebase.
@tcojean I know, I am not done yet with the implementation of the |
40d9859
to
a75eb2e
Compare
This PR can finally be reviewed. All tests should succeed. In addition to the |
Unfortunately, we get a Edit: I tried the in SO suggested
4dc6f51
to
89cf145
Compare
Codecov Report
@@ Coverage Diff @@
## develop #305 +/- ##
===========================================
- Coverage 98.17% 98.17% -0.01%
===========================================
Files 215 223 +8
Lines 16529 17180 +651
===========================================
+ Hits 16227 16866 +639
- Misses 302 314 +12
@tcojean I am currently failing SonarCloud because I have Also, the code coverage is still not perfect apparently, but this is due to the exact same reason: I am not testing the stubs. Additionally, it detects the two |
The ParIlu factorization now inherits from `Composition` instead of `LinOpFactory` (or similar). Additionally, all added files are now also mentioned inside the appropriate `CMakeLists.txt`.
- Added `get_l` and `get_u_factor` to ParIluFactors
- Fixed compiler errors for reference implementation
- Added simple test for reference implementation

- Renamed class to `ParIlu` (and the files accordingly)
- Added `iterations` variable to `ParIlu` to set the number of iterations of the compute kernel
- Extended both `core` and `reference` test
- Optimized conversions (`system_matrix` -> Csr, Coo) by reducing the number of copies

- Added apply test for ParIlu
- Made all test matrices const
- Added additional reference tests for the 3 kernels
- Fixed wrong include file in `common_kernels`
- Fixed a mistake in the compute_l_u kernel (still not correct)
- Added comments to compute_l_u kernel
- Added debug output (MUST BE REMOVED BEFORE MERGING)

- All debug output was removed
- More tests were added (to cover most of the corner cases)
- Documentation was extended
Sorting of Csr matrices is required for ParIlu to work with any kind of matrix input. Currently, only the reference implementation is done. It is performed with `std::sort` with a custom Iterator. That helps to prevent both unnecessary copies and the need to implement a custom sorting algorithm.
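To make the effect of the row sort concrete, here is a minimal sketch of what sorting a CSR matrix by column index does. Note that this is not the zero-copy iterator approach described above: it sorts a permutation and uses temporary per-row copies, which is simpler to show but is exactly the copying the custom iterator avoids. `SimpleCsr` and `sort_rows_by_column_index` are hypothetical names for illustration, not Ginkgo's types or API.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical minimal CSR container for illustration (not gko::matrix::Csr).
struct SimpleCsr {
    std::vector<int> row_ptrs;   // size: number of rows + 1
    std::vector<int> col_idxs;   // column index of each stored value
    std::vector<double> values;  // stored values, row by row
};

// Sorts the entries of every row by ascending column index.
// Copy-based variant: a permutation is sorted, then applied to both arrays.
void sort_rows_by_column_index(SimpleCsr& m)
{
    const std::size_t num_rows =
        m.row_ptrs.empty() ? 0 : m.row_ptrs.size() - 1;
    std::vector<int> perm;
    std::vector<int> cols;
    std::vector<double> vals;
    for (std::size_t row = 0; row < num_rows; ++row) {
        const int begin = m.row_ptrs[row];
        const int end = m.row_ptrs[row + 1];
        const int len = end - begin;
        // perm[k] = position (within the row) of the k-th smallest column
        perm.resize(len);
        std::iota(perm.begin(), perm.end(), 0);
        std::sort(perm.begin(), perm.end(), [&](int x, int y) {
            return m.col_idxs[begin + x] < m.col_idxs[begin + y];
        });
        // Temporary copies of the row, then write back in sorted order
        cols.assign(m.col_idxs.begin() + begin, m.col_idxs.begin() + end);
        vals.assign(m.values.begin() + begin, m.values.begin() + end);
        for (int k = 0; k < len; ++k) {
            m.col_idxs[begin + k] = cols[perm[k]];
            m.values[begin + k] = vals[perm[k]];
        }
    }
}
```

The key point is that column indices and values have to move together, which is why a plain `std::sort` on `col_idxs` alone is not enough.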
- Moved (and renamed) the former CustomIterator (now IteratorFactory) to a separate file
- Added documentation to IteratorFactory
- Added tests for IteratorFactory
- Added tests for sorting in CSR
- Added sorting to ParIlu, so all tests succeed
The core test for ParIlu used kernels from the reference module, which are not required to exist for the core tests. All relevant tests were moved to the reference test.
Instead of having the same default value on all executors, it now defaults to `0`, which means the implementation can freely choose how many iterations are appropriate for the given resources.
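For readers unfamiliar with what the iteration count controls, the following is a rough sketch of the fixed-point sweeps of the ParILU algorithm (as published by Chow and Patel), written sequentially with dense storage for brevity. It is not the kernel from this PR: `Matrix`, `initial_guess` and `par_ilu_sweeps` are hypothetical names, zero entries of `a` are treated as lying outside the sparsity pattern, and a nonzero diagonal of `u` is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: initial guess with L = strict lower part of A plus a
// unit diagonal and U = upper part of A (including the diagonal).
void initial_guess(const Matrix& a, Matrix& l, Matrix& u)
{
    const std::size_t n = a.size();
    l.assign(n, std::vector<double>(n, 0.0));
    u.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i > j) {
                l[i][j] = a[i][j];
            } else {
                u[i][j] = a[i][j];
            }
        }
        l[i][i] = 1.0;
    }
}

// Hypothetical helper: runs `sweeps` fixed-point sweeps over the nonzero
// pattern of `a`, updating `l` and `u` in place.
// Assumes u[j][j] != 0 for every column j that is divided by.
void par_ilu_sweeps(const Matrix& a, Matrix& l, Matrix& u, int sweeps)
{
    const std::size_t n = a.size();
    for (int s = 0; s < sweeps; ++s) {
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                if (a[i][j] == 0.0) {
                    continue;  // entry is outside the sparsity pattern
                }
                // sum = a_ij - sum_{k < min(i, j)} l_ik * u_kj
                double sum = a[i][j];
                const std::size_t limit = std::min(i, j);
                for (std::size_t k = 0; k < limit; ++k) {
                    sum -= l[i][k] * u[k][j];
                }
                if (i > j) {
                    l[i][j] = sum / u[j][j];  // strictly lower part goes to L
                } else {
                    u[i][j] = sum;  // diagonal and upper part go to U
                }
            }
        }
    }
}
```

In a parallel implementation each entry update can run concurrently using whatever values of L and U are currently available, which is why more than one sweep is generally needed and why the appropriate number can depend on the hardware.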
- Added used LaTeX packages into `Doxyfile-usr.in`
- Improved code coverage by adding tests for
  + `IteratorFactory` (specifically for the `operator<` in Reference)
  + `ParIlu` (Added additional test with sorted CSR matrix)
- re-added the overview description for `ParIlu`
72dbf19
to
e8108e8
Compare
LGTM. Some minor documentation and include issues and other improvement suggestions.
I think you should also add some files for the documentation, namely a file in `doc/headers` which defines a group `factor` (see `linop.hpp` as example), and in addition, an update to the `modules.dot` at the same time (in fact is this used @pratikvn, I can't remember?).
Mostly minor and nits. One possibly major: Is the factorization of `big_mtx` in the reference test correct? I am getting a different result in MATLAB, but it is possible that I entered the matrix incorrectly.
All comments should now be dealt with. I also added `doc/headers/factor.hpp` and inserted `factor` into `doc/headers/modules.dot` (a very short description, but I am not sure what else I should have written there).
LGTM!
404cc14
to
e9e876b
Compare
Some minor things
For the CI system, we can add to `- export CUDA_VISIBLE_DEVICES=0`. This should restrict all jobs to the first GPU. However, we have a test that sends data from one CUDA executor to another, and in that case the copy will happen on the same GPU instead of between two distinct GPUs.
To prevent conflicts with other users on the CI system, we now restrict it to only use the first GPU (device ID 0) for all tests. Note: That also restricts the CUDA executor copy test to a single GPU, meaning data will be copied internally on a single GPU instead of across devices.
The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.1.0. This release brings several performance improvements, adds Windows support, adds support for factorizations inside Ginkgo and a new ILU preconditioner based on the ParILU algorithm, among other things. For detailed information, check the respective issue.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

### Additions
+ Upper and lower triangular solvers ([#327](#327), [#336](#336), [#341](#341), [#342](#342))
+ New factorization support in Ginkgo, and addition of the ParILU algorithm ([#305](#305), [#315](#315), [#319](#319), [#324](#324))
+ New ILU preconditioner ([#348](#348), [#353](#353))
+ Windows MinGW and Cygwin support ([#347](#347))
+ Windows Visual Studio support ([#351](#351))
+ New example showing how to use ParILU as a preconditioner ([#358](#358))
+ New example on using loggers for debugging ([#360](#360))
+ Add two new 9pt and 27pt stencil examples ([#300](#300), [#306](#306))
+ Allow benchmarking CuSPARSE spmv formats through Ginkgo's benchmarks ([#303](#303))
+ New benchmark for sparse matrix format conversions ([#312](https://github.com/ginkgo-project/ginkgo/issues/312), [#317](https://github.com/ginkgo-project/ginkgo/issues/317))
+ Add conversions between CSR and Hybrid formats ([#302](#302), [#310](#310))
+ Support for sorting rows in the CSR format by column indices ([#322](#322))
+ Addition of a CUDA COO SpMM kernel for improved performance ([#345](#345))
+ Addition of a LinOp to handle perturbations of the form (identity + scalar * basis * projector) ([#334](#334))
+ New sparsity matrix representation format with Reference and OpenMP kernels ([#349](#349), [#350](#350))

### Fixes
+ Accelerate GMRES solver for CUDA executor ([#363](#363))
+ Fix BiCGSTAB solver convergence ([#359](#359))
+ Fix CGS logging by reporting the residual for every sub iteration ([#328](#328))
+ Fix CSR,Dense->Sellp conversion's memory access violation ([#295](#295))
+ Accelerate CSR->Ell,Hybrid conversions on CUDA ([#313](#313), [#318](#318))
+ Fixed slowdown of COO SpMV on OpenMP ([#340](#340))
+ Fix gcc 6.4.0 internal compiler error ([#316](#316))
+ Fix compilation issue on Apple clang++ 10 ([#322](#322))
+ Make Ginkgo able to compile on Intel 2017 and above ([#337](#337))
+ Make the benchmarks spmv/solver use the same matrix formats ([#366](#366))
+ Fix self-written isfinite function ([#348](#348))
+ Fix Jacobi issues shown by cuda-memcheck

### Tools and ecosystem improvements
+ Multiple improvements to the CI system and tools ([#296](#296), [#311](#311), [#365](#365))
+ Multiple improvements to the Ginkgo containers ([#328](#328), [#361](#361))
+ Add sonarqube analysis to Ginkgo ([#304](#304), [#308](#308), [#309](#309))
+ Add clang-tidy and iwyu support to Ginkgo ([#298](#298))
+ Improve Ginkgo's support of xSDK M12 policy by adding the `TPL_` arguments to CMake ([#300](#300))
+ Add support for the xSDK R7 policy ([#325](#325))
+ Fix examples in html documentation ([#367](#367))

Related PR: #370
This PR adds a parallel incomplete LU factorization to Ginkgo. I did not completely follow the example of #27 because:
`::build().with_iterations(X).on(exec)`)
`Composition` (see the test `ApplyMethodDenseSmall`, where `apply` is called)

For the last point, it is possible for the user to change the content of L and U by calling `copy_from()`, so it is not 100% guaranteed that the matrices are actually triangular matrices (or that there are two of them).

closes #134, closes #27
Additionally, as part of this PR, a `sort_by_column_index()` method was added to the `Csr` matrix because the current `ParIlu` implementation only works if the given system matrix is sorted. A reference implementation is provided which works with the custom-made iterator `IteratorFactory::Iterator`, so no array has to be copied to work with `std::sort`.

It should be possible to also use this iterator to implement the OpenMP version (sort chunks in parallel, followed by a merge step), although a custom sort implementation might be faster.
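The "sort chunks, then merge" idea can be sketched roughly as follows. This is a sequential illustration only, operating on a plain vector of (column index, value) pairs instead of the custom iterator, and `chunked_sort` is a hypothetical name. In an OpenMP version the chunk loop and each merge round would be the natural candidates for parallelization, since their iterations touch disjoint ranges.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Entry = std::pair<int, double>;  // (column index, value)

// Hypothetical helper: sorts `entries` by column index by first sorting
// fixed-size chunks and then merging neighbouring sorted runs.
void chunked_sort(std::vector<Entry>& entries, std::size_t chunk_size)
{
    if (chunk_size == 0) {
        chunk_size = 1;  // guard against an empty chunk size
    }
    const std::size_t n = entries.size();
    const auto by_column = [](const Entry& x, const Entry& y) {
        return x.first < y.first;
    };
    // Step 1: sort every chunk on its own (iterations are independent).
    for (std::size_t start = 0; start < n; start += chunk_size) {
        const std::size_t end = std::min(start + chunk_size, n);
        std::sort(entries.begin() + start, entries.begin() + end, by_column);
    }
    // Step 2: merge neighbouring sorted runs, doubling the run width
    // each round until a single sorted run remains.
    for (std::size_t width = chunk_size; width < n; width *= 2) {
        for (std::size_t start = 0; start + width < n; start += 2 * width) {
            const std::size_t mid = start + width;
            const std::size_t end = std::min(start + 2 * width, n);
            std::inplace_merge(entries.begin() + start, entries.begin() + mid,
                               entries.begin() + end, by_column);
        }
    }
}
```

Whether this beats a dedicated parallel sort would have to be measured; the sketch only shows the structure of the approach.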
TODO before merging: