Mixed Precision Gmres #640

thoasm · 2020-09-14T07:18:11Z

This PR adds the compressed basis GMRES.

TODO:

Merge Add new accessor #643
Update accessor use to the merged version

tcojean · 2020-09-14T07:22:34Z

First, nice that you have several examples. Quick question: do you think it makes sense to split this PR into one purely for the accessor, and another one using the accessor in the mixed precision GMRES? 8.5K lines is a bit much, though we've had multiple such PRs recently.

thoasm · 2020-09-16T11:21:06Z

@tcojean thanks for the suggestion, it would definitely make sense.
I will look into extracting the accessor itself, so it is easier to discuss the changes.
It might also make sense to split the GMRES into a reference (maybe OpenMP as well) PR and a CUDA / HIP PR, which should reduce the sizes as well.

tcojean · 2020-09-17T07:47:04Z

Yes I guess we could also split the GMRES itself into at least Reference/base structure and then a PR for all the kernels. I don't think we would have to split the kernels PR though into OpenMP and another one HIP/CUDA, but we'll see depending on the size of each PR.

* Only core and reference executors.
* Test files don't compile, due to a problem related to the gtest macros (TEST_F -> TYPED_TEST).
* MGS, MGS with reorthogonalization and CGS with reorthogonalization are considered.
* Norms are still created in the internal routines.
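For readers unfamiliar with the orthogonalization variants named above, here is a minimal sketch of classical Gram-Schmidt (CGS) with one reorthogonalization pass. It is not the code from this PR: the names `Vec`, `dot`, and `cgs_reorth` are hypothetical, the vectors are plain `std::vector<double>`, and the second pass is always executed, whereas a real implementation might only reorthogonalize when a loss of orthogonality is detected.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Plain dot product of two equally sized vectors.
double dot(const Vec& a, const Vec& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Orthogonalizes w against the (assumed orthonormal) vectors in basis using
// classical Gram-Schmidt, applied twice (one reorthogonalization pass).
// Returns the accumulated projection coefficients, one per basis vector.
Vec cgs_reorth(const std::vector<Vec>& basis, Vec& w)
{
    Vec h(basis.size(), 0.0);
    for (int pass = 0; pass < 2; ++pass) {
        // CGS: compute all coefficients from the same, unmodified w ...
        Vec c(basis.size());
        for (std::size_t j = 0; j < basis.size(); ++j) {
            c[j] = dot(basis[j], w);
        }
        // ... and only then subtract all projections.
        for (std::size_t j = 0; j < basis.size(); ++j) {
            for (std::size_t i = 0; i < w.size(); ++i) {
                w[i] -= c[j] * basis[j][i];
            }
            h[j] += c[j];
        }
    }
    return h;
}
```

In MGS, by contrast, each coefficient would be computed from the already updated `w`, which serializes the dot products; CGS allows all dot products of one pass to be computed together.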

…some errors which were detected during the testing process in the repository. The previous value was float, whereas the original results were computed with default_precision, which caused the errors. Now, the default value is also default_precision.

…cuda executors. For omp, the omp parallelization is being moved to the outer loop. For cuda, the loop of kernels is changed to a kernel with a loop.
* The main routines (loop of dots and loop of axpy) are still too expensive.

Added an accessor header file (name might have to change in future) and used it in all mixed precision kernels (but for now only for the reduced precision accesses). Also adds some minor fixes:
- removed unused code in the example in hopes that it compiles on windows
- added HIP stubs to allow HIP compilation
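To illustrate what an accessor for reduced precision accesses can look like, here is a minimal sketch. It is an assumption about the general idea, not the accessor added in this PR: the class name `ReducedStorageAccessor`, its `read`/`write` members, and the row-major layout are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor: values live in memory as StorageType, but are handed
// out and accepted as ArithmeticType, so kernels compute in full precision.
template <typename ArithmeticType, typename StorageType>
class ReducedStorageAccessor {
public:
    ReducedStorageAccessor(StorageType* data, std::size_t rows,
                           std::size_t cols)
        : data_{data}, rows_{rows}, cols_{cols}
    {}

    // Reads one entry and converts it to the arithmetic precision.
    ArithmeticType read(std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        return static_cast<ArithmeticType>(data_[row * cols_ + col]);
    }

    // Converts the value to the storage precision and writes it.
    void write(std::size_t row, std::size_t col, ArithmeticType value)
    {
        assert(row < rows_ && col < cols_);
        data_[row * cols_ + col] = static_cast<StorageType>(value);
    }

private:
    StorageType* data_;
    std::size_t rows_;
    std::size_t cols_;
};
```

With `ReducedStorageAccessor<double, float>`, a Krylov basis would occupy half the memory of a double precision basis, at the cost of a rounding step on every write.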

The specialization is currently set to only work with float storage type to test the pipeline, but it can easily be modified to work with all integer types. The Accessor was also moved from a shared header to a gmres_mixed exclusive header.

ginkgo-bot · 2020-12-12T08:36:59Z

Error: The following files need to be formatted:

benchmark/solver/solver.cpp
core/base/extended_float.hpp
core/base/utils.hpp
core/solver/gmres_mixed.cpp
core/solver/gmres_mixed_accessor.hpp
core/test/base/range_accessors.cpp
core/test/reorder/rcm.cpp
core/test/solver/gmres_mixed.cpp
include/ginkgo/core/base/range.hpp
include/ginkgo/core/base/range_accessors.hpp
include/ginkgo/core/base/types.hpp
include/ginkgo/core/reorder/reordering_base.hpp
include/ginkgo/core/solver/gmres_mixed.hpp
omp/solver/gmres_mixed_kernels.cpp
omp/test/reorder/rcm_kernels.cpp
omp/test/solver/gmres_mixed_kernels.cpp
reference/solver/gmres_mixed_kernels.cpp
reference/stop/residual_norm_kernels.cpp
reference/test/reorder/rcm.cpp

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

thoasm added the type:solver (This is related to the solvers), 1:ST:WIP (This PR is a work in progress. Not ready for review.), and mod:all (This touches all Ginkgo modules.) labels Sep 14, 2020

thoasm self-assigned this Sep 14, 2020

thoasm force-pushed the gmres_mixed_accessor branch 8 times, most recently from c786679 to 5a7be65 on September 15, 2020 04:15

thoasm force-pushed the gmres_mixed_accessor branch from 4f5bb93 to e10234b on September 17, 2020 10:15

aliaga@uji.es and others added 14 commits October 22, 2020 20:43

Inclusion of the omp executor in the repository:

5eb6bd7

* Now, the norms are properly created in the main class.
* The test files are not repaired yet.

Inclusion of the cuda executor in the repository:

c90e9d8

* For CGS, a loop of kernels is used instead of a kernel with a loop.
* The test files are not repaired yet.

The test files are finally included, but:

c7557d9

* The messages have to be removed.
* For CGS, a loop of kernels is used instead of a kernel with a loop.

The first unoptimized version is done. Next tasks:

3606b68

* For CGS, a loop of kernels is used instead of a kernel with a loop.
* Consider another base_types for ValueTypeKrylovBases

Definition of the CG2 variant of the finish_arnoldi routine for omp and cuda executors, as a first step in the optimization process.

ff91868

Also the calls for the timing are included.

Add GMRES_mixed to benchmark

7c2dacc

Finally a good implementation of the multidot_kernels_num_iters_1 is done.

2b7f889

Timing instructions are also included, controlled by some defines. The next step will be to improve the update kernels.

Made GmresMixed compile with complex types

6051cbf

The update routines have been improved. Now the computational time is close to 75s for 6221 iters.

03e1b4f

Next steps should be:
* Add the computation of the inf-norm for the next_krylov_basis.
* Merge updating and norms computations.
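As a rough illustration of what merging the update and the norm computations could mean, the sketch below performs the vector update and accumulates both the 2-norm and the inf-norm in a single sweep over the data. This is an assumed reading of the planned step, not the kernel from this PR; `Norms` and `update_and_norms` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type holding both norms of the updated vector.
struct Norms {
    double two;
    double inf;
};

// Computes w -= coeff * v and, in the same sweep, the 2-norm and the
// inf-norm of the updated w, so that w is only traversed once.
Norms update_and_norms(std::vector<double>& w, const std::vector<double>& v,
                       double coeff)
{
    assert(w.size() == v.size());
    double sum_sq = 0.0;
    double max_abs = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] -= coeff * v[i];
        sum_sq += w[i] * w[i];
        max_abs = std::max(max_abs, std::abs(w[i]));
    }
    return {std::sqrt(sum_sq), max_abs};
}
```

For memory-bound kernels, fusing the sweeps like this avoids reading the vector a second time just to compute its norms.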

Thomas Grützmacher added 22 commits October 22, 2020 20:43

CUDA implementation is now using at

e490aba

Re-add ConstAccessor

56a63d5

Also add instantiation macro for ConstAccessors

Fix accessor by adding additional __restrict__

3fcaee7

GmresMixed storage prec is now a factory parameter

dd899cf

Currently, only core is adapted with the reference test started (not all precision combinations are tested properly).

Improve reference test and include the enum there

32770bf

Fix the reference test to pass

544782b

CUDA and OpenMP now also compile with the new accessor layout. Benchmarks are still TODO.

Adapt to new parameter macros

b5ddef7

Update the helper to throw when complex

5d2a106

Also add instantiation for single precision floating point

Make GmresMixed work properly with multiple RHS

56d1f90

Also adjust test precision to be more accurate.

Fix benchmark to work with new GmresMixed layout

f67b5b3

Make GmresMixed reference test work on CI.

Move GmresMixed accessor to range_accessors.hpp

8adf5d3

Make accessor range compatible

f67e5ee

Remove unnecessary code from CUDA GmresMixed

8b27cc4

Add HIP kernels

5247306

Half-way of integrating proper const support

6b47fef

Finish proper const-type support

928c8bd

Add constexpr everywhere in accessor

84238d5

Attempt to fix thrust::complex conversion issue

f0bbead

Add workaround for CUDA for reference casting

475d96f

Use better workaround for CUDA references

7f17f6d

Make accessor references work with older CUDA versions by having a conditional constexpr qualifier and by forcing it to use the overloaded cast operation (when present).
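A workaround of this kind could be sketched as follows. Everything here is an assumption for illustration: the macro name `SKETCH_CONSTEXPR`, the class `ReducedReference`, and in particular the CUDA version threshold are hypothetical and not taken from this PR. Only `__CUDACC__` and `__CUDACC_VER_MAJOR__` are macros actually defined by nvcc.

```cpp
#include <cassert>

// Hypothetical macro: drop constexpr when compiling with an older nvcc.
// The version threshold below is an assumption, not the one used in the PR.
#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__) && \
    (__CUDACC_VER_MAJOR__ < 10)
#define SKETCH_CONSTEXPR
#else
#define SKETCH_CONSTEXPR constexpr
#endif

// Hypothetical proxy reference to a value stored in reduced precision.
template <typename ArithmeticType, typename StorageType>
class ReducedReference {
public:
    explicit SKETCH_CONSTEXPR ReducedReference(StorageType* ptr) : ptr_{ptr} {}

    // Overloaded cast: reading converts from storage to arithmetic precision.
    SKETCH_CONSTEXPR operator ArithmeticType() const
    {
        return static_cast<ArithmeticType>(*ptr_);
    }

    // Writing converts from arithmetic to storage precision.
    ReducedReference& operator=(ArithmeticType value)
    {
        assert(ptr_ != nullptr);
        *ptr_ = static_cast<StorageType>(value);
        return *this;
    }

private:
    StorageType* ptr_;
};
```

Callers can force the overloaded cast explicitly with `static_cast<ArithmeticType>(ref)` instead of relying on an implicit conversion, which is one way to sidestep compilers that pick the wrong conversion path.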

Fix GmresMixed core problem

1b3bb52

REORDER to first of this PR: Add specialization for gko::half in std::is_scalar

d3487c0

thoasm force-pushed the gmres_mixed_accessor branch from e10234b to d3487c0 on October 22, 2020 21:47

Add TODO list text-file

9ad48aa

thoasm force-pushed the gmres_mixed_accessor branch from 091a666 to 9ad48aa on December 4, 2020 09:49

Improve force-reset behavior

70e558d

thoasm closed this Jan 23, 2021

thoasm mentioned this pull request Jan 23, 2021

Add Compressed Basis GMRES (CB-GMRES) #693

Merged

thoasm deleted the gmres_mixed_accessor branch February 24, 2021 06:50
