Mixed Precision Gmres #640
First, nice that you have several examples. Quick question: do you think it makes sense to split this PR into one purely for the accessor and another one that uses the accessor in the mixed precision GMRES? 8.5K lines is a bit much, though we've had multiple such PRs recently.
c786679
to
5a7be65
Compare
@tcojean thanks for the suggestion, it would definitely make sense.
Yes, I guess we could also split the GMRES itself into at least the reference/base structure and then a PR for all the kernels. I don't think we would have to split the kernels PR into one for OpenMP and another one for HIP/CUDA, but we'll see depending on the size of each PR.
4f5bb93
to
e10234b
Compare
* Only core and reference executors. * Test files don't compile, due to a problem related to the gtest macros (TEST_F -> TYPED_TEST). * MGS, MGS with reorthogonalization, and CGS with reorthogonalization are considered. * Norms are still created in the internal routines.
* Now, the norms are properly created in the main class. * The test files are not repaired yet.
* For CGS, a loop of kernels is used instead of a kernel with a loop. * The test files are not repaired yet.
* The messages have to be removed. * For CGS, a loop of kernels is used instead of a kernel with a loop.
* For CGS, a loop of kernels is used instead of a kernel with a loop. * Consider another base_types for ValueTypeKrylovBases
…some errors which were detected during the testing process in the repository. The previous value was float, whereas the original results were produced with default_precision, which caused the errors. Now, the default value is also default_precision.
…nd cuda executors, as a first step in the optimization process. The calls for the timing are also included.
…cuda executors. For omp, the omp parallelization is being moved to the outer loop. For cuda, the loop of kernels is changed to a kernel with a loop. * The main routines (loop of dots and loop of axpy) are still too expensive.
…done. Timing instructions are also included, and they are controlled by some defines. The next step will be to improve the update kernels.
Added an accessor header file (the name might have to change in the future) and used it in all mixed precision kernels (but for now only for the reduced precision accesses). Also adds some minor fixes: - removed unused code in the example in hopes that it compiles on Windows - added HIP stubs to allow HIP compilation
… close to 75s for 6221 iters. Next steps should be: * Add the computation of the inf-norm for the next_krylov_basis. * Merge updating and norms computations.
The specialization is currently set to only work with float storage type to test the pipeline, but it can easily be modified to work with all integer types. The Accessor was also moved from a shared header to a gmres_mixed exclusive header.
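One way an integer storage type could work is sketched below, under the assumption that each stored vector carries a scale factor derived from its infinity norm so that every entry fits the integer range. This is an illustration only: `ScaledIntVector` is a hypothetical name, the link between the inf-norm and the scaling is an assumption, and the sketch further assumes that the largest `IntType` value is exactly representable in `ArithType` (for example `std::int16_t` with `double`).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical compressed vector: entries are stored as integers plus one
// shared scale factor, and are expanded to ArithType on read.
template <typename ArithType, typename IntType>
class ScaledIntVector {
public:
    explicit ScaledIntVector(const std::vector<ArithType>& values)
        : data_(values.size()), scale_(1)
    {
        // Infinity norm: the largest absolute entry of the vector.
        ArithType inf_norm = 0;
        for (const auto v : values) {
            inf_norm = std::max(inf_norm, std::abs(v));
        }
        // Choose the scale so that the largest entry maps to the largest
        // representable integer; a zero vector keeps scale 1.
        const auto max_int =
            static_cast<ArithType>(std::numeric_limits<IntType>::max());
        if (inf_norm > 0) {
            scale_ = inf_norm / max_int;
        }
        for (std::size_t i = 0; i < values.size(); ++i) {
            data_[i] = static_cast<IntType>(std::round(values[i] / scale_));
        }
    }

    // Expand a stored integer back to arithmetic precision.
    ArithType read(std::size_t i) const
    {
        return static_cast<ArithType>(data_[i]) * scale_;
    }

    ArithType scale() const { return scale_; }

private:
    std::vector<IntType> data_;
    ArithType scale_;
};
```

The quantization error per entry is at most half the scale factor, so vectors whose entries differ by many orders of magnitude lose their small entries; whether that is acceptable depends on how the solver uses the stored basis.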
Also add instantiation macro for ConstAccessors
Currently, only core is adapted with the reference test started (not all precision combinations are tested properly).
CUDA and OpenMP now also compile for the new accessor layout. Benchmarks are still TODO.
Also add instantiation for single precision floating point
Also adjust test precision to be more accurate.
Make GmresMixed reference test work on CI.
Make accessor references work with older CUDA versions by having a conditional constexpr qualifier and by forcing it to use the overloaded cast operation (when present).
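The two techniques mentioned in this commit message can be sketched roughly as follows. The macro name `SKETCH_CONSTEXPR`, the compiler version check, the class `ReducedRef`, and the helper `load` are all assumptions made for illustration; the actual condition and names used in the PR are not shown in this conversation.

```cpp
// Hypothetical conditional qualifier: expand to nothing on compilers
// assumed to mishandle constexpr here, and to constexpr everywhere else.
// The version threshold is illustrative, not the one used in the PR.
#if defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ < 10)
#define SKETCH_CONSTEXPR
#else
#define SKETCH_CONSTEXPR constexpr
#endif

// Hypothetical reference proxy: points at a value in storage precision
// and converts to the arithmetic type through an overloaded cast.
template <typename ArithType, typename StorageType>
class ReducedRef {
public:
    SKETCH_CONSTEXPR explicit ReducedRef(StorageType* ptr) : ptr_(ptr) {}

    // Overloaded cast operator: read in arithmetic precision.
    SKETCH_CONSTEXPR operator ArithType() const
    {
        return static_cast<ArithType>(*ptr_);
    }

    // Assignment: round to storage precision.
    ReducedRef& operator=(ArithType value)
    {
        *ptr_ = static_cast<StorageType>(value);
        return *this;
    }

private:
    StorageType* ptr_;
};

// Force the overloaded cast with an explicit static_cast instead of
// relying on the compiler to pick the implicit conversion.
template <typename ArithType, typename StorageType>
SKETCH_CONSTEXPR ArithType load(const ReducedRef<ArithType, StorageType>& ref)
{
    return static_cast<ArithType>(ref);
}
```

Spelling out the `static_cast` costs nothing at run time and removes one place where older compilers might resolve the conversion differently.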
e10234b
to
d3487c0
Compare
091a666
to
9ad48aa
Compare
Error: The following files need to be formatted:
You can find a formatting patch under Artifacts here or run |
This PR adds the compressed basis GMRES.
TODO: