Feature/mrhs solvers #1489
Conversation
… size to use when generating null-space vectors. This is the parameter used to enable MRHS null space generation. Updated the null-space generation to work on vectors of this width
…-> scalar cast operator explicit instead of implicit
…ch - will be needed for batched CG
… issues at compile time, the scalar -> vector<scalar> cast operator is now explicit. Apply necessary changes to ensure that stuff doesn't break with this change
…of invertQuda to new function solve which is MRHS aware
… also now called by the invertMultiSrcQuda interface
…nd writes into atoms. This fixes an issue with the MR-based Schwarz solver when using heterogeneous reductions with batched solves
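The commit above describes splitting reduction reads and writes into atomic-sized pieces so that heterogeneous (host-polled) reductions never observe partially written results. The following is a generic host-side C++ illustration of that idea only; it is not QUDA's actual implementation, and ReductionSlot, publish, and try_consume are hypothetical names.

```cpp
// Generic illustration (not QUDA code): write a multi-word reduction result as
// individual atomic "atoms" plus a release flag, so a polling consumer never
// reads a torn, half-written value.
#include <atomic>

struct ReductionSlot {
  std::atomic<double> re{0.0}, im{0.0}; // each component stored atomically
  std::atomic<bool> ready{false};       // publication flag
};

void publish(ReductionSlot &slot, double re, double im)
{
  slot.re.store(re, std::memory_order_relaxed);
  slot.im.store(im, std::memory_order_relaxed);
  slot.ready.store(true, std::memory_order_release); // publish only after both atoms are written
}

bool try_consume(ReductionSlot &slot, double &re, double &im)
{
  if (!slot.ready.load(std::memory_order_acquire)) return false; // result not yet published
  re = slot.re.load(std::memory_order_relaxed);
  im = slot.im.load(std::memory_order_relaxed);
  return true;
}
```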
This PR is now functionally complete, and all tests are passing. This is ready for final review (@weinbe2 @hummingtree @mathiaswagner @bjoo).
lib/inv_bicgstab_quda.cpp
@@ -371,10 +370,12 @@ namespace quda {
if (!param.is_preconditioner) { // do not do the below if we this is an inner solver
In the spirit of typo fixing: "if we this is" -> "if this is"
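For reference, the corrected line in lib/inv_bicgstab_quda.cpp would then read:

```cpp
if (!param.is_preconditioner) { // do not do the below if this is an inner solver
```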
I have tested the batch CG solver with a modified version of MILC that properly utilizes the multi-source MILC interface function. This is available here: https://github.com/lattice/milc_qcd/tree/feature/quda-block-solver-interface; the current commit is lattice/milc_qcd@f0404fe. This PR works perfectly fine with that commit. I will note that this has only been tested with vanilla CG. I have not yet plumbed in multi-rhs support for the MG solver; I consider that within the scope of a second QUDA PR.
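For orientation, a minimal sketch of how a caller might drive a batched solve through the multi-source interface is shown below. The exact invertMultiSrcQuda signature and the QudaInvertParam fields used here (num_src, the per-source true_res array) are assumptions for illustration; consult quda.h in this branch for the real interface.

```cpp
// Minimal sketch (assumed interface, not taken verbatim from this PR): solve several
// right-hand sides in one invertMultiSrcQuda call and read back per-source residuals.
#include <quda.h>
#include <cstdio>
#include <vector>

void solve_batch(std::vector<void *> &x, std::vector<void *> &b, QudaInvertParam &inv_param)
{
  inv_param.num_src = static_cast<int>(b.size()); // number of right-hand sides in the batch

  invertMultiSrcQuda(x.data(), b.data(), &inv_param);

  // true_res / true_res_hq are per-source arrays with this PR
  for (int i = 0; i < inv_param.num_src; i++)
    std::printf("rhs %d: true residual = %e\n", i, inv_param.true_res[i]);
}
```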
When using coarsest-level deflation (perhaps just with staggered operators?), it looks like we need to change the default values corresponding to the flag …
I compiled with MG MRHS support for 16 and 32; I can't think of anywhere that I've imposed a multiple of 8. The solution was explicitly setting … My (relatively reduced) command is: …
Neither toggling … I can't quite think of a good way to address this (yet), but I'm also not clear on the details in the weeds. Maybe you know exactly where the fix is, @maddyscientist?
Ok, I understand this issue. There are two things at play here: …
Perhaps it would also be a good idea to have a fallback to the non-MMA dslash if the requested size isn't available? That would make things more bulletproof? Perhaps with a warning on the first call?
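A minimal sketch of that kind of fallback with a warn-once guard is given below; mma_nvec_available, apply_dslash_mma, and apply_dslash_generic are hypothetical stand-ins, not QUDA functions.

```cpp
// Illustrative sketch only: fall back to a generic (non-MMA) path when the requested
// batch size has no MMA instantiation, warning on the first occurrence only.
#include <cstdio>

bool mma_nvec_available(int nvec);   // hypothetical: is an MMA kernel instantiated for this size?
void apply_dslash_mma(int nvec);     // hypothetical MMA path
void apply_dslash_generic(int nvec); // hypothetical generic path

void apply_coarse_dslash(int nvec_batch)
{
  static bool warned = false; // emit the fallback warning only once

  if (mma_nvec_available(nvec_batch)) {
    apply_dslash_mma(nvec_batch);
  } else {
    if (!warned) {
      std::printf("WARNING: no MMA instantiation for batch size %d, falling back to generic dslash\n", nvec_batch);
      warned = true;
    }
    apply_dslash_generic(nvec_batch);
  }
}
```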
…rrect level. Default test batch size for eigenvalue computation is now 16 (to match the default mma nvec instantiation)
…his. Remove duplicate code
…eature/mrhs-solvers
…into feature/mrhs-solvers
This has passed my visual review and my tests with MILC. Awesome PR, @maddyscientist!
I had a very cursory look. I haven't had a chance to test the very latest with Chroma, but one or two commits out I think we fixed all the Chroma issues. So I am happy to approve. This is a great change.
…eature/mrhs-solvers
This PR is a biggie:
- QudaMultigridParam::n_vec_batch: new parameter setting the batch size used when generating null-space vectors, enabling MRHS null-space generation (see the configuration sketch after this list)
- invertMultiSrcQuda interface: QudaInvertParam::true_res and QudaInvertParam::true_res_hq are now arrays
- Dirac::prepare and Dirac::reconstruct functions are now MRHS optimized
- The cvector<T> to T cast is now explicit instead of implicit
- DslashCoarse is now robust to underflow
- MPI_THREAD_FUNNELED …
Things left to do: …
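As referenced in the list above, a configuration sketch for the new batch-size parameter follows. Whether n_vec_batch is a per-level array and the values chosen here are assumptions for illustration; only the existence of QudaMultigridParam::n_vec_batch comes from this PR.

```cpp
// Illustrative sketch: enabling batched (MRHS) null-space generation via the new
// QudaMultigridParam::n_vec_batch parameter. The per-level layout and the values
// used below are assumed, not taken from the PR.
#include <quda.h>

QudaMultigridParam make_mg_param(int n_levels)
{
  QudaMultigridParam mg_param = newQudaMultigridParam();
  mg_param.n_level = n_levels;
  for (int l = 0; l < n_levels; l++) {
    mg_param.n_vec[l] = 24;      // null-space vectors on this level
    mg_param.n_vec_batch[l] = 8; // generate null-space vectors eight at a time (MRHS)
  }
  return mg_param;
}
```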