
Feature/batched quda deflation #76

Merged
merged 7 commits into from
Jan 28, 2025

Conversation

leonhostetler
Collaborator

This pull request implements QUDA deflation for ks_spectrum with support for multiple right-hand sides. Previously, MILC's deflation was performed on the CPU.

Key points:

  1. To use all of the features, ks_spectrum must be compiled with WANT_QUDA, WANT_FN_CG_GPU, and WANT_EIG_GPU all set to true.
  2. QUDA deflation is implemented for UML, CG, and CGZ, for single and multiple right-hand sides, but applies only to the even-parity solves.
  3. Eigenvector files are loaded and saved directly by QUDA; MILC's corresponding functions are bypassed.
  4. Using fresh_ks_eigen with ks_spectrum triggers QUDA's eigensolve internally; MILC's eigensolve functions are bypassed.
  5. This functionality also depends on changes on the QUDA side. Until those are merged into QUDA develop, you can use the leonhostetler/milc_batched_deflation branch of https://github.com/leonhostetler/quda.git

More details:

Using ks_spectrum with fresh_ks_eigen now works, so there is no longer any need for a two-part process in which the eigenvectors are generated with QUDA's standalone eigensolver and then loaded by MILC's ks_spectrum to do the deflation. The ks_spectrum application can now handle both the eigensolve and the CG solves in the same run.

This is implemented for UML, CG, and CGZ; however, deflation occurs only for the even-parity solves. For UML, where the odd-parity solve merely polishes the odd solution reconstructed from the even solution, this works well, since the odd solve typically requires far fewer iterations. For CG and CGZ, however, it means that only the even half of the problem is sped up by deflation. If there is a need for odd-parity deflation, we will need to think about how best to implement it in the future.
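For context, the even- and odd-parity solves referred to above can be sketched with the standard even-odd preconditioning of the staggered operator (this derivation is illustrative background, not taken from the PR itself):

```latex
% Staggered operator M = D + 2m, with D anti-Hermitian and purely
% even-odd off-diagonal. The normal equations restricted to even sites:
\[
  \left( 4m^{2} - D_{eo} D_{oe} \right) x_e = \tilde b_e ,
\]
% after which the odd-site solution is reconstructed algebraically:
\[
  x_o = \frac{1}{2m} \left( b_o - D_{oe}\, x_e \right).
\]
```

Deflation accelerates the expensive even-site solve; for UML the subsequent odd-parity polish is comparatively cheap, which is why deflating only the even parity costs little there.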

Note that eigenvector files are loaded and saved from within QUDA, not MILC. This was both the simplest way to interface with the QUDA solver and the approach that minimizes memory usage. If MILC loaded the eigenvectors and then passed them to QUDA, the host memory usage would be doubled, which is not feasible given the size of the eigenvectors. Instead, MILC only passes the filenames back and forth to QUDA. If, for example, one wants to use non-QUDA eigenvectors with the QUDA deflated solver, one would need a separate utility to convert the file to a QUDA-readable format, save it to disk, and then run ks_spectrum with the QUDA solver.

The QUDA deflation should work correctly for varying masses. If different quark masses are used for different propagators, the eigenvectors remain the same, but the eigenvalues must be updated, since they depend on the quark mass through an additive shift of $4m^2$. This is handled automatically: the eigenvalues are preserved unless the quark mass changes, in which case they are recalculated.
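Concretely, the mass dependence enters only as a constant shift (an illustrative sketch using standard staggered conventions, not taken from the PR):

```latex
% lambda_i are the mass-independent eigenvalues of -D_{eo} D_{oe};
% mu_i(m) are the eigenvalues of the even-site normal operator at mass m:
\[
  \mu_i(m) = \lambda_i + 4 m^{2} .
\]
```

The eigenvectors are unchanged under a mass change; each eigenvalue simply shifts by the change in $4m^2$, so no new eigensolve is required.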

In a real-world application computing many correlators, the job is typically chunked into readin sets. The gauge field is loaded for the first readin set, and "continue" is used for subsequent readin sets. With QUDA's ability to preserve the deflation space, the eigenvectors are handled similarly. For the first readin set, the eigenvectors are either read in or generated. For subsequent readin sets, one must still include the parameters for reloading or generating the eigenvectors; however, these are ignored because QUDA simply continues with the initial set of eigenvectors. Thus, with multiple readin sets one need not worry that unnecessary time is spent reloading the eigenvectors and recomputing the eigenvalues. This also means that one cannot change eigenvector sets during a run. That behavior could be modified by changing qep.preserve_deflation_space if desired, but I don't think it is necessary; if one wants to switch to different eigenvectors, one might as well do a separate run.
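A multi-readin-set input file might therefore be structured along the following lines (a sketch only: the file names and the gauge-field lines are hypothetical placeholders; the eigen parameters are the ones described below):

```
# First readin set: load the gauge field and the eigenvectors
reload_parallel lat.1000                # hypothetical gauge configuration file
max_number_of_eigenpairs 512
reload_parallel_ks_eigen eigvecs.1000   # hypothetical eigenvector file
file_number_of_eigenpairs 512
forget_ks_eigen

# Subsequent readin set: reuse what is already resident
continue
max_number_of_eigenpairs 512
reload_parallel_ks_eigen eigvecs.1000   # required by the parser but ignored;
forget_ks_eigen                         # QUDA keeps the preserved deflation space
```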

One can adjust the eigensolver precision in the input parameter file. Typically, single precision is fine. However, with such "sloppy" eigenvectors it is important that the deflation be repeated periodically during the CG solve; this is controlled by the tol_restart parameter.

When eigenvectors are saved to disk, they are saved in single precision. This could easily be changed, but there should be no need to save them in double precision, since single-precision eigenvectors are fine provided tol_restart is reasonable.

Note that QUDA's block TRLM does not seem to be working well yet, so leave block_size at 1.

In general, when using QUDA deflation, the ks_spectrum application will need an input file with parameters like:

max_number_of_eigenpairs 512 		# How many eigenvectors to use for deflation
tol_restart 1e-2 			# How often to do the redeflation

When loading eigenvectors from file, use a parameter block like:

reload_parallel_ks_eigen [filename]	# This works with both single file and partfile formats
file_number_of_eigenpairs 512		# In case the file has more eigenvectors than will be used to deflate
forget_ks_eigen 			# Don't save the eigenvectors to file

Alternatively, when generating fresh eigenvectors, use a parameter block like:

fresh_ks_eigen				# Run QUDA's eigensolver
save_partfile_ks_eigen [filename] 	# Use save_parallel_ks_eigen for single file format or forget_ks_eigen to discard
Max_Lanczos_restart_iters 1000		# Max number of Lanczos restart iterations
eigenval_tolerance 1e-12		# Eigenvalue tolerance
Lanczos_max 1024			# Size of Krylov space, corresponds to QUDA's n_kr
Lanczos_restart 1000			# Deprecated, does nothing as far as I can tell
eigensolver_prec 1			# Precision in eigensolver, double=2, single=1, half=0
batched_rotate 20			# Size of batch_rotate
Chebyshev_alpha 0.1			# Must be larger than 4*m^2 for largest quark mass that will be deflated
Chebyshev_beta 0			# Leave at 0 for QUDA to estimate internally
Chebyshev_order 100			# Chebyshev order
block_size 1				# block_size>1 implies block TRLM (doesn't work well yet?)

Also, don't forget to set

deflate yes/no

in the propagator stanzas.

@stevengottlieb
Collaborator

stevengottlieb commented Dec 22, 2024 via email

@detar
Contributor

detar commented Jan 18, 2025

What eigenvector file formats are supported by QUDA at the moment? We have a lot of eigenvectors in the Grid/Hadrons eigenpack format. The MILC code can read them.

@leonhostetler
Collaborator Author

@detar, as far as I know, QUDA is unable to read Grid/Hadrons eigenpack format. I am tagging Evan @weinbe2 to confirm.

Context: For this feature (QUDA batched deflation for MILC), the eigenvectors are loaded directly by QUDA. Only the filename is passed from MILC. The benefit of this approach is interface simplicity. The drawback is of course that we are limited to the formats that QUDA is able to load.

@detar
Contributor

detar commented Jan 18, 2025

@weinbe2 says QUDA reads only QUDA format. An important use case for the disconnected HVP at 0.06 fm is to be able to read Grid/Hadron eigenpacks and do deflation and batch solves with QUDA. There is also a different MILC eigenvector file format. For the deflated solves, we would want to define WANT_QUDA, WANT_FN_CG_GPU. Do we also define WANT_EIG_GPU? If so, this pull request would not support that use case. Ideally, we would modify QUDA to read the other formats. But for now, would we need another macro that would say QUDA should do the reading? Say, WANT_EIG_IO_GPU? Also, does the PR still allow the CG interfaces to support slurping up the MILC eigenvectors and using them to deflate the QUDA solves?

@leonhostetler
Collaborator Author

@detar

> An important use case for the disconnected HVP at 0.06 fm is to be able to read Grid/Hadron eigenpacks and do deflation and batch solves with QUDA. There is also a different MILC eigenvector file format.

This PR only supports QUDA format eigenvector files for QUDA deflation. One reason for this was interface simplicity in that only the filename needs to be passed from MILC to QUDA. If we want to load the eigenvectors in MILC, thereby adding support for all the formats that MILC supports, then we have several issues to resolve. The first is that QUDA only does even-site deflation, and expects the eigenvectors to be in single parity format. So Grid eigenpacks would need to be converted from odd parity to even parity prior to passing to QUDA. The host memory footprint is another issue. Can we convert other eigenvector formats to QUDA format without at least temporarily doubling the memory footprint? A related issue is how to pass the eigenvectors from MILC (where space is malloced for them) to QUDA (where they are stored in std::vector) without doubling the host memory footprint.
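On the parity-conversion point above: for the even-odd preconditioned staggered operator, the even- and odd-site eigenvectors are related algebraically, so an odd-parity eigenpack could in principle be converted without a new eigensolve (an illustrative sketch under standard staggered conventions, not part of the PR):

```latex
% If v_o is an odd-site eigenvector of the odd-parity normal operator,
\[
  -D_{oe} D_{eo}\, v_o = \lambda^{2} v_o ,
\]
% then applying D_{eo} gives an even-site eigenvector with the same eigenvalue:
\[
  -D_{eo} D_{oe} \left( D_{eo} v_o \right) = \lambda^{2} \left( D_{eo} v_o \right),
\]
% so v_e \propto D_{eo} v_o (up to normalization).
```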

> For the deflated solves, we would want to define WANT_QUDA, WANT_FN_CG_GPU. Do we also define WANT_EIG_GPU?

With this PR, WANT_QUDA and WANT_FN_CG_GPU must be defined to use QUDA deflation. Furthermore, WANT_EIG_GPU is needed if requesting FRESH eigenvectors. This PR would not support the combination of a non-QUDA eigensolve with a QUDA deflate in the same run.

> But for now, would we need another macro that would say QUDA should do the reading? Say, WANT_EIG_IO_GPU?

QUDA automatically reads in the eigenvectors if it is given a filename in the QudaEigParam struct that is passed to the inverter. If QUDA reading is disabled by passing an empty filename, it will automatically perform the eigensolve instead.

> Also, does the PR still allow the CG interfaces to support slurping up the MILC eigenvectors and using them to deflate the QUDA solves?

This would no longer be supported under this PR. Prior to this PR, with WANT_FN_CG_GPU defined, deflation was done on CPU and the CG solves were done via QUDA. Testing showed dramatically improved performance when the deflation was also shifted to QUDA. With this PR, deflation essentially becomes part of the CG solve, so if WANT_FN_CG_GPU is defined, deflation and CG are both done in QUDA. Do we still want CPU deflation with QUDA solves?

@weinbe2
Contributor

weinbe2 commented Jan 21, 2025

@detar @leonhostetler if you have some documentation on the data layout of the MILC eigenvector formats, or can point me to the relevant BSD file in the MILC source, I will certainly add support for at least reading them to unblock your workflow. (I'll also get to writing them, it'd just be lower on the stack.)

As for Grid format vectors, similarly, @paboyle, if you have a document/note that describes the data layout I can also work on getting that implemented.

@maddyscientist
Contributor

> This PR only supports QUDA format eigenvector files for QUDA deflation. One reason for this was interface simplicity in that only the filename needs to be passed from MILC to QUDA. If we want to load the eigenvectors in MILC, thereby adding support for all the formats that MILC supports, then we have several issues to resolve. The first is that QUDA only does even-site deflation, and expects the eigenvectors to be in single parity format. So Grid eigenpacks would need to be converted from odd parity to even parity prior to passing to QUDA. The host memory footprint is another issue. Can we convert other eigenvector formats to QUDA format without at least temporarily doubling the memory footprint? A related issue is how to pass the eigenvectors from MILC (where space is malloced for them) to QUDA (where they are stored in std::vector) without doubling the host memory footprint.

@leonhostetler just noting QUDA can trivially switch to odd-site deflation, so it shouldn't be a problem if that's preferable. QUDA should be completely agnostic as to which parity is used, and it's a runtime switch in the interface.

@weinbe2
Contributor

weinbe2 commented Jan 27, 2025

> This PR only supports QUDA format eigenvector files for QUDA deflation. One reason for this was interface simplicity in that only the filename needs to be passed from MILC to QUDA. If we want to load the eigenvectors in MILC, thereby adding support for all the formats that MILC supports, then we have several issues to resolve. The first is that QUDA only does even-site deflation, and expects the eigenvectors to be in single parity format. So Grid eigenpacks would need to be converted from odd parity to even parity prior to passing to QUDA. The host memory footprint is another issue. Can we convert other eigenvector formats to QUDA format without at least temporarily doubling the memory footprint? A related issue is how to pass the eigenvectors from MILC (where space is malloced for them) to QUDA (where they are stored in std::vector) without doubling the host memory footprint.

> @leonhostetler just noting QUDA can trivially switch to odd-site deflation, so it shouldn't be a problem if that's preferable. QUDA should be completely agnostic as to which parity is used, and it's a runtime switch in the interface.

@maddyscientist I suggested in the QUDA PR that we punt on this until a follow-up PR, I fully agree that there's not going to be any fundamental headache getting it going, I'd just like to get something initial and reasonable merged.

@leonhostetler
Collaborator Author

@maddyscientist @weinbe2 I think it makes sense to punt support for other file formats to a follow-up QUDA PR. However, for this (MILC) PR, I don't know whether @detar wants to merge into MILC develop before then, given that the current PR would cut off support for CPU deflation of MILC/Grid evectors when using the QUDA CG solver. We could modify the PR to maintain that support, or wait to merge this MILC PR until MILC and Grid evector formats are supported by QUDA, which would make the CPU deflation obsolete. Hence, I think it makes sense to try (finally) to answer @weinbe2's question:

> @detar @leonhostetler if you have some documentation on the data layout of the MILC eigenvector formats, or can point me to the relevant BSD file in the MILC source, I will certainly add support for at least reading them to unblock your workflow. (I'll also get to writing them, it'd just be lower on the stack.)

> As for Grid format vectors, similarly, @paboyle, if you have a document/note that describes the data layout I can also work on getting that implemented.

In MILC, eigenvectors are loaded by a call to reload_ks_eigen.

@detar detar merged commit ea2b0a8 into milc-qcd:develop Jan 28, 2025