
Feature/batched quda deflation #76

Merged
merged 7 commits into from
Jan 28, 2025

Conversation

leonhostetler
Collaborator

This pull request implements QUDA deflation for ks_spectrum with support for multiple right-hand sides. Previously, MILC's deflation was performed on the CPU.

Key points:

  1. To use all of the features, ks_spectrum must be compiled with WANT_QUDA, WANT_FN_CG_GPU, and WANT_EIG_GPU all set to true.
  2. QUDA deflation is implemented for UML, CG, and CGZ, for single and multiple right-hand sides, but applies only to the even-parity solves.
  3. Eigenvector files are loaded and saved directly by QUDA; MILC's corresponding functions are bypassed.
  4. Using fresh_ks_eigen with ks_spectrum triggers QUDA's eigensolve internally; MILC's eigensolve functions are bypassed.
  5. This functionality also depends on changes on the QUDA side. Until those are merged into QUDA develop, you can use the leonhostetler/milc_batched_deflation branch of https://github.com/leonhostetler/quda.git

More details:

Using ks_spectrum with fresh_ks_eigen now works, so there is no longer any need for a two-part process in which the eigenvectors are generated with QUDA's standalone eigensolver and then loaded by MILC's ks_spectrum to do the deflation. The ks_spectrum application can now handle both the eigensolve and the CG solves in the same run.

This is implemented for UML, CG, and CGZ; however, deflation occurs only for the even-parity solves. For UML, where the odd-parity solve merely polishes the odd solution reconstructed from the even solution, this works well, since the odd solve typically requires far fewer iterations. For CG and CGZ, however, it means that only the even half of the problem is sped up by deflation. If there is a need for odd-parity deflation, we will need to think about how best to implement it in the future.
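For context, the even- and odd-parity solves referred to above can be sketched with the standard even-odd preconditioning of the staggered operator (this derivation is illustrative background, not taken from the PR itself):

```latex
% Staggered operator M = D + 2m, with D anti-Hermitian and purely
% even-odd off-diagonal. The normal equations restricted to even sites:
\[
  \left( 4m^{2} - D_{eo} D_{oe} \right) x_e = \tilde b_e ,
\]
% after which the odd-site solution is reconstructed algebraically:
\[
  x_o = \frac{1}{2m} \left( b_o - D_{oe}\, x_e \right).
\]
```

Deflation accelerates the expensive even-site solve; for UML the subsequent odd-parity polish is comparatively cheap, which is why deflating only the even parity costs little there.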

Note that eigenvector files are loaded and saved from within QUDA, not MILC. This was both the simplest way to interface with the QUDA solver and the approach that minimizes memory usage. If MILC loaded the eigenvectors and then passed them to QUDA, the host memory usage would be doubled, which is not feasible given the size of the eigenvectors. Instead, MILC only passes the filenames back and forth to QUDA. If, for example, one wants to use non-QUDA eigenvectors with the QUDA deflated solver, one would need a separate utility to convert the file to a QUDA-readable format, save it to disk, and then run ks_spectrum with the QUDA solver.

The QUDA deflation should work correctly for varying masses. If different quark masses are used for different propagators, the eigenvectors remain the same, but the eigenvalues must be updated, since they depend on the quark mass through an additive shift of $4m^2$. This is handled automatically: the eigenvalues are preserved unless the quark mass changes, in which case they are recalculated.
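Concretely, the mass dependence enters only as a constant shift (an illustrative sketch using standard staggered conventions, not taken from the PR):

```latex
% lambda_i are the mass-independent eigenvalues of -D_{eo} D_{oe};
% mu_i(m) are the eigenvalues of the even-site normal operator at mass m:
\[
  \mu_i(m) = \lambda_i + 4 m^{2} .
\]
```

The eigenvectors are unchanged under a mass change; each eigenvalue simply shifts by the change in $4m^2$, so no new eigensolve is required.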

In a real-world application computing many correlators, the job is typically chunked into readin sets. The gauge field is loaded for the first readin set, and "continue" is used for subsequent readin sets. With QUDA's ability to preserve the deflation space, the eigenvectors are handled similarly. For the first readin set, the eigenvectors are either read in or generated. For subsequent readin sets, one must still include the parameters for reloading or generating the eigenvectors; however, these are ignored because QUDA simply continues with the initial set of eigenvectors. Thus, with multiple readin sets one need not worry that unnecessary time is spent reloading the eigenvectors and recomputing the eigenvalues. This also means that one cannot change eigenvector sets during a run. That behavior could be modified by changing qep.preserve_deflation_space if desired, but I don't think it is necessary; if one wants to switch to different eigenvectors, one might as well do a separate run.
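A multi-readin-set input file might therefore be structured along the following lines (a sketch only: the file names and the gauge-field lines are hypothetical placeholders; the eigen parameters are the ones described below):

```
# First readin set: load the gauge field and the eigenvectors
reload_parallel lat.1000                # hypothetical gauge configuration file
max_number_of_eigenpairs 512
reload_parallel_ks_eigen eigvecs.1000   # hypothetical eigenvector file
file_number_of_eigenpairs 512
forget_ks_eigen

# Subsequent readin set: reuse what is already resident
continue
max_number_of_eigenpairs 512
reload_parallel_ks_eigen eigvecs.1000   # required by the parser but ignored;
forget_ks_eigen                         # QUDA keeps the preserved deflation space
```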

One can adjust the eigensolver precision in the input parameter file. Typically, single precision is fine. However, with such "sloppy" eigenvectors it is important that the deflation be repeated periodically during the CG solve; this is controlled by the tol_restart parameter.

When eigenvectors are saved to disk, they are saved in single precision. This could easily be changed, but there should be no need to save them in double precision, since single-precision eigenvectors are fine provided tol_restart is reasonable.

Note that QUDA's block TRLM does not seem to be working well yet, so leave block_size at 1.

In general, when using QUDA deflation, the ks_spectrum application will need an input file with parameters like:

max_number_of_eigenpairs 512 		# How many eigenvectors to use for deflation
tol_restart 1e-2 			# How often to do the redeflation

When loading eigenvectors from file, use a parameter block like:

reload_parallel_ks_eigen [filename]	# This works with both single file and partfile formats
file_number_of_eigenpairs 512		# In case the file has more eigenvectors than will be used to deflate
forget_ks_eigen 			# Don't save the eigenvectors to file

Alternatively, when generating fresh eigenvectors, use a parameter block like:

fresh_ks_eigen				# Run QUDA's eigensolver
save_partfile_ks_eigen [filename] 	# Use save_parallel_ks_eigen for single file format or forget_ks_eigen to discard
Max_Lanczos_restart_iters 1000		# Max number of Lanczos restart iterations
eigenval_tolerance 1e-12		# Eigenvalue tolerance
Lanczos_max 1024			# Size of Krylov space, corresponds to QUDA's n_kr
Lanczos_restart 1000			# Deprecated, does nothing as far as I can tell
eigensolver_prec 1			# Precision in eigensolver, double=2, single=1, half=0
batched_rotate 20			# Size of batch_rotate
Chebyshev_alpha 0.1			# Must be larger than 4*m^2 for largest quark mass that will be deflated
Chebyshev_beta 0			# Leave at 0 for QUDA to estimate internally
Chebyshev_order 100			# Chebyshev order
block_size 1				# block_size>1 implies block TRLM (doesn't work well yet?)

Also, don't forget to set

deflate yes/no

in the propagator stanzas.

@stevengottlieb
Collaborator

stevengottlieb commented Dec 22, 2024 via email

@detar
Contributor

detar commented Jan 18, 2025

What eigenvector file formats are supported by QUDA at the moment? We have a lot of eigenvectors in the Grid/Hadrons eigenpack format. The MILC code can read them.

@leonhostetler
Collaborator Author

@detar, as far as I know, QUDA is unable to read Grid/Hadrons eigenpack format. I am tagging Evan @weinbe2 to confirm.

Context: For this feature (QUDA batched deflation for MILC), the eigenvectors are loaded directly by QUDA. Only the filename is passed from MILC. The benefit of this approach is interface simplicity. The drawback is of course that we are limited to the formats that QUDA is able to load.

@detar
Contributor

detar commented Jan 18, 2025

@weinbe2 says QUDA reads only QUDA format. An important use case for the disconnected HVP at 0.06 fm is to be able to read Grid/Hadron eigenpacks and do deflation and batch solves with QUDA. There is also a different MILC eigenvector file format. For the deflated solves, we would want to define WANT_QUDA, WANT_FN_CG_GPU. Do we also define WANT_EIG_GPU? If so, this pull request would not support that use case. Ideally, we would modify QUDA to read the other formats. But for now, would we need another macro that would say QUDA should do the reading? Say, WANT_EIG_IO_GPU? Also, does the PR still allow the CG interfaces to support slurping up the MILC eigenvectors and using them to deflate the QUDA solves?

@leonhostetler
Collaborator Author

@detar

> An important use case for the disconnected HVP at 0.06 fm is to be able to read Grid/Hadron eigenpacks and do deflation and batch solves with QUDA. There is also a different MILC eigenvector file format.

This PR only supports QUDA format eigenvector files for QUDA deflation. One reason for this was interface simplicity in that only the filename needs to be passed from MILC to QUDA. If we want to load the eigenvectors in MILC, thereby adding support for all the formats that MILC supports, then we have several issues to resolve. The first is that QUDA only does even-site deflation, and expects the eigenvectors to be in single parity format. So Grid eigenpacks would need to be converted from odd parity to even parity prior to passing to QUDA. The host memory footprint is another issue. Can we convert other eigenvector formats to QUDA format without at least temporarily doubling the memory footprint? A related issue is how to pass the eigenvectors from MILC (where space is malloced for them) to QUDA (where they are stored in std::vector) without doubling the host memory footprint.
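On the parity-conversion point above: for the even-odd preconditioned staggered operator, the even- and odd-site eigenvectors are related algebraically, so an odd-parity eigenpack could in principle be converted without a new eigensolve (an illustrative sketch under standard staggered conventions, not part of the PR):

```latex
% If v_o is an odd-site eigenvector of the odd-parity normal operator,
\[
  -D_{oe} D_{eo}\, v_o = \lambda^{2} v_o ,
\]
% then applying D_{eo} gives an even-site eigenvector with the same eigenvalue:
\[
  -D_{eo} D_{oe} \left( D_{eo} v_o \right) = \lambda^{2} \left( D_{eo} v_o \right),
\]
% so v_e \propto D_{eo} v_o (up to normalization).
```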

> For the deflated solves, we would want to define WANT_QUDA, WANT_FN_CG_GPU. Do we also define WANT_EIG_GPU?

With this PR, WANT_QUDA and WANT_FN_CG_GPU must be defined to use QUDA deflation. Furthermore, WANT_EIG_GPU is needed if requesting FRESH eigenvectors. This PR would not support the combination of a non-QUDA eigensolve with a QUDA deflate in the same run.

> But for now, would we need another macro that would say QUDA should do the reading? Say, WANT_EIG_IO_GPU?

QUDA automatically reads in the eigenvectors if it is given a filename in the QudaEigParam struct that is passed to the inverter. If QUDA reading is disabled by passing an empty filename, it will automatically perform the eigensolve instead.

> Also, does the PR still allow the CG interfaces to support slurping up the MILC eigenvectors and using them to deflate the QUDA solves?

This would no longer be supported under this PR. Prior to this PR, with WANT_FN_CG_GPU defined, deflation was done on CPU and the CG solves were done via QUDA. Testing showed dramatically improved performance when the deflation was also shifted to QUDA. With this PR, deflation essentially becomes part of the CG solve, so if WANT_FN_CG_GPU is defined, deflation and CG are both done in QUDA. Do we still want CPU deflation with QUDA solves?

@weinbe2
Contributor

weinbe2 commented Jan 21, 2025

@detar @leonhostetler if you have some documentation on the data layout of the MILC eigenvector formats, or can point me to the relevant BSD file in the MILC source, I will certainly add support for at least reading them to unblock your workflow. (I'll also get to writing them, it'd just be lower on the stack.)

As for Grid format vectors, similarly, @paboyle, if you have a document/note that describes the data layout I can also work on getting that implemented.

@maddyscientist
Contributor

> This PR only supports QUDA format eigenvector files for QUDA deflation. One reason for this was interface simplicity in that only the filename needs to be passed from MILC to QUDA. If we want to load the eigenvectors in MILC, thereby adding support for all the formats that MILC supports, then we have several issues to resolve. The first is that QUDA only does even-site deflation, and expects the eigenvectors to be in single parity format. So Grid eigenpacks would need to be converted from odd parity to even parity prior to passing to QUDA. The host memory footprint is another issue. Can we convert other eigenvector formats to QUDA format without at least temporarily doubling the memory footprint? A related issue is how to pass the eigenvectors from MILC (where space is malloced for them) to QUDA (where they are stored in std::vector) without doubling the host memory footprint.

@leonhostetler just noting QUDA can trivially switch to odd-site deflation, so it shouldn't be a problem if that's preferable. QUDA should be completely agnostic as to which parity is used, and it's a runtime switch in the interface.

@weinbe2
Contributor

weinbe2 commented Jan 27, 2025

> This PR only supports QUDA format eigenvector files for QUDA deflation. One reason for this was interface simplicity in that only the filename needs to be passed from MILC to QUDA. If we want to load the eigenvectors in MILC, thereby adding support for all the formats that MILC supports, then we have several issues to resolve. The first is that QUDA only does even-site deflation, and expects the eigenvectors to be in single parity format. So Grid eigenpacks would need to be converted from odd parity to even parity prior to passing to QUDA. The host memory footprint is another issue. Can we convert other eigenvector formats to QUDA format without at least temporarily doubling the memory footprint? A related issue is how to pass the eigenvectors from MILC (where space is malloced for them) to QUDA (where they are stored in std::vector) without doubling the host memory footprint.

> @leonhostetler just noting QUDA can trivially switch to odd-site deflation, so it shouldn't be a problem if that's preferable. QUDA should be completely agnostic as to which parity is used, and it's a runtime switch in the interface.

@maddyscientist I suggested in the QUDA PR that we punt on this until a follow-up PR, I fully agree that there's not going to be any fundamental headache getting it going, I'd just like to get something initial and reasonable merged.

@leonhostetler
Collaborator Author

@maddyscientist @weinbe2 I think it makes sense to punt support for other file formats to a follow-up QUDA PR. However, for this (MILC) PR, I don't know whether @detar wants to merge into MILC develop before then, given that the current PR would cut off support for CPU deflation of MILC/Grid evectors when using the QUDA CG solver. We could modify the PR to maintain that support, or wait to merge this MILC PR until MILC and Grid evector formats are supported by QUDA, which would make the CPU deflation obsolete. Hence, I think it makes sense to try (finally) to answer @weinbe2's question:

> @detar @leonhostetler if you have some documentation on the data layout of the MILC eigenvector formats, or can point me to the relevant BSD file in the MILC source, I will certainly add support for at least reading them to unblock your workflow. (I'll also get to writing them, it'd just be lower on the stack.)

> As for Grid format vectors, similarly, @paboyle, if you have a document/note that describes the data layout I can also work on getting that implemented.

In MILC, eigenvectors are loaded by a call to reload_ks_eigen.

@detar detar merged commit ea2b0a8 into milc-qcd:develop Jan 28, 2025