Add isContiguous check for new AmgX API #26
Merged
The new AmgX API allows passing in partition offsets instead of a full partition vector.
Perform the isContiguous check using the PETSc index set API to transparently enable the optimization when the indices are contiguous.
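To illustrate the idea, here is a minimal standalone sketch of what a contiguity check and the two partition representations could look like. It does not use the PETSc or AmgX APIs, and it is not the code from this PR: the helper names `isContiguous`, `partitionOffsets`, and `partitionVector` are hypothetical, as is the assumption that each rank's global row indices are given in ascending order.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not the PR's implementation): returns true if the
// given global row indices form a single run first, first+1, first+2, ...
// An empty index set is treated as contiguous.
bool isContiguous(const std::vector<std::int64_t>& idx)
{
    for (std::size_t i = 1; i < idx.size(); ++i)
        if (idx[i] != idx[i - 1] + 1)
            return false;
    return true;
}

// Hypothetical helper: if every rank owns a contiguous block, the whole
// partitioning is described by nRanks + 1 offsets (an exclusive prefix sum
// of the per-rank row counts). Rank r owns rows [offsets[r], offsets[r+1]).
std::vector<std::int64_t> partitionOffsets(const std::vector<std::int64_t>& rowsPerRank)
{
    std::vector<std::int64_t> offsets(rowsPerRank.size() + 1, 0);
    std::partial_sum(rowsPerRank.begin(), rowsPerRank.end(), offsets.begin() + 1);
    return offsets;
}

// For comparison: a full partition vector stores the owning rank of every
// global row, so its length equals the global number of rows.
std::vector<int> partitionVector(const std::vector<std::int64_t>& rowsPerRank)
{
    std::vector<int> owner;
    for (std::size_t r = 0; r < rowsPerRank.size(); ++r)
        owner.insert(owner.end(),
                     static_cast<std::size_t>(rowsPerRank[r]),
                     static_cast<int>(r));
    return owner;
}
```

With per-rank row counts of 3, 2 and 4, the offsets form has four entries (0, 3, 5, 9), while the full partition vector has nine entries, one per global row. That size difference is the motivation for taking the offsets path whenever the check succeeds.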
I verified that the optimization works on the poisson example by doing the following (ompi_gcc_opt):
N=400 time -p mpirun -np 4 -x N \
    bash -c "bin/poisson -caseName Test -mode AmgX_GPU -cfgFileName configs/AmgX_SolverOptions_AGG.info -Nx \$N -Ny \$N -Nz \$N -Nruns 1 -optFileName test"
grep -i solving test.log -A2 | awk '{printf("%-10s%s\n",$1, $4)}'
(The command just filters out the runtime in seconds of the regions of interest.)
[...] solve call. Scaling is not perfect because this matrix is still somewhat small; it was chosen just to allow a quick benchmark on a single node for this PR. I've tested this separately on multi-node GPU with larger matrix sizes, too.
I also verified that using N_ranks > N_gpus still works, uses the optimized path (i.e. indices are contiguous) and gets a slight speedup, though overall runtime increased (presumably due to the consolidation overhead).