Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sptrsv supernode #552

Merged
merged 63 commits into from
Jan 13, 2020
Merged

Sptrsv supernode #552

merged 63 commits into from
Jan 13, 2020

Conversation

iyamazaki
Copy link
Contributor

This PR adds the options for supernodal-based sparse-triangular solve (with SuperLU or Cholmod) to the existing level-based scheduling sparse-triangular solve.

…tor, give run-time optitions for merge, metis)
create sptrsv_aux.hpp for some common function between SuperLU and Cholmod
…_superlu tester

NOTE: we store L in CSC (calling cusparse with UPPER+TRANSPOSE) and U in CSR (calling cusparse with UPPER+NoTRANSPOSE)
@ndellingwood
Copy link
Contributor

Running spot-check on kokkos-dev-2

@ndellingwood
Copy link
Contributor

Spot-check on kokkos-dev-2 passes:

Running on machine: kokkos-dev-2
Going to test compilers:  gcc/7.3.0 gcc/9.1 intel/18.0.5 clang/8.0 cuda/10.1
Testing compiler gcc/7.3.0
Testing compiler gcc/9.1
  Starting job gcc-7.3.0-Pthread-release
  Starting job gcc-7.3.0-OpenMP-release
  PASSED gcc-7.3.0-Pthread-release
  Starting job gcc-9.1-OpenMP-release
  PASSED gcc-7.3.0-OpenMP-release
Testing compiler intel/18.0.5
  Starting job gcc-9.1-Serial-release
  PASSED gcc-9.1-OpenMP-release
Testing compiler clang/8.0
  Starting job intel-18.0.5-OpenMP-release
  PASSED gcc-9.1-Serial-release
  Starting job clang-8.0-Cuda_OpenMP-release
  PASSED intel-18.0.5-OpenMP-release
Testing compiler cuda/10.1
  Starting job clang-8.0-Pthread_Serial-release
  PASSED clang-8.0-Cuda_OpenMP-release
  PASSED clang-8.0-Pthread_Serial-release
  Starting job cuda-10.1-Cuda_OpenMP-release
  PASSED cuda-10.1-Cuda_OpenMP-release
#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=605 run_time=518
clang-8.0-Pthread_Serial-release build_time=248 run_time=333
cuda-10.1-Cuda_OpenMP-release build_time=646 run_time=481
gcc-7.3.0-OpenMP-release build_time=235 run_time=177
gcc-7.3.0-Pthread-release build_time=215 run_time=176
gcc-9.1-OpenMP-release build_time=261 run_time=162
gcc-9.1-Serial-release build_time=238 run_time=181
intel-18.0.5-OpenMP-release build_time=441 run_time=160
#######################################################

Copy link
Contributor

@mhoemmen mhoemmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get through the last few files, but I think I said what I wanted to say. I didn't click "Request changes" because I don't want to block the scheduled kokkos-kernels release and promotion. Here are my main concerns about this PR:

  1. It introduces new TPLs for only three functions that already exist in the BLAS and LAPACK TPLs. kokkos-kernels already has code for bringing in BLAS functions, that the author could have imitated.

  2. The code only works for double and complex<double>. For other types, the code may run to completion but produce incorrect results with no diagnostic. kokkos-kernels functions should support at least the four types that the BLAS, LAPACK, cuSPARSE, etc. support, or stop with an error if they do not.

  3. The code reimplements existing KokkosBlas::* functionality.

@@ -24,6 +24,10 @@ KOKKOSKERNELS_INTERNAL_ENABLE_MKL := $(strip $(shell echo $(KOKKOSKERNELS_ENABLE
KOKKOSKERNELS_INTERNAL_ENABLE_CUSPARSE := $(strip $(shell echo $(KOKKOSKERNELS_ENABLE_TPLS) | grep "cusparse" | wc -l))
KOKKOSKERNELS_INTERNAL_ENABLE_CUBLAS := $(strip $(shell echo $(KOKKOSKERNELS_ENABLE_TPLS) | grep "cublas" | wc -l))
KOKKOSKERNELS_INTERNAL_ENABLE_MAGMA := $(strip $(shell echo $(KOKKOSKERNELS_ENABLE_TPLS) | grep "magma" | wc -l))
KOKKOSKERNELS_INTERNAL_ENABLE_CHOLMOD := $(strip $(shell echo $(KOKKOSKERNELS_ENABLE_TPLS) | grep "cholmod" | wc -l))
KOKKOSKERNELS_INTERNAL_ENABLE_SUPERLU := $(strip $(shell echo $(KOKKOSKERNELS_ENABLE_TPLS) | grep "superlu" | wc -l))
KOKKOSKERNELS_INTERNAL_ENABLE_LAPACKE := $(strip $(shell echo $(KOKKOSKERNELS_ENABLE_TPLS) | grep "lapacke" | wc -l))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndellingwood @iyamazaki just to summarize my concerns about adding CBLAS and LAPACKE TPLs in yesterday's phone call:

Trilinos historically let people add optional TPLs freely. However, as a practical matter, downstream users (like Clark) will not enable them unless they are enabled by default or unless they add a lot of benefit. PR tests do not enable CBLAS and LAPACKE, so you (either you personally, or kokkos-kernels) must do all the downstream testing yourself. Furthermore, you've also added work with no benefit in terms of features, since the CBLAS and LAPACKE features you use already exist in the BLAS and LAPACK.

I see this as a temporary solution, made necessary because of the release time frame. The long-term solution is to wrap the BLAS and LAPACK in kokkos-kernels, not to introduce new TPL dependencies. I don't mind temporary solutions at all and I'll be happy to help with long-term solutions, but please just be aware of the aforementioned concerns. I have no personal stake in this -- I'm just explaining the practical reality that new TPLs create a lot of trouble for downstream users.

@@ -1,5 +1,5 @@
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
LIB_REQUIRED_PACKAGES KokkosCore KokkosContainers KokkosAlgorithms
LIB_OPTIONAL_TPLS quadmath MKL BLAS LAPACK CUSPARSE MAGMA
LIB_OPTIONAL_TPLS quadmath MKL BLAS LAPACK CUSPARSE MAGMA SUPERLU CHOLMOD LAPACKE CBLAS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually test the MAGMA TPL? Just curious -- I'm asking for another project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have a bootstrap issue here for LAPACKE and CBLAS. The tpls don't exist in Trilinos (AFAIK) so adding the TPLs to Dependencies.cmake would also require adding Tribits find modules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trilinos does have find modules for SuperLU and Cholmod, but with different capitalization.
SUPERLU -> SuperLU and CHOLMOD -> Cholmod.

This problem of capitalize vs dont' capitalize is going to follow us forever...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjwilke I'm not sure how easy it would be to change these in a backwards-compatible way so that Trilinos could accept either capitalization, even for specific TPLs. @bartlettroscoe might have some ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how easy it would be to change these in a backwards-compatible way so that Trilinos could accept either capitalization, even for specific TPLs.

This is an ugly problem. The names and capitalization of these TPLs should have been considered very carefully from the beginning. But with people breaking backward compatiblity left and right right now (because of the anticipated Trilinos 13.0 release) this is a great time to fix a bunch of these issues.

@jjwilke, can we get a list of TPLs defined in {{Trilinos/TPLsList.cmake}} that should be renamed or recapitalized to match standard CMake names for these TPLs?

I think this is why find_package() can also find <lower_case_name>_config.cmake filkes in addition to <UpperCaseName>Config.cmake files as this solves the capitalization problem, at least for installed packages that provide config.cmake files.

Copy link
Contributor

@mhoemmen mhoemmen Dec 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartlettroscoe wrote:

But with people breaking backward compatiblity left and right right now (because of the anticipated Trilinos 13.0 release) this is a great time to fix a bunch of these issues.

Is there a way to add "aliases" for TPL names? Could I set up a TPL Foo so that users could say TPL_ENABLE_Bar=ON and TPL_ENABLE_Foo=ON gets set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to add "aliases" for TPL names? Could I set up a TPL Foo so that users could say TPL_ENABLE_Bar=ON and TPL_ENABLE_Foo=ON gets set?

@mhoemmen, not sure. Would have to look at all of the use cases that would need to work (and it is easy to leave out use cases that don't occur to one right away).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartlettroscoe The main reason would be to support changes in capitalization. On the other hand, CMake already tells you about unused options, so perhaps that's good enough to support a change.

perf_test/sparse/KokkosSparse_sptrsv_aux.hpp Show resolved Hide resolved
template<typename scalar_t>
void forwardP_supernode(int n, int *perm_r, int nrhs, scalar_t *B, int ldb, scalar_t *X, int ldx) {

/* Permute right hand sides to form Pr*B */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost certainly duplicated code. Many operations in kokkos-kernels and downstream code need to permute vectors or multivectors. If kokkos-kernels has an existing permutation function, please use it. Otherwise, consider adding one to kokkos-kernels.

}

template<typename scalar_t>
void backwardP_supernode(int n, int *perm_c, int nrhs, scalar_t *B, int ldb, scalar_t *X, int ldx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment on permutations and duplicated code.

for (int i = 0; i <= j; i++) {
hv(hr(j1 + j) + i) = Lx[psx + i + j*nsrow];
}
#if 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code commented out?

int psx = colptrL[j1];
if (invert_diag) {
if (std::is_same<scalar_t, double>::value == true) {
LAPACKE_dtrtri (LAPACK_COL_MAJOR,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of this. You could have just copied and pasted the extern "C" declaration format for BLAS functions that kokkos-kernels already uses, to get calls to LAPACK's _TRTRI functions. It's bad encapsulation to have this "if scalar type is this, call this, else call that, else ..." code pasted directly into the SuperLU code here. You should just make this into a function, and change it to call regular LAPACK inside instead of LAPACKE. You don't have to make it a public function if you don't intend to support it yet.

LAPACKE_ztrtri (LAPACK_COL_MAJOR,
'U', 'N', nscol,
reinterpret_cast <lapack_complex_double*> (&Lx[psx]), nsrow);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about float and complex<float>? kokkos-kernels is supposed to support at least the four types that the BLAS supports. Please at least static_assert or throw here.

if (invert_diag) {
if (offset > 0 && invert_offdiag) {
if (std::is_same<scalar_t, double>::value == true) {
cblas_dtrmm (CblasColMajor,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments about LAPACKE apply here to CBLAS.

reinterpret_cast <double*> (&one),
reinterpret_cast <double*> (&hv(nnzD)), nscol+offset,
reinterpret_cast <double*> (&hv(nnzD+nscol)), nscol+offset);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about float and complex<float>?

@ndellingwood
Copy link
Contributor

@mhoemmen thanks for thorough review! I skimmed some comments and can provided a couple points of clarification I'm aware of, and let @iyamazaki add other responses.

Magma - we do not have nightly testing running with magma, though the tpls and scripts are setup to handle this on apollo (where development and PR testing was done of the capability). I'll get this setup soon, dropped the ball on it.

Experimental - the code hasn't been used yet outside of development and the API is finalized yet, hope is to get some feedback and review. Longer term, I think the plan is to expose the solver through Amesos2.

@mhoemmen
Copy link
Contributor

@ndellingwood wrote:

Magma - we do not have nightly testing running with magma, though the tpls and scripts are setup to handle this on apollo (where development and PR testing was done of the capability). I'll get this setup soon, dropped the ball on it.

No worries! I can explain in person why MAGMA might be more important for us in the future, but there's no rush.

Conflicts:
	src/sparse/KokkosSparse_sptrsv.hpp
	src/sparse/KokkosSparse_sptrsv_handle.hpp
	src/sparse/impl/KokkosSparse_sptrsv_solve_impl.hpp
	src/sparse/impl/KokkosSparse_sptrsv_solve_spec.hpp
	src/sparse/impl/KokkosSparse_sptrsv_symbolic_impl.hpp
@ndellingwood
Copy link
Contributor

@iyamazaki I put in a PR against this branch on your fork iyamazaki#1 that merges develop back in and resolves conflicts. If it passes testing go ahead and merge it into your branch, then push your updated sptrsv-supernode branch back up to this PR.

@ndellingwood
Copy link
Contributor

Spoke with @srajama1, this PR will go in for the promotion and allow for development; we'll resolve the TPL issues for the next promotion, the goal being to provide proper kokkos-kernels implementations for the missing BLAS and LAPACK routines needed (called from cblas and lapacke currently).
The code is well-guarded and integration testing with the changes in this PR passed.
spot-check on kokkos-dev-2 passed, results posted here during work to resolve merge conflicts.

@ndellingwood ndellingwood removed the WIP Work In Progress label Jan 10, 2020
@ndellingwood
Copy link
Contributor

Merging in for release 3.0
@iyamazaki please create an issue with follow up items, like support for needed blas/lapack routines used via cblas and lapacke, review suggestions from @mhoemmen not yet addressed, etc.

Note the spot-check on kokkos-dev-2 has results posted in a PR to @iyamazaki branch when the merge conflicts were resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants