Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KokkosBlas: limited vector size in norm calculation? #9856

Closed
kddevin opened this issue Oct 25, 2021 · 4 comments
Closed

KokkosBlas: limited vector size in norm calculation? #9856

kddevin opened this issue Oct 25, 2021 · 4 comments
Assignees
Labels
pkg: KokkosKernels pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests UVM removal

Comments

@kddevin
Copy link
Contributor

kddevin commented Oct 25, 2021

Bug Report

@trilinos/tpetra

Description

@jennloe reports

I have this Trilinos (Belos + Tpetra) code which is crashing, and I don't understand why

76   typedef double                            ScalarType;
77   typedef int                               OT;
78   typedef Tpetra::MultiVector<ScalarType>   MV;
79   typedef Tpetra::MultiVector<ScalarType>::mag_type   MT;
...
132   RCP<Tpetra::MultiVector<ScalarType>> X1 = rcp( new Tpetra::MultiVector<ScalarType>(map, 1008) );
133   X1->randomize();
134   RCP<Tpetra::MultiVector<ScalarType>> XCopy = rcp(new Tpetra::MultiVector<ScalarType>(*X1, Teuchos::Copy)); //Deep copy of X1.
....
154   // DEBUG: //Verify coefficients:
155   MV coeffs_mv = makeStaticLocalMultiVector (*X1, 1008, 1008);
156   std::cout << "Starting deep copy" << std::endl;
157   Tpetra::deep_copy(coeffs_mv, *coeffMat);
158   std::cout << "Finished deep copy" << std::endl;
159   XCopy->multiply(Teuchos::NO_TRANS, Teuchos::NO_TRANS, 1.0, *X1, coeffs_mv, -1.0);
160   std::cout << "Passed multiply." << std::endl;
161   std::vector<MT> norms(1008);
162   Teuchos::ArrayView<MT> normView(norms);
163   std::cout << "created norms array" << std::endl;
164   XCopy->norm2(normView);
165   std::cout << "Here are the QR-Orig norms.  Should be zero." << std::endl;
166   std::cout << normView << std::endl << std::endl;

And the output is:

$ ./Belos_TpetraOrthogCoeffs.exe
Starting deep copy
Finished deep copy
Passed multiply.
created norms array
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaStreamSynchronize(m_stream) error( cudaErrorIllegalAddress): an illegal memory access was encountered /ascldap/sers/jloe/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:314
Traceback functionality not available
Aborted (core dumped)

At first glance, the problem appears to be a mismatch between the norms argument (on host) and the vector argument (on device)

Steps to Reproduce

Reproduced with simple test program on ascicgpu with no UVM.

@kddevin kddevin added type: bug The primary issue is a bug in Trilinos code or tests pkg: Tpetra UVM removal labels Oct 25, 2021
@kddevin kddevin self-assigned this Oct 25, 2021
@kddevin
Copy link
Contributor Author

kddevin commented Oct 25, 2021

I was too quick to say it was a Tpetra issue. The problem seems to be in KokkosBlas.

I wrote the following reproducer:

  const size_t nGlobalEntries = 100;
  const size_t nVecs = 181;

  Kokkos::View<Scalar **, Kokkos::LayoutLeft, typename Node::device_type> mv("mv", nGlobalEntries, nVecs);
  Kokkos::deep_copy(mv, 3.14);

  std::vector<Scalar> norm(nVecs);
  Kokkos::View<Scalar*, Kokkos::HostSpace> normView(&norm[0], nVecs);
  try {
    KokkosBlas::nrm2_squared(normView, mv);
  }
  catch (std::exception &e) {
    std::cout << "Test failed for " << nVecs << " vectors:  " << e.what() << std::endl;
    TEST_ASSERT(false);
  }

With Node::device_type == Kokkos::Cuda, this code fails with the error that @jennloe saw.
If I change only nVecs from 181 to 180, this code passes.
@trilinos/kokkos-kernels

kddevin added a commit that referenced this issue Oct 25, 2021
@kddevin kddevin changed the title Tpetra: without UVM, invalid memory space in norms KokkosBlas: limited vector size in norm calculation? Oct 25, 2021
@kddevin
Copy link
Contributor Author

kddevin commented Oct 26, 2021

Brian says this is a bug with Kokkos' reduce; see kokkos/kokkos#4461

@brian-kelley
Copy link
Contributor

brian-kelley commented Nov 30, 2021

@kddevin @jennloe FYI kokkos/kokkos-kernels#1204 will take care of this. You can have any number of vectors on any backend, and for most of the BLAS-1 functions (dot, norms other than nrminf, sum) you'll get a ~3x speedup on Cuda compared to before.

@jhux2
Copy link
Member

jhux2 commented Nov 30, 2021

@iyamazaki

trilinos-autotester added a commit that referenced this issue Jan 13, 2022
Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Patch in Kokkos Kernels #1204 to fix #9856
PR Author: brian-kelley
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jan 14, 2022
…s:develop' (7f1c956).

* trilinos-develop:
  Nightlies on Geminga: Fix build names
  Tempus:  Work around Piro adjoint sensitivity test failure
  Tempus:  Fix mistake in merge
  Anasazi: Add another epsilon compare that I missed
  Allow epsilon tolerance in MultiVecTraits norms
  Patch in Kokkos Kernels trilinos#1204 to fix trilinos#9856
  Tempus:  Reset computation of dg/dx, df/dx, df/dx_dot every integration
  Tempus:  Add method to get response value from pseudo-transient adjoint integrator
  Tempus:  Allow 2nd adjoint ME for pseudo-transient adjoint integrator
  Tempus:  Add more needed methods to adjoint, psuedo-transient adjoint integrators
  Tempus:  Add new sensitivity functions to adjoint, pseudo-transient adjoint integrators
  Tempus:  Changes to sensitivity integrators in support of SPARC
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jan 14, 2022
…s:develop' (7f1c956).

* trilinos-develop:
  Nightlies on Geminga: Fix build names
  Tempus:  Work around Piro adjoint sensitivity test failure
  Tempus:  Fix mistake in merge
  Anasazi: Add another epsilon compare that I missed
  Allow epsilon tolerance in MultiVecTraits norms
  Patch in Kokkos Kernels trilinos#1204 to fix trilinos#9856
  Tempus:  Reset computation of dg/dx, df/dx, df/dx_dot every integration
  Tempus:  Add method to get response value from pseudo-transient adjoint integrator
  Tempus:  Allow 2nd adjoint ME for pseudo-transient adjoint integrator
  Tempus:  Add more needed methods to adjoint, psuedo-transient adjoint integrators
  Tempus:  Add new sensitivity functions to adjoint, pseudo-transient adjoint integrators
  Tempus:  Changes to sensitivity integrators in support of SPARC
trilinos-autotester added a commit that referenced this issue Feb 8, 2022
Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Tpetra:  reproducer for #9856
PR Author: kddevin
cgcgcg pushed a commit to cgcgcg/Trilinos that referenced this issue Jun 7, 2022
…Trilinos:develop' (615b98f).

* trilinos/develop: (25 commits)
  Drivers: Changes for sems update
  Drivers: Changes for sems update
  Drivers: lightsaber update
  MueLu: Add parameter translation for ML Maxwell1
  Tpetra: remove unused device_type typedef in functor
  Geminga: Set CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS:BOOL=OFF for Cuda builds
  Geminga: Fix cron driver
  ShyLU_DD/FROSch: fix counting nnz
  Ifpack2 FastILU: Fix MV access
  ShyLU_DD/FROSch: fix data access without UVM
  Nightlies on Geminga: Fix build names
  ShyLU_DDD/FROSch: add access patter for building without Tpetra deprecated code
  Attempt to fix nightlies on geminga
  Tempus:  Work around Piro adjoint sensitivity test failure
  Finish transition from sems-xyz to sems-archive-xyz (SEHELPD-2963)
  Tempus:  Fix mistake in merge
  Anasazi: Add another epsilon compare that I missed
  Allow epsilon tolerance in MultiVecTraits norms
  Patch in Kokkos Kernels trilinos#1204 to fix trilinos#9856
  Tempus:  Reset computation of dg/dx, df/dx, df/dx_dot every integration
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: KokkosKernels pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests UVM removal
Projects
None yet
Development

No branches or pull requests

3 participants