CUDA 8.0 test failures on ride #301

Closed
jewatkins opened this issue May 29, 2018 · 2 comments
Labels
Testing Stuff related to testing Albany (including nightly tests)

Comments

@jewatkins
Collaborator

The following CUDA tests started failing on May 25th, 2018 (see http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=71409):

  • QCAD_Schrodinger_parabolic3D_Tpetra
  • QCAD_Schrodinger_finiteWall1D_Tpetra
  • QCAD_Schrodinger_infiniteWall2D_Tpetra
  • FO_AIS_16km_MueLu
  • Aeras_HydrostaticBaroclinicInstabilitiesUnperturbed_hv
  • Aeras_HydrostaticBaroclinicInstabilitiesPerturbed_hv
  • Aeras_HydrostaticPureAdvection_1_HV

This build pulls the develop branch of Trilinos, so I suspect this is related to recent updates to Kokkos. The QCAD tests throw the following error:

p=1: *** Caught standard std::exception of type 'std::invalid_argument' :
/home/projects/albany/repos/Trilinos/packages/tpetra/core/src/Tpetra_MultiVector_def.hpp:581:
Throw number = 21
Throw test that evaluated to true: (view.extent (1) != 0 && static_cast<size_t> (view.extent (1)) <= maxColInd)
Tpetra::MultiVector<double,int,long long,Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaUVMSpace>>::MultiVector(map,view,origView,whichVectors): view.extent(1) = 124 <= max(whichVectors) = 18446744073694571435.

The Aeras tests, meanwhile, seem to be running out of memory:

p=0: *** Caught standard std::exception of type 'std::runtime_error' :
cudaMemcpy( dst , src , n , cudaMemcpyDefault ) error( cudaErrorMemoryAllocation): out of memory /home/projects/albany/repos/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:92
Traceback functionality not available
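As a side note on the QCAD error above: the reported `max(whichVectors) = 18446744073694571435` sits just below 2^64, which is what a small negative number looks like once it is stored in an unsigned 64-bit type. The sketch below only shows that arithmetic. It is not Tpetra code and it does not establish the root cause. `as_signed64` and `report_index` are hypothetical helper names introduced for illustration, and the sketch assumes a 64-bit `size_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Reinterpret an unsigned 64-bit value as two's-complement signed,
// using only well-defined arithmetic (no implementation-defined casts).
constexpr std::int64_t as_signed64(std::uint64_t u) {
    return (u <= static_cast<std::uint64_t>(INT64_MAX))
               ? static_cast<std::int64_t>(u)
               : -static_cast<std::int64_t>(~u) - 1;
}

// The value printed as max(whichVectors) in the QCAD error message.
constexpr std::uint64_t reported_max_col_ind = 18446744073694571435ULL;

// 2^64 - 18446744073694571435 = 14980181, so the signed reading is -14980181.
static_assert(as_signed64(reported_max_col_ind) == -14980181,
              "reported index reads as a negative value when treated as signed");

// Hypothetical helper (not part of Tpetra): print both readings of an index.
inline void report_index(std::uint64_t idx) {
    const std::int64_t s = as_signed64(idx);
    // Converting back to unsigned must reproduce the original bits.
    assert(static_cast<std::uint64_t>(s) == idx);
    std::cout << "unsigned: " << idx << ", signed: " << s << '\n';
}
```

Calling `report_index(reported_max_col_ind)` from a `main` function prints both readings. A column index that is negative when read as signed suggests the index array held uninitialized or stale data rather than a legitimately huge index, though the log alone cannot confirm that.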

The FELIX test is timing out, but this test has always been fairly slow. In general, there seems to be a 2x slowdown across a number of tests, which may be why this test timed out (see the time history for TransientHeat1D_Tpetra here: http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=3703868&build=71526).

@ibaned Do you have any ideas as to why some of these tests might be failing?

jewatkins added the "Testing" label (Stuff related to testing Albany, including nightly tests) May 29, 2018
@ibaned
Contributor

ibaned commented May 29, 2018

I don't know the details of why, but other failures have already been reported in Trilinos: trilinos/Trilinos#2827

jewatkins added a commit that referenced this issue May 30, 2018
KOKKOS_HAVE was deprecated in the latest KOKKOS update. This
should clean up the cuda tests.
@jewatkins
Collaborator Author

jewatkins commented May 31, 2018

The previous commit fixed this issue.
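For readers wondering how a deprecated macro prefix can change test behavior, here is a minimal sketch. It assumes the affected guards looked like `KOKKOS_HAVE_CUDA` and that the replacement is `KOKKOS_ENABLE_CUDA`; the commit message only names the `KOKKOS_HAVE` prefix, so the exact macros Albany used are an assumption here. The `#define` stands in for what a newer Kokkos configuration header would provide, and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// ASSUMPTION: stand-in for a newer Kokkos configuration that defines only
// the KOKKOS_ENABLE_* spelling and no longer defines KOKKOS_HAVE_*.
#define KOKKOS_ENABLE_CUDA

// Code guarded by the old spelling silently falls through to the #else
// branch once the old macro is no longer defined. No compile error is raised.
inline std::string path_with_old_guard() {
#if defined(KOKKOS_HAVE_CUDA)
    return "cuda";
#else
    return "host";
#endif
}

// The same code guarded by the new spelling takes the CUDA branch.
inline std::string path_with_new_guard() {
#if defined(KOKKOS_ENABLE_CUDA)
    return "cuda";
#else
    return "host";
#endif
}

// Sanity check a caller could run at startup: aborts (in builds without
// NDEBUG) if the build is not taking the CUDA branch.
inline void check_guards() {
    assert(path_with_new_guard() == "cuda");
}
```

With this stand-in configuration, `path_with_old_guard()` returns `"host"` and `path_with_new_guard()` returns `"cuda"`. Because a stale guard compiles cleanly and just selects a different branch, a mismatch like this can show up as runtime failures or slowdowns rather than as build errors.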

ibaned pushed a commit that referenced this issue Jun 12, 2018
KOKKOS_HAVE was deprecated in the latest KOKKOS update. This
should clean up the cuda tests.