kokkos: view_copy fix #2863

Merged
merged 1 commit into from
Jun 1, 2018

ndellingwood
Contributor

Fixes view_copy in Kokkos deep_copy and adds unit tests.
Debugged and fixed together with @crtrott and @dsunder.

Issue exposed by failing Panzer examples reported in #2827.

This PR is a patch created from kokkos/kokkos#1653.

Changes to be committed:
modified: ../src/Kokkos_CopyViews.hpp
modified: CMakeLists.txt
modified: Makefile
new file: TestViewCopy.hpp
new file: cuda/TestCudaHostPinned_ViewCopy.cpp
new file: cuda/TestCudaUVM_ViewCopy.cpp
new file: rocm/TestROCmHostPinned_ViewCopy.cpp
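As background for this fix: as we understand Kokkos_CopyViews.hpp, when the source and destination views passed to deep_copy have different layouts (or are not contiguous), a single raw memory copy is not possible, so deep_copy falls back to an element-wise copy kernel (the view_copy path this PR patches). The standalone C++ sketch below does not use Kokkos and does not reproduce the specific bug fixed in kokkos/kokkos#1653; it only illustrates why the element-wise path is needed. `LayoutLeft2D`, `LayoutRight2D` and `elementwise_copy` are hypothetical names that mimic the index mappings of `Kokkos::LayoutLeft` (column-major) and `Kokkos::LayoutRight` (row-major).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a rank-2 array in two different layouts.
// These are NOT Kokkos types; they only copy the index mappings.
struct LayoutLeft2D {  // column-major, like Kokkos::LayoutLeft
  std::size_t n0, n1;
  std::vector<double> data;
  LayoutLeft2D(std::size_t a, std::size_t b) : n0(a), n1(b), data(a * b, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * n0]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * n0]; }
};

struct LayoutRight2D {  // row-major, like Kokkos::LayoutRight
  std::size_t n0, n1;
  std::vector<double> data;
  LayoutRight2D(std::size_t a, std::size_t b) : n0(a), n1(b), data(a * b, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * n1 + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * n1 + j]; }
};

// Element-wise copy: because the layouts differ, copying the raw `data`
// buffer would scramble the entries. Each (i, j) is read through the source
// mapping and written through the destination mapping. In Kokkos, as we
// understand it, the equivalent loop runs as a parallel kernel on an
// execution space that can access both views' memory (assumption).
template <class Dst, class Src>
void elementwise_copy(Dst& dst, const Src& src) {
  assert(dst.n0 == src.n0 && dst.n1 == src.n1);
  for (std::size_t i = 0; i < src.n0; ++i)
    for (std::size_t j = 0; j < src.n1; ++j)
      dst(i, j) = src(i, j);
}
```

For a 2x3 array filled with a(i, j) = 10*i + j, storage slot 1 holds a(1, 0) = 10 in the column-major array but b(0, 1) = 1 in the row-major copy, so copying the raw buffer instead of going element by element would give wrong values.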

@ndellingwood
Contributor Author

PR created from a patch of PR kokkos/kokkos#1653.
Kokkos spot-check passed on apollo.

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 759
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 476
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 24
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Using Repos:

Repo: TRILINOS (trilinos/Trilinos)
  • Branch: issue-2827
  • SHA: 7983166
  • Mode: TEST_REPO

Pull Request Author: ndellingwood

@ndellingwood
Contributor Author

ATDM cuda-dbg build is running now; I will post results once it finishes.

@ndellingwood
Contributor Author

Panzer example test results: a spot check of the failures reported in #2827.
Ran with the same node allocation setup and ctest -j16 to match the reproducer instructions.

CurlLaplacianExample:

bash-4.2$ ctest -j16
Test project /ascldap/users/ndellin/trilinos/Trilinos/Build/ATDM-cudadbg/packages/panzer/adapters-stk/example/CurlLaplacianExample
    Start 1: PanzerAdaptersSTK_CurlLaplacianExample
    Start 2: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-1
    Start 3: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-2
    Start 4: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3
1/5 Test #1: PanzerAdaptersSTK_CurlLaplacianExample .........................   Passed   14.31 sec
    Start 5: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4
2/5 Test #2: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-1 ...   Passed   28.15 sec
3/5 Test #3: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-2 ...   Passed   37.79 sec
4/5 Test #4: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-3 ...   Passed   63.33 sec
5/5 Test #5: PanzerAdaptersSTK_CurlLaplacianExample-ConvTest-Quad-Order-4 ...   Passed  155.61 sec

100% tests passed, 0 tests failed out of 5

MixedPoissonExample:

bash-4.2$ ctest -j16
Test project /ascldap/users/ndellin/trilinos/Trilinos/Build/ATDM-cudadbg/packages/panzer/adapters-stk/example/MixedPoissonExample
    Start 1: PanzerAdaptersSTK_MixedPoissonExample
    Start 2: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-1
    Start 3: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2
1/3 Test #1: PanzerAdaptersSTK_MixedPoissonExample ........................   Passed   26.48 sec
2/3 Test #2: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-1 ...   Passed   83.60 sec
3/3 Test #3: PanzerAdaptersSTK_MixedPoissonExample-ConvTest-Hex-Order-2 ...   Passed  360.68 sec

100% tests passed, 0 tests failed out of 3

PoissonExample:

bash-4.2$ ctest -j16
Test project /ascldap/users/ndellin/trilinos/Trilinos/Build/ATDM-cudadbg/packages/panzer/adapters-stk/example/PoissonExample
    Start 1: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-1
    Start 2: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-2
    Start 3: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-3
    Start 4: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-4
1/8 Test #2: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-2 ...   Passed   45.27 sec
    Start 5: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-1
2/8 Test #1: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-1 ...   Passed   45.47 sec
    Start 6: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-2
3/8 Test #3: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-3 ...   Passed   70.81 sec
    Start 7: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-3
4/8 Test #4: PanzerAdaptersSTK_PoissonExample-ConvTest-Quad-Order-4 ...   Passed  100.86 sec
    Start 8: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4
5/8 Test #5: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-1 ....   Passed   65.80 sec
6/8 Test #6: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-2 ....   Passed   72.50 sec
7/8 Test #7: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-3 ....   Passed   77.47 sec
8/8 Test #8: PanzerAdaptersSTK_PoissonExample-ConvTest-Tri-Order-4 ....   Passed   77.83 sec

100% tests passed, 0 tests failed out of 8

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 4 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 759
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 476
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 24
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3


CDash Test Results for PR# 2863.

@ibaned
Contributor

ibaned commented Jun 1, 2018

autotesting failed due to TeuchosComm_UnitTestHarness_Parallel_UnitTests_MPI_no_reduce failing in the Intel 17.0.1 build.
The results are fairly strange to me:
https://testing-vm.sandia.gov/cdash/testDetails.php?test=47689635&build=3565608
I'm not sure if @ndellingwood 's changes caused the issue...
@bartlettroscoe any thoughts on this?

@bartlettroscoe
Copy link
Member

@ibaned, I saw this test TeuchosComm_UnitTestHarness_Parallel_UnitTests_MPI_no_reduce fail while looking at:

I don't understand how these Kokkos changes could impact this test, but it is possible, since the TeuchosCore subpackage does have an optional dependency on the KokkosCore subpackage.

This is not the only time this test has failed in some build over all recorded history (about 6 months' worth now on testing-vm.sandia.gov/cdash/). If you look at the query:

you can see that this test has failed 4 other times. However, all 4 of those other failures appear to be configuration or other system issues, and therefore do not look like a code problem.

Note that this is a new Intel auto PR build and it may still have some problems. (See #2864).

Let's run the auto PR tests again and see what happens.

Member

@bartlettroscoe bartlettroscoe left a comment


@ndellingwood, the code changes look reasonable to me, but I am not sure that I am qualified to review this code. However, if this passes the auto PR builds and the CUDA tests for the ATDM builds, then I am good with this.

@bartlettroscoe
Member

Wait, I know why the test TeuchosComm_UnitTestHarness_Parallel_UnitTests_MPI_no_reduce failed. You can see it in the output at:

which shows:

Running unit tests ...

0. UnitTestHarness_nonRootFails_UnitTest ... Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name ascic158 and rank 1!
[Passed] (0.000121 sec)
...
TEST_0: Pass criteria = Match REGEX {UnitTestHarness_nonRootFails_UnitTest ... .Passed.} [FAILED]

The problem is that the output on rank 0 got jumbled with the startup banner on rank 1. This was just very bad luck.
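To see why the interleaving alone is enough to fail the test: the pass criterion is a regular expression searched for in the test output, in which `.` matches any single character, so `...` matches the literal `...` and `.Passed.` matches `[Passed]`. The rank 1 banner landed between the test name and `[Passed]`, so the pattern no longer matches even though the unit test itself passed. The standalone C++ sketch below uses `std::regex` (ECMAScript syntax) as a stand-in for the CMake-style regex matching the test driver presumably uses; since the pattern contains only literal characters and `.`, we expect the two to agree here. `kPassRegex`, `output_passes`, `kCleanOutput` and `kJumbledOutput` are hypothetical names, and `kCleanOutput` is a reconstruction of what rank 0 would print without the interleaved banner, not captured output.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pass criterion quoted from the TEST_0 line in the output above.
const std::regex kPassRegex("UnitTestHarness_nonRootFails_UnitTest ... .Passed.");

// Reconstructed rank 0 line with no banner interleaved (hypothetical).
const std::string kCleanOutput =
    "0. UnitTestHarness_nonRootFails_UnitTest ... [Passed] (0.000121 sec)";

// The interleaved output as quoted above: the rank 1 startup banner sits
// between the test name and "[Passed]".
const std::string kJumbledOutput =
    "0. UnitTestHarness_nonRootFails_UnitTest ... "
    "Teuchos::GlobalMPISession::GlobalMPISession(): started processor with "
    "name ascic158 and rank 1!\n"
    "[Passed] (0.000121 sec)";

// True if the pattern occurs anywhere in the captured output.
bool output_passes(const std::string& output) {
  return std::regex_search(output, kPassRegex);
}
```

With this sketch, `output_passes(kCleanOutput)` is true and `output_passes(kJumbledOutput)` is false, which matches the failure mode described above and why suppressing the startup banner avoids it.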

I will submit a PR to disable the startup banner for this test.

@ibaned
Contributor

ibaned commented Jun 1, 2018

> I will submit a PR to disable the startup banner for this test.

Thanks @bartlettroscoe !

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this pull request Jun 1, 2018
…inos#2863)

I added --teuchos-suppress-startup-banner to the multi-MPI process tests that
grep the output for pass/fail.  Otherwise, the output on proc 0 can get
jumbled with the startup banner output on other procs.  This happened, for
example, in the PR testing for PR trilinos#2863.  This will stop that from happening
in the future.
@bartlettroscoe bartlettroscoe added the AT: RETEST Causes the PR autotester to run a new round of PR tests on the next iteration label Jun 1, 2018
@bartlettroscoe
Member

FYI: PR #2865 will keep that Teuchos test from tripping up any future auto PR builds. It was a rare random failure that should not occur again in this PR.

I set the AT: RETEST label to request testing again. Let's get this sucker merged!

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - User Requested Retest - Resetting Testing Status

@trilinos-autotester trilinos-autotester removed the AT: RETEST Causes the PR autotester to run a new round of PR tests on the next iteration label Jun 1, 2018
@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 763
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 480
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 28
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Using Repos:

Repo: TRILINOS (trilinos/Trilinos)
  • Branch: issue-2827
  • SHA: 7983166
  • Mode: TEST_REPO

Pull Request Author: ndellingwood

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 763
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 480
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.8.4
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3

Build Information

Test Name: Trilinos_pullrequest_intel_17.0.1

  • Build Num: 28
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2863
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH issue-2827
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 7983166
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a69fdc3


CDash Test Results for PR# 2863.

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ bartlettroscoe ]!

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - Master Automerge is disabled (in .cfg file)

@bartlettroscoe
Member

This PR is ready to merge. Can we pull the trigger and merge this?

@ibaned ibaned merged commit 5f08531 into develop Jun 1, 2018
mhoemmen pushed a commit that referenced this pull request Jun 1, 2018
I added --teuchos-suppress-startup-banner to the multi-MPI process tests that
grep the output for pass/fail.  Otherwise, the output on proc 0 can get
jumbled with the startup banner output on other procs.  This happened, for
example, in the PR testing for PR #2863.  This will stop that from happening
in the future.
@ndellingwood ndellingwood deleted the issue-2827 branch June 4, 2018 18:35
4 participants