Tpetra: CrsGraph: stop trying to read row pointers on host #8303

Closed
tasmith4 wants to merge 1 commit into develop from tasmit/row-pointers

@tasmith4 tasmith4 commented Nov 3, 2020

@trilinos/tpetra @cgcgcg @lucbv

Motivation

CrsGraph was reading the row pointers (k_rowPtrs_) directly on host, which segfaults when the device memory space is CudaSpace, because CudaSpace memory is not accessible from host code.

Related Issues

Testing

Tests still pass locally in a UVM build. The particular segfault no longer occurs in a non-UVM build.

@tasmith4 tasmith4 changed the base branch from master to develop November 3, 2020 16:32
@@ -3084,8 +3086,9 @@ namespace Tpetra {
// We have to iterate through the row offsets anyway, so we
// might as well check whether all rows' bounds are the same.
bool allRowsReallySame = false;
auto k_rowPtrs_h = ::Tpetra::Details::getEntriesOnHost(this->k_rowPtrs_, 0, numRows);
Contributor Author

@kddevin or @brian-kelley -- I'm copying the whole array here because we know we're going to walk it anyway. Are there any guarantees about when k_rowPtrs_ changes that allow us to save some performance by moving this copy out of this function to a place called less frequently?
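
The trade-off raised in this comment can be sketched in plain C++. This is only an illustration, not Tpetra code: `FakeDeviceArray`, `copyEntryToHost`, `copyRangeToHost`, and `allRowsSameLength` are hypothetical names, a `std::vector` stands in for the device view, and the sketch assumes CSR-style offsets with `numRows + 1` entries. It shows why one bulk copy is preferable when the whole array is going to be walked anyway: the walk costs a single transfer instead of one per entry.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device-resident array: host code may only
// reach the data through explicit copies, and every copy is counted.
class FakeDeviceArray {
public:
  explicit FakeDeviceArray(std::vector<std::size_t> data)
    : data_(std::move(data)) {}

  // One transfer per entry (similar in spirit to a per-entry host read).
  std::size_t copyEntryToHost(std::size_t i) const {
    ++numTransfers_;
    return data_.at(i);
  }

  // One transfer for the whole half-open range [begin, end).
  std::vector<std::size_t> copyRangeToHost(std::size_t begin,
                                           std::size_t end) const {
    assert(begin <= end && end <= data_.size());
    ++numTransfers_;
    const auto first = data_.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last = data_.begin() + static_cast<std::ptrdiff_t>(end);
    return std::vector<std::size_t>(first, last);
  }

  std::size_t numTransfers() const { return numTransfers_; }

private:
  std::vector<std::size_t> data_;
  mutable std::size_t numTransfers_ = 0;
};

// Copy the row offsets to host once, then walk them on host and report
// whether every row has the same number of entries.
bool allRowsSameLength(const FakeDeviceArray& rowPtrs, std::size_t numRows) {
  if (numRows == 0) return true;
  const std::vector<std::size_t> h = rowPtrs.copyRangeToHost(0, numRows + 1);
  const std::size_t firstLen = h[1] - h[0];
  for (std::size_t r = 1; r < numRows; ++r) {
    if (h[r + 1] - h[r] != firstLen) return false;
  }
  return true;
}
```

With offsets `{0, 2, 4, 6}` and three rows, `allRowsSameLength` returns true after exactly one counted transfer; a loop over `copyEntryToHost` would need four.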

@brian-kelley brian-kelley Nov 3, 2020

Yes, k_rowPtrs_ should never change while a matrix is fill-complete, since it can only change when entries are inserted or deleted from a row. lclGraph.row_map is const-valued to enforce this.

Contributor Author

So is there a problem with making a member view k_rowPtrs_h_ that gets copied over during fillComplete?

What other functions might need to be called when the matrix is not fill-complete? For example, could we get rid of the getEntryOnHost and getEntriesOnHost calls throughout, or do some functions need to access the row pointers before fill-complete?

Contributor

@tasmith4 I think that's a good strategy: before fillComplete(), just use k_rowPtrs_h_ so that getRowInfo() is very fast, and then copy it to the device view k_rowPtrs_ during fillComplete(), when the local graph/matrix is being constructed. That way, k_rowPtrs_h_ is always up to date, whether the graph is fill-complete or not.

Contributor Author

@brian-kelley So you're saying that all the performance-critical device work that needs k_rowPtrs_ to live on device actually happens after fill-complete, so the row pointers don't need to be on device before then?

Contributor

@tasmith4 Right

Contributor

If you end up getting runtime errors from k_rowPtrs_ being accessed on device before fillComplete, then you might have to make it a DualView or deep-copy it manually, but I don't know of anywhere that happens.
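
The strategy discussed in this thread can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual CrsGraph implementation: `SketchGraph` and all of its members are hypothetical, `std::vector` stands in for both the host view (`k_rowPtrs_h_` in the discussion) and the device view (`k_rowPtrs_`), and the way entries are inserted is deliberately naive. The point is only the lifecycle: host queries always read the host copy, and the device copy is built once, in `fillComplete()`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: row offsets live on host during assembly and are
// copied to a separate "device" array once, at fill-complete.
class SketchGraph {
public:
  explicit SketchGraph(std::size_t numRows)
    : rowPtrsHost_(numRows + 1, 0) {}

  // Structure changes are only legal before fillComplete(), and they
  // touch host data only.
  void insertEntries(std::size_t row, std::size_t count) {
    if (fillComplete_) throw std::logic_error("graph is fill-complete");
    if (row + 1 >= rowPtrsHost_.size()) throw std::out_of_range("bad row");
    // Shift every offset after this row (naive, but keeps the sketch short).
    for (std::size_t r = row + 1; r < rowPtrsHost_.size(); ++r) {
      rowPtrsHost_[r] += count;
    }
  }

  // Host-side query: reads the host copy, never the device copy, so it
  // behaves the same before and after fillComplete().
  std::size_t numEntriesInRow(std::size_t row) const {
    return rowPtrsHost_.at(row + 1) - rowPtrsHost_.at(row);
  }

  void fillComplete() {
    assert(rowPtrsHost_.front() == 0);
    rowPtrsDevice_ = rowPtrsHost_;  // stands in for one host-to-device deep copy
    fillComplete_ = true;
  }

  bool isFillComplete() const { return fillComplete_; }

  // The device copy only exists once the graph is fill-complete.
  const std::vector<std::size_t>& deviceRowPtrs() const {
    if (!fillComplete_) throw std::logic_error("device row pointers not built");
    return rowPtrsDevice_;
  }

private:
  std::vector<std::size_t> rowPtrsHost_;
  std::vector<std::size_t> rowPtrsDevice_;
  bool fillComplete_ = false;
};
```

Because the offsets cannot change while the graph is fill-complete, the host copy stays valid without any further synchronization. If device code did need the offsets before `fillComplete()`, this single-direction copy would not be enough, which is the case the comment above flags as a reason to consider a DualView.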

Contributor Author

@brian-kelley do you know the difference between fillComplete and expertStaticFillComplete?

@brian-kelley brian-kelley Nov 3, 2020

I didn't know off the top of my head, but there is this comment in CrsGraph::expertStaticFillComplete:
// Note: We don't need to do the following things which are normally done in fillComplete:
// allocateIndices, globalAssemble, makeColMap, makeIndicesLocal, sortAndMergeAllIndices

@trilinos-autotester

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester

Status Flag 'Pull Request AutoTester' - Failure: Timed out waiting for job Trilinos_pullrequest_intel_17.0.1 to start: Total Wait = 603

@trilinos-autotester

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

| Test Name | Build Num | Status |
| --- | --- | --- |
| Trilinos_pullrequest_gcc_8.3.0 | 2668 | STARTED |
| Trilinos_pullrequest_gcc_7.2.0_serial | 325 | STARTED |
| Trilinos_pullrequest_gcc_7.2.0_debug | 821 | STARTED |
| Trilinos_pullrequest_intel_17.0.1 | 8193 | STARTED |
| Trilinos_pullrequest_cuda_9.2 | 5954 | STARTED |
| Trilinos_pullrequest_clang_10.0.0 | 1088 | STARTED |
| Trilinos_pullrequest_python_3 | 3874 | STARTED |

Jenkins Parameters (identical for every build listed above)

| Parameter Name | Value |
| --- | --- |
| PR_LABELS | UVM removal;pkg: Tpetra |
| PULLREQUESTNUM | 8303 |
| TEST_REPO_ALIAS | TRILINOS |
| TRILINOS_SOURCE_BRANCH | tasmit/row-pointers |
| TRILINOS_SOURCE_REPO | https://github.com/trilinos/Trilinos |
| TRILINOS_SOURCE_SHA | d1b516e |
| TRILINOS_TARGET_BRANCH | develop |
| TRILINOS_TARGET_REPO | https://github.com/trilinos/Trilinos |
| TRILINOS_TARGET_SHA | 4b7ddc7 |

Using Repos:

Repo: TRILINOS (trilinos/Trilinos)
  • Branch: tasmit/row-pointers
  • SHA: d1b516e
  • Mode: TEST_REPO

Pull Request Author: tasmith4

@trilinos-autotester

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

| Test Name | Build Num | Status |
| --- | --- | --- |
| Trilinos_pullrequest_gcc_8.3.0 | 2668 | PASSED |
| Trilinos_pullrequest_gcc_7.2.0_serial | 325 | PASSED |
| Trilinos_pullrequest_gcc_7.2.0_debug | 821 | PASSED |
| Trilinos_pullrequest_intel_17.0.1 | 8193 | PASSED |
| Trilinos_pullrequest_cuda_9.2 | 5954 | PASSED |
| Trilinos_pullrequest_clang_10.0.0 | 1088 | PASSED |
| Trilinos_pullrequest_python_3 | 3874 | PASSED |

Jenkins Parameters (identical for every build listed above)

| Parameter Name | Value |
| --- | --- |
| PR_LABELS | UVM removal;pkg: Tpetra |
| PULLREQUESTNUM | 8303 |
| TEST_REPO_ALIAS | TRILINOS |
| TRILINOS_SOURCE_BRANCH | tasmit/row-pointers |
| TRILINOS_SOURCE_REPO | https://github.com/trilinos/Trilinos |
| TRILINOS_SOURCE_SHA | d1b516e |
| TRILINOS_TARGET_BRANCH | develop |
| TRILINOS_TARGET_REPO | https://github.com/trilinos/Trilinos |
| TRILINOS_TARGET_SHA | 4b7ddc7 |

CDash Test Results for PR# 8303.

@trilinos-autotester

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES

@trilinos-autotester

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

5 similar comments

@trilinos-autotester trilinos-autotester added the AT: STALE Added by the PR autotester if too much time has elapsed since the last successful PR test iteration label Nov 10, 2020
@trilinos-autotester

All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest....

3 similar comments

@tasmith4 tasmith4 force-pushed the tasmit/row-pointers branch from 28a0f90 to 0ddb296 Compare November 13, 2020 17:05
@tasmith4 tasmith4 added AT: AUTOMERGE Causes the PR autotester to automatically merge the PR branch once approvals are completed AT: RETEST Causes the PR autotester to run a new round of PR tests on the next iteration and removed AT: STALE Added by the PR autotester if too much time has elapsed since the last successful PR test iteration labels Nov 13, 2020
@brian-kelley brian-kelley left a comment

Looks good to me (correct). A future TODO is to store k_rowPtrs_h_ persistently while the graph is not fill-complete, so that getRowInfo() is fast during assembly.

@trilinos-autotester

All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest....

26 similar comments

@tasmith4

tasmith4 commented Mar 2, 2021

No longer relevant

@tasmith4 tasmith4 closed this Mar 2, 2021
@tasmith4 tasmith4 deleted the tasmit/row-pointers branch March 2, 2021 20:43
Labels
AT: STALE Added by the PR autotester if too much time has elapsed since the last successful PR test iteration pkg: Tpetra UVM removal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tpetra: Invalid access of k_rowPtrs_ inside CrsGraph
3 participants