Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MueLu: kokkos refactor of uncoupled aggregation and amalgamation process #5838

Open
4 of 5 tasks
lucbv opened this issue Sep 2, 2019 · 11 comments
Open
4 of 5 tasks
Assignees
Labels
client: EMPIRE All issues that most directly target the ATDM EMPIRE code client: ExaWind All issue that impact the ECP project ExaWind DO_NOT_AUTOCLOSE This issue should be exempt from auto-closing by the GitHub Actions bot. pkg: MueLu type: enhancement Issue is an enhancement, not a bug

Comments

@lucbv
Copy link
Contributor

lucbv commented Sep 2, 2019

Enhancement

@trilinos/muelu

This issue will track the progress made to integrate all changes from the branch MueLu_uncoupled_aggregation_refactor. The aggregation refactor was done about a year ago and cannot be merged easily anymore so merging the previously mentioned branch will not be attempted.
Instead the changes in that branch will broken into smaller incremental changes with the hope that these incremental changes will be merged quickly and show a more logical progression.
As I tried to merge the old branch I also realized that some changes are potentially poorly designed and I am planning on modifying the work a bit to accommodate better design changes.
Here is a list of new features that are needed to complete the refactor work:

  • MueLu::LWGraph needs to new methods: getRowPtrs() and getEntries() to avoid awkward extraction of data from graph_ see here and here
  • instead of modifying heavily the signature of MueLu:: AggregationAlgorithmBase_kokkos::BuildAggregates(...) I would like to update it to use a Kokkos::View instead of a std::vector for portability reasons and store pointers to additional data such as coloring information in MueLu::Aggregates_kokkos. This will avoid clashes with StructuredAggregation which does not require coloring information and already stores a pointer to a MueLu::IndexManager_kokkos
  • re-write the uncoupled aggregation phases using the work done in the MueLu_uncoupled_aggregation_refactor branch allowing for both non-deterministic and deterministic aggregation options
  • create MueLu::AmalgamationInfo_kokkos and MueLu::AmalgamationFactory_kokkos to set the infrastructure in MueLu for the related algorithms and data structured to be ported to kokkos.
  • actually implement a new version of the amalgamation process that operates on Kokkos::View and Aggregates_kokkos and runs on device.

Note: the current implementation does not allow the user to choose the coloring algorithm used by UncoupledAggregationFactory_kokkos. Currently the algorithm is set to Serial which means that the graph of the matrix is copied back to the CPU memory where the coloring is performed. While this approach provides a deterministic coloring we probably want to allow the user to run on GPU by specifying a different algorithm.

@lucbv lucbv added type: enhancement Issue is an enhancement, not a bug pkg: MueLu client: ExaWind All issue that impact the ECP project ExaWind labels Sep 2, 2019
@lucbv lucbv self-assigned this Sep 2, 2019
@lucbv
Copy link
Contributor Author

lucbv commented Sep 2, 2019

Tagging a few people who will be interested in this process: @jhux2 @csiefer2 @aprokop @cgcgcg
I will try to update this issue regularly as PR related to it pop up!

@lucbv
Copy link
Contributor Author

lucbv commented Sep 4, 2019

PR #5841 implements the new interface in LWGraph, first step done.
I am now working on some changes in MueLu::Aggregates_kokkos to have the object store the colors computed by the distance 2 coloring algorithm and then propagate them to the various phases of aggregation.

lucbv added a commit to lucbv/Trilinos that referenced this issue Sep 4, 2019
…1 aggregation issue trilinos#5838

Adding a set and get methods to Aggregates_kokkos in order to store the coloring of the LWGraph used for aggregates creation.
This will allow to pass the colors to all phases of aggregation without reworking interfaces too much.
Using the new methods in LWGraph_kokkos avoids copying on host the data needed by
the coloring algorithm. This saves 2 communication: 1 DtoH and 1 HtoD.
Extra time monitors are added in AggregationPhase1 to account for coloring and
aggregation separately.
Adding a test that exercises the Distance2 algorithmic path in test/scaling
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Sep 5, 2019
…s:develop' (b6b763a).

* trilinos-develop:
  MueLu: interfacing distance2 coloring work with Aggregates and Phase 1 aggregation issue trilinos#5838
  MueLu: adding unit-test for LWGraph_kokkos
  MueLu: extending the interface of LWGraph to give direct access to rowPtrs and entries
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Sep 5, 2019
…s:develop' (b6b763a).

* trilinos-develop:
  MueLu: interfacing distance2 coloring work with Aggregates and Phase 1 aggregation issue trilinos#5838
  MueLu: adding unit-test for LWGraph_kokkos
  MueLu: extending the interface of LWGraph to give direct access to rowPtrs and entries
@lucbv
Copy link
Contributor Author

lucbv commented Sep 5, 2019

Using the work from #5841 the data from the LWGraph is now directly passed to the distance 2 coloring algorithm.
PR #5853 took care of storing the graph colors so that it is accessible to all aggregation steps.
Next step will be to modify AggregationAlgorithmBase_kokkos::BuildAggregates(...) so that it uses a Kokkos::View and not a std::vector, I'm adding that bullet to the work outlined above. This will require rewriting the interface of all aggregation algorithms in MueLu...

@lucbv
Copy link
Contributor Author

lucbv commented Sep 5, 2019

With PR #5860 merged the framework upgrades for on device aggregation are completed, the next PR will tackle the algorithmic upgrade.
Most of will simply be moving the changes that were made last year to the current code.
Hopefully this can be completed tomorrow and tested on Monday!

lucbv added a commit to lucbv/Trilinos that referenced this issue Sep 6, 2019
Two new aggregation algorithms are implemented:
    1) is a direct refactor of the serial algorithm using kokkos
    2) is a similar approach to 1) but provides deterministic aggregates
lucbv added a commit to lucbv/Trilinos that referenced this issue Sep 10, 2019
Two new aggregation algorithms are implemented:
    1) is a direct refactor of the serial algorithm using kokkos
    2) is a similar approach to 1) but provides deterministic aggregates
trilinos-autotester added a commit that referenced this issue Sep 11, 2019
…gregation

Automatically Merged using Trilinos Pull Request AutoTester
PR Title: MueLu: refactore of aggregation phase 1, see issue #5838
PR Author: lucbv
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Sep 12, 2019
…s:develop' (98aab1e).

* trilinos-develop:
  MueLu: fix -Wcatch-value warnings (trilinos#5891)
  Automatic snapshot commit from tribits at ae74743
  Xpetra: ETI - BlockedMap - 2019-09-10 - bugfix
  MueLu: refactore of aggregation phase 1, see issue trilinos#5838
  Xpetra: ETI - StridedMap
  MueLu: Nullspace fix for distributed coarse matrix
  Tpetra: Enabling L,G,N ETI for FECrsGraph (trilinos#5886)
  Tempus: Missing Args in explicit Steppers InArgs.
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Sep 12, 2019
…s:develop' (98aab1e).

* trilinos-develop:
  MueLu: fix -Wcatch-value warnings (trilinos#5891)
  Automatic snapshot commit from tribits at ae74743
  Xpetra: ETI - BlockedMap - 2019-09-10 - bugfix
  MueLu: refactore of aggregation phase 1, see issue trilinos#5838
  Xpetra: ETI - StridedMap
  MueLu: Nullspace fix for distributed coarse matrix
  Tpetra: Enabling L,G,N ETI for FECrsGraph (trilinos#5886)
  Tempus: Missing Args in explicit Steppers InArgs.
lucbv added a commit to lucbv/Trilinos that referenced this issue Sep 12, 2019
…#5838

As for phase 1, this adds two code path, one for deterministic and one for
 random aggregation a.k.a. non deterministic.
@brian-kelley
Copy link
Contributor

@lucbv Rather than just push directly to your branch, I made a PR to give you a chance to look at what I changed.

  • Compute the aggWeights all at once in one parallel_for and then fence, so those values are not being read and updated in the same parallel section
  • Compute aggPenalty updates in one View, then adding them into aggPenalties all at once (with a fence in between). This should fix the other data race.
  • Change the allocated size of aggPenalties to be numAggs, not numRows (I did this in both the random and deterministic versions)

@jhux2
Copy link
Member

jhux2 commented Sep 18, 2019

Tagging Nalu-Wind stakeholders @alanw0, @sayerhs, and @sthomas61.

@jhux2
Copy link
Member

jhux2 commented Sep 18, 2019

Tagging EMPIRE stakeholders @egphill and @bathmatt.

@lucbv lucbv added the client: EMPIRE All issues that most directly target the ATDM EMPIRE code label Sep 18, 2019
lucbv added a commit to lucbv/Trilinos that referenced this issue Sep 18, 2019
…#5838

As for phase 1, this adds two code path, one for deterministic and one for
 random aggregation a.k.a. non deterministic.
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Sep 23, 2019
…s:develop' (2c8ff69).

* trilinos-develop:
  MueLu: rebasing gold files
  MueLu: adding refactor for phase 2b of aggregation
  MueLu: kokkos refactor of phase 2a of aggregation, see issue trilinos#5838
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Sep 23, 2019
…s:develop' (2c8ff69).

* trilinos-develop:
  MueLu: rebasing gold files
  MueLu: adding refactor for phase 2b of aggregation
  MueLu: kokkos refactor of phase 2a of aggregation, see issue trilinos#5838
lucbv added a commit to lucbv/Trilinos that referenced this issue Sep 27, 2019
Phase 3 seems to work correctly, still need to test it on GPU and to rebase the gold files.
lucbv added a commit to lucbv/Trilinos that referenced this issue Oct 10, 2019
Phase 3 seems to work correctly, still need to test it on GPU and to rebase the gold files.
lucbv added a commit that referenced this issue Oct 11, 2019
MueLu: refactor of Phase3 aggregation, see issue #5838
@lucbv
Copy link
Contributor Author

lucbv commented Oct 11, 2019

PR #5995 provides a kokkos refactor for the last phase of aggregation. Now only the amalgamation process needs to be refactored, this should technically be easy except for the switch from using Teuchos::Array to Kokkos::View to store the rowTranslation and colTranslation... this could take a bit of coding to re-interface with the rest of the code.

lucbv added a commit to lucbv/Trilinos that referenced this issue Oct 11, 2019
…see issue trilinos#5838

Starting the proper kokkos refactor of the amalgamation process.
It is not as easy as one might hope since the Map class is not that Kokkos friendly.
The locaMap class is helping a bit tough.
At the moment the interfaces for amalgamation methods have been modified to remove
 Teuchos::Array in favor of Kokkos::View, a few typedefs are added for convenience.
The algorithms in AmalgamationInfo_kokkos are almost all refactored except for the
 method: ComputeUnamalgamatedImportDofMap which needs some more work.

Next step: tackle the AmalgamationFactory_kokkos...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Oct 12, 2019
…s:develop' (4ec5f0b).

* trilinos-develop:
  Change Spack module from SuperLUDist 6.2.0 to 5.4.0 (CDOFA-66)
  Testing: Tpetra: make nightly build faster
  Update CrsMatrix_UnitTests4.cpp
  Tpetra: Fixes so that trilinos#6076 can pass
  MueLu: rebasing the gold files for phase 3 refactored outputs
  MueLu: refactor of Phase3 aggregation, see issue trilinos#5838
  KokkosKernels - trsm transpose does not solve the problem correctly.
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Oct 12, 2019
…s:develop' (4ec5f0b).

* trilinos-develop:
  Change Spack module from SuperLUDist 6.2.0 to 5.4.0 (CDOFA-66)
  Testing: Tpetra: make nightly build faster
  Update CrsMatrix_UnitTests4.cpp
  Tpetra: Fixes so that trilinos#6076 can pass
  MueLu: rebasing the gold files for phase 3 refactored outputs
  MueLu: refactor of Phase3 aggregation, see issue trilinos#5838
  KokkosKernels - trsm transpose does not solve the problem correctly.
searhein pushed a commit to searhein/Trilinos that referenced this issue Oct 15, 2019
…artition-of-unity

* 'develop' of https://github.com/searhein/Trilinos: (100 commits)
  Change initializer list order to make g++ happy
  Plumbing for BelosBlockCGSolMgr so that Assert Positive Definiteness flag gets passed to BelosCGIter
  Change Spack module from SuperLUDist 6.2.0 to 5.4.0 (CDOFA-66)
  Testing: Tpetra: make nightly build faster
  Update CrsMatrix_UnitTests4.cpp
  Tpetra: Fixes so that trilinos#6076 can pass
  MueLu: rebasing the gold files for phase 3 refactored outputs
  MueLu: refactor of Phase3 aggregation, see issue trilinos#5838
  KokkosKernels - trsm transpose does not solve the problem correctly.
  Belos: treats cases where numResTests is zero
  Belos: removes a int cast warning and removes one unnecessary StatusTestCombo cast
  Belos: removes a int cast warning and removes one unnecessary StatusTestCombo cast
  Reduced from ctest -j8 to -j4 for all CUDA builds (trilinos#6052)
  Belos: solves Issue trilinos#6059
  Belos: solves Issue trilinos#6059
  ML: Code cleanup to RefMaxwell
  ML: Removing supperfluous import constructor
  ML: Adding new support routine
  Xpetra: disable Map cloner test if no Tpetra deprecated
  Phalanx - wrong number of kokkos allocation arguments.
  ...
@github-actions
Copy link

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Sep 25, 2021
@github-actions
Copy link

github-actions bot commented Nov 3, 2021

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Nov 3, 2021
@github-actions github-actions bot closed this as completed Nov 3, 2021
@jhux2 jhux2 added DO_NOT_AUTOCLOSE This issue should be exempt from auto-closing by the GitHub Actions bot. and removed MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. labels Nov 3, 2021
@jhux2
Copy link
Member

jhux2 commented Nov 3, 2021

Reopening until verified as fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client: EMPIRE All issues that most directly target the ATDM EMPIRE code client: ExaWind All issue that impact the ECP project ExaWind DO_NOT_AUTOCLOSE This issue should be exempt from auto-closing by the GitHub Actions bot. pkg: MueLu type: enhancement Issue is an enhancement, not a bug
Projects
Status: Backlog
Development

No branches or pull requests

3 participants