Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: track MultiVector DualView refactoring #8591

Closed
17 of 18 tasks
brian-kelley opened this issue Jan 16, 2021 · 20 comments
Closed
17 of 18 tasks

Tpetra: track MultiVector DualView refactoring #8591

brian-kelley opened this issue Jan 16, 2021 · 20 comments
Assignees
Labels
AT: WIP Causes the PR autotester to not test the PR. (Remove to allow testing to occur.) pkg: Tpetra UVM removal

Comments

@brian-kelley
Copy link
Contributor

brian-kelley commented Jan 16, 2021

Work Tracking Issue

@trilinos/tpetra

Branch for working on MultiVector DualView management refactor: TpetraDualViewRefactor.

Updated 2-3-21:
I replaced the new functions we just added (e.g. getLocalViewDeviceConst, getLocalViewHostNonConst) with permament tagged versions:

//New interface:
#include "Tpetra_Access.hpp"
//...
MultiVector::getLocalView<Device>(Tpetra::Access::ReadOnly());
MultiVector::getLocalView<Device>(Tpetra::Access::ReadWrite());
MultiVector::getLocalView<Device>(Tpetra::Access::WriteOnly());

Non-templated versions like getLocalViewHost(tag) too.
All getLocalView calls in MultiVector and core/test/MultiVector have been replaced with this new interface, but this caused new failures in tests.

MultiVector

  • TpetraCore_MultiVector_UnitTests_MPI_4
  • TpetraCore_Transform_MPI_4
  • TpetraCore_Bug7758_MPI_4
  • TpetraCore_Bug7745_MPI_4
  • TpetraCore_MultiVector_get2dView_MPI_1
  • TpetraCore_MultiVector_TwoArgRandomize_MPI_1
  • TpetraCore_Issue364_MPI_1
  • TpetraCore_MultiVector_MicroBenchmark_MPI_1 //works on 4 ranks, doesn't work on 1
  • TpetraCore_deep_copy_SerialDenseMatrix_to_MultiVector_MPI_1
  • TpetraCore_deep_copy_MultiVector_to_SerialDenseMatrix_MPI_1
  • TpetraCore_MV_reduce_strided_MPI_4 //Brian
  • TpetraCore_aliased_deep_copy_MPI_4

FEMultiVector

  • TpetraCore_FEMultiVector_UnitTests_MPI_4. // Can't fix until CrsMatrix is fixed
  • TpetraCore_FEMultiVector_Fix3101_MPI_4

BlockMultiVector

  • BlockMultiVector_MPI_4
  • BlockMultiVector2_MPI_4
  • BlockMap_MPI_4.

Import/Export

  • Issue3968
@brian-kelley brian-kelley added pkg: Tpetra AT: WIP Causes the PR autotester to not test the PR. (Remove to allow testing to occur.) labels Jan 16, 2021
@kyungjoo-kim
Copy link
Contributor

@brian-kelley would you put your branch name here so that I can fork from yours ?

@kddevin
Copy link
Contributor

kddevin commented Jan 19, 2021

Branch for working on MultiVector DualView management refactor: TpetraDualViewRefactor.

@kyungjoo-kim
Copy link
Contributor

Oh.... the branch is from Trilinos repo. I looked for it from brian's forked repo. Now I got it.

@kyungjoo-kim
Copy link
Contributor

kyungjoo-kim commented Jan 21, 2021

in order not to duplicate effort,

  • item 14,17, 20, 23, 28, 29, 35 are done.

Item 30.

This test a workflow 1) a user creates a device view, 2) tpetra wraps the device view in the dual view structure, 3) check data consistency when it modifies the user created device local view.

This workflow is the thing that we do not want users to do. When a user creates a device view and hand it to Tpetra, it breaks the Tpetra dual view logic as the user hold the local object (reference count is unmatched when Tpetra is created). If we want to give this option to users, we need to allocate the dual view and make deep copy and users local device view lose the link to Tpetra. Soft copy of device(or host) view to dual view is not working if we want to keep the transactional behavior on the tpetra dual view context.

Tpetra team reaches work-around for the item 30.

Leave the interface for wrapping a device view interface (soft copy to tpetra dual view device view or host view) which allows reference counting does not match when tpetra object is constructed. However, the testing scenario described in item 30 is the anti-pattern that we want to prevent it from users. So, we just do not test the part of the anti-pattern test.

@kyungjoo-kim
Copy link
Contributor

Issue 364 is done.

kddevin added a commit that referenced this issue Jan 25, 2021
as appropriate. #8591
Did not change getLocalView as the Const/NonConst versions of
getLocalView do not exist yet
Did not change MV_reduce_strided to avoid creating conflicts for
@brian-kelley
@csiefer2
Copy link
Member

csiefer2 commented Jan 28, 2021

OK. So I removed every reference to getLocalViewHost() and getLocalViewDevice() in Tpetra and the tests and replaced them with the appropriate const and non-const version.

Complications:

  1. Tpetra::MultiVector::description() should not do any sync'ing. To do this I added an "unsafe" function with the old sync-less behavior. There are other ways to do this that might be better We should discuss this tomorrow.

@kddevin
Copy link
Contributor

kddevin commented Jan 29, 2021

List of things to do before merging:

  • switch to use access tags (ReadOnly, WriteOnly, ReadWrite) // Karen
  • remove sync/modify from all MV usage //Kyungjoo
  • make sure MV tests work without UVM // Brian: they do as of 2-19-21, so now make sure they don't break again
  • make sure all tests work with UVM (a la PR testing)
  • disable old interface if and only if CUDA is ON and UVM is OFF. //Timothy

Eventually, but not necessarily for PR merge

  • formally deprecate old functions (might do when CrsGraph/Matrix deprecations are ready)

@kddevin
Copy link
Contributor

kddevin commented Jan 29, 2021

Thoughts on ReadOnly/WriteOnly/ReadWrite access:

1/29/21

namespace Access {
  struct ReadOnly  {};
  struct WriteOnly {};
  struct ReadWrite {};
}

getLocalView<device>(Tpetra::Access::ReadOnly());
getLocalViewHost(Tpetra::Access::WriteOnly());
getLocalViewDevice(Tpetra::Access::ReadWrite());

getLocalViewHost(const Tpetra::Access::WriteOnly &ignoreme) {
  don't bother syncing
  clear the sync flags
  mark host modify
  return host pointer
}

const getLocalViewHost(const Tpetra::Access::ReadOnly &ignoreme) const {
  sync if needed
  return host pointer
}

getLocalViewHost(const Tpetra::Access::ReadWrite &ignoreme) {
  sync if needed
  mark host modify
  return host pointer
}


getLocalView<device>(Access::ReadOnly());
getLocalViewHost(Access::WriteOnly());
getLocalViewDevice(Access::ReadWrite());

@kddevin will change existing Access struct in withLocalAccess
@brian-kelley will put the functions and tags n place
@trilinos/tpetra will update branch to use the new functions with tags

jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jan 30, 2021
…s:develop' (dde88a7).

* trilinos-develop: (24 commits)
  MueLu: Gold Files
  Phalanx: add queries to DAG Manager
  MueLu: More BlockDiagonal mods
  MueLu: Adding first cut at a block-diagonal coarsening
  MiniTensor: Fix inconsistent KOKKOS_INLINE_FUNCTION markings
  Piro: close trilinos#8641
  Fix latest fix -- discarded const
  Turn Anasazi on for EMPIRE
  Fix compilation error on intel-17
  Automatic snapshot commit from seacas at 729e904fa5
  MueLu: RegionMG cleaning up test example dirver
  MueLu: fix typos in documentation
  MueLu CoalesceDrop: Fix FPE in cut-based distance Laplacian
  MueLu Hierarchy: Skip forced deletion of level vectors
  MueLu ParameterListInterpreter: Switch processing of coarse solve and smoothers
  MueLu: Add unit test for StratimikosSmoother
  tpetra:  removed use of device_type; the test runs on host. trilinos#8591
  MueLu: rename input to InterfaceAggFact
  MueLu: fix SchurComplement for parallel runs
  MueLu: add test for dof-based InterfaceAggFact
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jan 30, 2021
…s:develop' (dde88a7).

* trilinos-develop: (24 commits)
  MueLu: Gold Files
  Phalanx: add queries to DAG Manager
  MueLu: More BlockDiagonal mods
  MueLu: Adding first cut at a block-diagonal coarsening
  MiniTensor: Fix inconsistent KOKKOS_INLINE_FUNCTION markings
  Piro: close trilinos#8641
  Fix latest fix -- discarded const
  Turn Anasazi on for EMPIRE
  Fix compilation error on intel-17
  Automatic snapshot commit from seacas at 729e904fa5
  MueLu: RegionMG cleaning up test example dirver
  MueLu: fix typos in documentation
  MueLu CoalesceDrop: Fix FPE in cut-based distance Laplacian
  MueLu Hierarchy: Skip forced deletion of level vectors
  MueLu ParameterListInterpreter: Switch processing of coarse solve and smoothers
  MueLu: Add unit test for StratimikosSmoother
  tpetra:  removed use of device_type; the test runs on host. trilinos#8591
  MueLu: rename input to InterfaceAggFact
  MueLu: fix SchurComplement for parallel runs
  MueLu: add test for dof-based InterfaceAggFact
  ...
kddevin added a commit that referenced this issue Jan 30, 2021
seamill pushed a commit to seamill/Trilinos that referenced this issue Feb 3, 2021
…develop' (1015f63).

* potential-trilinos-develop: (37 commits)
  Framework: add LDFLAGS to config.ini file for Intel 17 build
  MueLu: Gold Files
  Framework: intel 19 build with CMake 3.17.x or greater
  Framework: intel 17 build with CMake 3.17.x
  Framework: testing - disable cmake_cxx_flags for intel 17 build
  Framework: testing cmake updates and intel compiler error
  Framework: testing cmake updates and intel compiler error
  Framework: testing cmake updates and intel compiler error
  MueLu: More BlockDiagonal mods
  MueLu: Adding first cut at a block-diagonal coarsening
  MiniTensor: Fix inconsistent KOKKOS_INLINE_FUNCTION markings
  PANZER: speed up CubeHexMeshFactory by populating edge blocks using a vector of edges PANZER: speed up CubeHexMeshFactory by populating face blocks using a vector of faces PANZER: speed up CubeHexMeshFactory by reducing the number of calls to beginModification()/endModification() PANZER: add a disable_subcells test to check the "Build Subcells: false" case
  Piro: close trilinos#8641
  Fix latest fix -- discarded const
  Turn Anasazi on for EMPIRE
  Fix compilation error on intel-17
  Automatic snapshot commit from seacas at 729e904fa5
  MueLu: RegionMG cleaning up test example dirver
  MueLu: fix typos in documentation
  tpetra:  removed use of device_type; the test runs on host. trilinos#8591
  ...
@kddevin
Copy link
Contributor

kddevin commented Feb 5, 2021

@brian-kelley @csiefer2 I got all tests except the mv-reduce-strided test passing today.
The templated getLocalView<>(access) was not working correctly; it was always following the host view path. So it would return the correct view but with the flags set and sync done incorrectly. I changed the test for which path to take. It isn't an elegant test, though, so you probably should review it. 13e3a19
I also changed the implementation of getLocalViewHost and getLocalViewDevice to NOT call the templated version. From a design standpoint, I like avoiding the if tests in the templated function when the user has explicitly told us which view he wants. But I don't like having the logic duplicated in the templated and non-templated functions. I wish we could just specialize the templated function.
Let's discuss on Tuesday.

@kyungjoo-kim
Copy link
Contributor

I removed the sync, modify things on the multivector but I leave the interface there. This interface can be removed later after other items are fixed ie., local access interface is eliminated.

@brian-kelley
Copy link
Contributor Author

brian-kelley commented Feb 25, 2021

@kyungjoo-kim @kddevin I just checked in the removal of withLocalAccess, for_each and transform. The one issue I still see in Ifpack2 is that RBILUK is still trying to call removed BlockMultiVector functions like sync_host. But the things that used withLocalAccess (Chebyshev, Relaxation) built and passed before I rebased on that commit. (I mean with UVM)

@brian-kelley
Copy link
Contributor Author

I got rid of all the deprecated MV/BlockMV calls in Ifpack2, they're all using the new interface.

There were 2 tests in MultiVector_LocalViewTests that failed with CudaUVMSpace: HostDeviceView and DeviceHostView. I think the issue is that they were expecting { h = mv.getLocalViewHost(); d = mv.getLocalViewDevice(); } to always throw, but it only throws if host and device are different spaces (if they're not, DualView's host and device views are the same, so reference counts are the same). I fixed one by doing this instead:

  bool shouldThrow = ! Kokkos::SpaceAccessibility<Kokkos::Serial, typename Node::memory_space>::accessible;

and that fixed HostDeviceView. DeviceHostView I commented out but the same fix should work there.

@brian-kelley
Copy link
Contributor Author

brian-kelley commented Mar 10, 2021

Ifpack2 failures with UVM, without launch blocking:

  • Ifpack2_RILUK_hb_belos_MPI_2
  • Ifpack2_RILUK_hb_belos_MPI_4
  • Ifpack2_unit_tests_MPI_4
  • Ifpack2_AdditiveSchwarz_MPI_4
  • Ifpack2_RBILUK_MPI_4
  • Ifpack2_RILUKSingleProcessUnitTests_MPI_1
  • Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4

@brian-kelley
Copy link
Contributor Author

@kddevin With the last commit, all tests in Tpetra, Belos, Ifpack2, MueLu, and Teko pass with UVM and without launch blocking.

@kyungjoo-kim
Copy link
Contributor

kyungjoo-kim commented Mar 12, 2021

@brian-kelley Does the last commit mean that the fencing execution space in sync_host and sync_device ? If so, that does not make sense. When I and @crtrott talked, the execution fence in sync function is redundant as Kokkos::deep_copy already puts Kokkos::fence() when it has the same pointers. here To me, it is a mystery that doulbe fencing actually solves a problem when launch blocking is off.

@crtrott any idea ?

@jhux2
Copy link
Member

jhux2 commented Mar 12, 2021

Did someone say "double fencing"? @csiefer2 might have something to say about that.

@brian-kelley
Copy link
Contributor Author

@jhux2 Luckily, this is a 0 vs. 1 fences being different, not a 1 vs. 2 fences.

@kyungjoo-kim You were right, even though the fencing was not consistent before, just adding them to sync_host and sync_device wasn't the correct solution. Have a look at kokkos/kokkos#3850 . I think our templated getLocalView<Device>(tag) is always taking the device path because DualView's type logic is inconsistent between view(), sync() and modify(). That means we could be skipping syncs and/or getting wrong modify flags.

@brian-kelley
Copy link
Contributor Author

That's not the cause of all the ifpack2 failures, though.

@kyungjoo-kim
Copy link
Contributor

@brian-kelley That is indeed a sneaky issue.

@brian-kelley
Copy link
Contributor Author

brian-kelley commented Apr 6, 2021

I'm seeing some new failures without CUDA_LAUNCH_BLOCKING since we rebased on develop.

TpetraCore_CrsMatrix_Bug8794_MPI_4
TpetraCore_BlockCrsMatrix_MPI_4  -- this one is random, seems to fail roughly half the time
Ifpack2_RILUK_hb_belos_MPI_2
Ifpack2_RILUK_hb_belos_MPI_4
Ifpack2_AdditiveSchwarz_MPI_4
Ifpack2_RBILUK_MPI_4
Ifpack2_RILUKSingleProcessUnitTests_MPI_1

Update 4-13: only these two are failing for me now (cuda-unaware MPI, uvm, no launch block)

MueLu_UnitTestsIntrepid2Tpetra_MPI_1
MueLu_UnitTestsIntrepid2Tpetra_MPI_4

Update 4-16: The MueLu intrepid2 tests are passing for me after the last fix.

kddevin added a commit that referenced this issue Apr 27, 2021
…ement (#8821)

* Tpetra: add new user-friendly MV view access

Also add new "owningView_" DualView member that refers to
the actual original DV (not a subview of anything else). This
is the DualView to sync in order to maintain consistency regardless
of how MultiVectors alias each other.

4 new view accessor functions: getLocalView[Host|Device][Non]Const()

- Respect constness
- Manage syncs and modifies for the user
- Prevent taking out a view in one space while any view in the other
space is live.
- Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just
have the reference count checking added (no sync/modify). This has no
effect for HostSpace or CudaUVMSpace since those host mirrors match the
device views.

* Tpetra - fix MV test 14.

* Tpetra - fix item 17

* Tpetra - fix item 20

* Tpetra - fix item 23

* Tpetra - fix item 28

* Tpetra - fix item 29

* Tpetra - fix item 35

* Tpetra - workaround for item 30

* Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd)

* Tpetra: fix MV [un]pack to respect host/device refcounts

* fix nonconst in Bug7745

* Tpetra: stashing

* Tpetra - issue 354 fix

* Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views

* Tpetra: remove static_assert, fix getLocalView() ret type

Remove bad static_assert that tripped for Cuda/CudaUVMSpace build.
Correct MultiVector::getLocalView() return type to be exactly consistent with
DualView::view().

* tpetra:  fixed error in MultiVector pack that caused failures with UVM=ON

* tpetra:  Fix for FEMultivector -- rather than take the subview of a
DualView and create a new vector with it, use the MultiVector
constructor that gets "offset" views of a vector (in which
@brian-kelley has the owningView_ working correctly).
While I was at it, I added a swap of the owningView_ to the MultiVector
swap() function.

* Tpetra: Fixing ImportExport/Issue3968:  The tests uses sync_to* without changing the modify flags, which mucks up our internal tracking

* tpetra:  fix to work without UVM

* tpetra:  changed getLocalViewHost/Device to new Const/NonConst versions
as appropriate. #8591
Did not change getLocalView as the Const/NonConst versions of
getLocalView do not exist yet
Did not change MV_reduce_strided to avoid creating conflicts for
@brian-kelley

* tpetra: change getLocalViewHost to appropriate Const/NonConst version #8591

* Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions

* Tpetra: More getLocalView mods

* Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions

* Tpetra: Fixing scaleBlockDiagonal signature as per Brian

* Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF

* Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging

* Tpetra - missing getlocal view "device"

* Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly

Make WithLocalAccess use these tags instead of internal Details:: ones.
These will also be used for the new MultiVector view access interface.

* moving from getLocalView... to getLocalView...(Tpetra::Accesspattern)

* Tpetra - get1dview logic change

* Tpetra, WIP: using new tagged view access

* Tpetra: use new interface for all MV getLocalView

* tpetra:  removed unneeded include file

* Tpetra: Tags!

* Tpetra: Tags!

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* tpetra: copied implementation of getLocalViewHost and getLocalViewDevice
from templated getLocalView, as the getLocalView version does not work.
This commit may be temporary, but it allows us to make progress on other
bugs while someone figures out the template-fu.
Sorry for the debugging statements; we'll get rid of those eventually.

* adding localview tests

* tpetra:  getLocalView<template> now works.
cleaned up my obnoxious print statements
kept Host and Device implementations that do NOT use getLocalView.

* tpetra:  added Tpetra::Access to many getLocalView<> instances
Tests still pass with UVM=ON.

* Tpetra: Removing the dreaded parantheses from the Access tags

* Manually intercept UVM allocations, throw exception

Effectively makes it impossible for any UVM allocations to
exist (except for Stokhos, which calls cudaMallocManaged directly)

* Tpetra: Deprecate old getLocalView functions

* Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON

* tpetra:  changed getLocalView to use access tags and getLocalViewDevice

* tpetra:  added access tags to getLocalView(); fixed scope of some pointers

* xpetra:  fixes to allow compilation

* WIP: deprecate getLocalBlock and start adding tagged overloads

* Tpetra: rewrite allReduceView to work with non-UVM

allReduceView had one bug and one sub-optimal thing:
- Tried to make a view copy with both layout and device different -
  Kokkos can't do that in a single deep_copy
- If a LayoutStride -> contiguous copy needed to be made, it always used
  LayoutLeft. If one of the input/output views was LayoutStride and the
  other was LayoutRight, they would both be copied to LayoutLeft. Now, use
  LayoutRight in this case.

Some utilities to help manage layouts and MPI + Kokkos views in general
are in the new file temporaryViewUtils.hpp: layout unification,
making a contiguous view, and making an MPI-safe view.
In the future these can be used to clean up idot and
iallreduce without losing efficiency.

* Tpetra:  Block MultiVector correctly uses getLocalView; removed stored pointer

* fix host device type for const_little_host_vec_type

* tpetra:  clean up of BlockMultiVector fixes

* Tpetra:  deprecated held pointer mvData_

* tpetra:  removed modifies without syncs; fixed MueLu tests

* Tpetra - removing sync in ScaleAndAssign test

* Tpetra - unit test is okay without modify and sync flags

* Tpetra - test passes without modify and sync operations

* Tpetra - remove unnecessary sync modify clear state flags

* Tpetra - remove multi vector sync/modify/ things

* Tpetra - remove sync modify things in other places

* Tpetra: remove withLocalAccess, for_each, transform

The new MV::getLocalView interface is a simpler substitute for these.

* Issue 8391. Switched to C++17 standard for GCC 8.3 build.

* FROSch: Convert enum NullSpaceType to scoped enum

By converting the enum to an enum class NullSpaceType, one is forced to
use the enum class and cannot replace it with integers anymore. This
guarantees, that the expressive enum class is used in implementations
rather than the implicitly encoded integers.

* Patch in KokkosKernels #872

(fix #8727, TeamPolicy team size too large in sort_crs_*)
Adds the KokkosKernels unit test that replicated this issue.

* MueLu: Adding Aggregate size percentiles to AggregateQuality

* Moved Tpetra CRS GS into Ifpack2 Relaxation

* Moved BlockCrs GS functionality into Relaxation

* Enabled new local GS code for CRS

* Reduce redundant code in CRS (GS/SGS use same fn)

* Using refactored block CRS local apply, unify GS/SGS

* More refactoring to get rid of redundant functions

* Added required syncs/modifies for vectors

* Removed unneeded !constantStride paths

* Use cached MV to replace getColumnMapMV from CrsMatrix

* Ifpack2: remove unneeded includes

* Ifpack2: undo some find-and-replace in comments

Undoing some "Node" -> "node_type"

* MueLu: undo CMake change, should be its own PR

* MueLu: in configure, print out missing ETI setting

During configure, MueLu prints out the type combinations to ETI.
Add <complex, int, long long> to this, since it was missing.

* tpetra:  treat WriteOnly of subviews as ReadOnly.

* Ifpack2: in RBILUK, use tagged BMV::getLocalBlock

* Tpetra: add comment with caveat

on BMV::getLocalBlock(i, j, WriteOnly)

* tpetra: separated BugTests.cpp into separate test files so that we can
disable them separately (since they exercise different classes).

* Ifpack2: update BMV getLocalBlock calls

to use tagged access, and not use manual sync/modify (which has been
removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass.

* more test changes

* mv localview tests

* wrapped up 6 tests for new behaviors

* tpetra:  scoping fix for Bug7234.cpp;
more output from getLocalView* when error occurs, as in parallel runs,
throw messages weren't always printed (e.g., from doExport when only
3/4 processors failed)

* Tpetra: add MV::aliases(const MV& other)

This allows a user to see if two MVs overlap, without actually getting
the local views and possibly hitting the reference count checker.

* Ifpack2: const correctness, use new getLocalView

- Throughout Ifpack2, remove manual sync/modify and calls to deprecated
  getLocalView. Use tagged getLocalView instead.
- In BlockRelaxation and the Containers, change interfaces to use const
  on views and multivectors that aren't actually modified

* Tpetra: fix one MV LocalView test, comment out another

We will make sure fix is OK, then uncomment and fix the other

* tpetra:  enable some Tpetra tests without UVM

* tpetra:  fix test for non-Cuda builds

* Ifpack2: fix more constness of apply vectors

* Kokkos: allow CudaUVMSpace::allocate again

Roll back change that made CudaUVMSpace::allocate throw
when UVM was not the default memory space for Cuda.

* tpetra:  changes needed to build with DEPRECATED_CODE=OFF #8821

* fix remaining test

* Tpetra - fix for nox failure

* Thyra: added missing fences to euclidean apply operations used
in MvTimesMatAddMv; the fences resolve test failures with
CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo

Tpetra: the fences above provide a more surgical fix to the test
errors seen in #8821; this commit removes fences from
getLocalView*(ReadOnly).  @kyungjoo-kim

Belos: preventive fence added with @hkthorn's blessing
to mimic those in Thyra.

* tpetra: added fence between device kernels and retrieving blocks on host #8821

* Ifpack2: Minor fix

* DualView: make fencing behavior in sync consistent

sync<Device>() does extra exec space fences if the dev/host memory
spaces are the same. This was missing in sync_host/sync_device, so
this adds it there. Makes all Ifpack2 tests for UVM without launch
blocking.

* tpetra:  exercise the Teuchos-based interfaces, too

* changed access control from WriteOnly to OverwriteAll because semantics mean things

* WIP: fixing idot for MV dualview refactor

And some udpates to ifpack2 and amesos2 about that.
Working around Kokkos issue #3850 where the templated getLocalView was
used.

* WIP: idot/iallreduce cleanup

* Tpetra: finish idot/iallreduce refactor

* Fixed iallreduce test for non-uvm device

* Belos: use new Tpetra MV view interface

* Cleanup

* Remove extra dualview sync fences

* Ifpack2 passes without launch blocking

except RBILUK.

* Ifpack2: add temporary fence in RBILUK for BlockCrs

Later it should be possible to replace this fence with a refactored
DualView interface to BlockCrs.

* Tpetra: add a global reduce to a test so it will fail when only one proc is failing

* Tpetra: fix some typos in a Map unit test

* Tpetra: remove deprecated sync/modify calls from a unit test

* Ifpack2: fix impl_scalar/scalar mismatch

* Tpetra: remove/update remaining mentions of Gauss-Seidel

* Tpetra: fix iallreduce for builds without MPI

* Ifpack2: revert commenting out try/catch

Was causing unused var warning

* Ifpack2: Fixing vector mode mistake

* tpetra, ifpack2:  fixing several access mode errors

* Tpetra: use new MV view interface in Bug8794 test

* Amesos2: revert using tagged Tpetra MV getLocalView

for some reason, using ReadOnly tag to access MV view in
TpetraMultivecAdapter caused solve solution to not get copied back to
the Tpetra multivector. This is surprising because the views were just
used as the source for a Kokkos deep copy, and this caused
BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews
are trivial, and all kernels are synchronous)

* Ifpack2: add back tag clobbered by merge

* kokkos:  patch from kokkos/kokkos#3857

* comment out all the instances of TPETRA_DEPRECATED (#9023)

* MueLu: add fence for recent intrepid2 changes

Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking.

* Tpetra: restore MV_reduce_strided test.

Key: use the MV (map, dualview, orig_dualview) constructor instead of the
(map, dualview) constructor. If $dualview is noncontiguous, the first one
lets you pass orig_dualview as the contiguous super-view containing
dualview, and orig_dualview can be sync'd without problems.

Also modify TempView::toLayout() to test span_is_contiguous, rather than
assuming that (Layout != LayoutStride) implies contiguous.

* tpetra:  Removed deprecated sync_device calls

* Tpetra: Remove some MultiVector that were checking modification state (#9032)

* Tpetra: Deprecate need_sync* in MultiVector

* Tpetra: for now, we won't deprecate need_sync_host/device

* tpetra:  removed instantiations of removed tests

* Tpetra: don't use CudaSpace in nonblocking collectives

OpenMPI does not support Cuda device buffers for nonblocking collectives
like MPI_Iallreduce, even with a Cuda-aware installation.

* Fix old typo in Ifpack2_UnitTestBlockRelaxation

* Fix access tag: OverwriteAll -> ReadWrite

Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later.

* Undo bad DualView merge

Co-authored-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov>
Co-authored-by: Chris Siefert <csiefer@sandia.gov>
Co-authored-by: Geoff Danielson <gcdanie@sandia.gov>
Co-authored-by: Timothy A. Smith <tasmit@sandia.gov>
Co-authored-by: James M. Willenbring <jmwille@sandia.gov>
Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de>
Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>
kddevin added a commit that referenced this issue May 4, 2021
* Tempus: Remove ParameterList from IntegratorBasic

Remove all the internal uses of ParameterList from IntegratorBasic.
This means moving the variables in the IntegratorBasic ParameterList
to member data. Integrator will not longer inherit from
Teuchos::ParameterListAcceptor. However IntegratorBasic can still
be built from a ParameterList, and will still provide a valid
ParameterList.

 * To break up these changes, created a copy of IntegratorBasic
   (i.e., IntegratorBasicOld) for the sensitivity analysis integrators
    - IntegratorAdjointSensitivity
    - IntegratorForwardSensitivity
    - IntegratorPseudoTransientAdjointSensitivity
    - IntegratorPseudoTransientForwardSensitivity
   so these can be upgraded in another PR.
 * IntegratorBasic is no longer inherited from ParameterListAcceptor.
    - Removed setParameterList.
    - IntegratorBasic constructors using ParameterLists have moved to
      nonmember constructors, e.g.,
      . integratorBasic(pl, model) --> createIntegratorBasic(pl, model)
      . IntegratorBasic(pl, model) --> createIntegratorBasic(pl, model)
    - Member data ParameterLists are removed.
    - Kept getValidParameters(), which now returns a ParameterList
      with the current values. Still matches ParameterListAcceptor
      signature.
 * Ensured that ParameterList names were correctly set, so
   getValidParameters() could be used to create nested ParameterLists,
   e.g., IntegratorBasic->Stepper->Solver.
 * Made getValidParametersBasic() a member functions of Stepper class.
 * Simplified setting the Stepper to just setStepper(stepper).
 * Added method to set model on stepper in IntegratorBasic,
   i.e., setModel(model).
 * The integrator observer is no longer a composite observer.
   It is simply a base class observer.
 * All internal IntegratorBasic references to member ParameterLists
   are changed to member data.
 * Added member data for the integrator name and type.  Name is a
   label that used for identification, e.g., 'My Integrator Basic'.
   Type defines the derived class being used, e.g., 'Integrator Basic'.
 * Added a shallow copy for the SolutionHistory.

* Tempus: Remove ParameterList from Internals of IntegratorBasic.

 * Changed Piro and ROL to use IntegratorBasicOld.  Will move to
   IntegratorBasic in future PR.
 * Added documentation on StepperName and StepperType to help
   distinguish between them.
 * setStepperType() is now a protected function of the Stepper
   class, which should help distinguish it against StepperName.
 * IMEX_RK and IMEX_RK_Partition now requires the stepperType
   in the default constructor to completely build it.  These
   should be changed to have a base class IMEX_RK and derived
   classes for each stepperType (similarly for IMEX_RK_Partition).
 * Fixed several misuses of stepperName and stepperType in source
   code and in unit tests.
 * Fixed some usage of Stepper aliases.

* Tpetra: add backup of scripts for perf testing

eclipse, vortex and stria env/build scripts.
These are used on SRN Jenkins and Watchr.

* cherry pick Kokkos-kernels PR #921
 Two-stage GS: add damping factors #921

* expose new options for two-stage GS from Ifpack2

* describe the two-stage parameters more in comments

* MueLu: Enable reuse of Ifpack2 smoothers

* Add openmpi 4.0.5 toolchain for VAN1

User Support Ticket(s) or Story Referenced: SPAR-969

* Add ctest drivers for new toolchains

* Correct an ordering issue and add tests

* ATDM/van1-tx2: Disable build stats

* re-basing muelu gold files with new two-stage gs parameters

* Replace VerifyExecutionCanAccessMemorySpace usage

Teuchos, Tpetra, Sacado, Stokhos: replace usage of deprecated
VerifyExecutionCanAccessMemorySpace with SpaceAccessibility for
compatibility with Kokkos.
See kokkos/kokkos#3813 for relevant changes.

* blake atdm environment: update cmake to 3.19.3

* Ifpack2 Hypre: Fix link errors when multiple node types are used

* Fixes mismatched new/delete and memory leaks in reused solver objects

Integrating the templated Basker solver directly into Xyce to perform
custom linear solves for harmonic balance (HB) analysis, some memory
issues were noticed.  First there is a mismatch in the new and delete
used to create and destroy the pinv object, respectively.  If the same
solver object is reused, the L and U factors are leaked.  Also some
internal workspace is not properly cleaned up.  Due to clarity the
internal workspace objects have been refactored and the L and U
factors are deleted if the solver object is called to perform a
numeric factorization when one already exists.

This addresses the memory issues that have been observed through
valgrind.

* casting inner-damping factor to complex only if scalar-type is complex.

* MueLu CreateOperator test: Set verbosity so that default values are ignored

* MueLu: Update gold files

* Revert "re-basing muelu gold files with new two-stage gs parameters"

This reverts commit feea4d8.

* trilinos_couplings: Replace VerifyExecutionCanAccessMemorySpace

replace usage of deprecated VerifyExecutionCanAccessMemorySpace
with SpaceAccessibility for compatibility with Kokkos.
See kokkos/kokkos#3813 for relevant changes.

* SEACAS: Fix warnings and memory leaks

Automatic snapshot commit from seacas at 29acd7f151

Origin repo remote tracking branch: 'origin/master'
Origin repo remote repo URL: 'origin = https://github.com/gsjaardema/seacas'

At commit:

commit 29acd7f1510bf729084274fe0cead3ef5e815dd8
Author:  Greg Sjaardema <gdsjaar@sandia.gov>
Date:    Wed Apr 14 10:29:28 2021 -0600
Summary: sync back sierra-build changes [ci skip]

    EXODIFF: Eliminate edge/face block memory leaks
    PLT: Support for flang/f18
    APREPRO: Better support for array memory leak management
    Fix compilation warnings on nvidia
    IOSS: Eliminate long compile when sanitizer enabled
    IOSS: Eliminate compiler warning
    APREPRO: Eliminate array memory leaks

* Ctest: Fixing emailer for ascicgpu031

* Automatic snapshot commit from tribits at 18aed92

Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'

At commit:

commit 18aed92550ebc8e8e0f7da2a5d38cd4eaa192e1f
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Thu Apr 15 09:45:52 2021 -0600
Summary: Allow TRIBITS_ADD_ADVANCED_TEST_MAX_NUM_TEST_BLOCKS to be changed (#136)

* Tempus: Replace Logical Operators

The ROL team has received requests from Windows users to remove the
alternative logical operators 'and' and 'or' in favor of && and ||.
The C++ standard includes 'and' and 'or', of course. However, the
MS compiler only supports them in the -permissive mode, which
apparently isn't allowed in many companies, due to the quality
control niceties that the non-permissive (standard) mode provides.

Trilinos is not supporting Windows and do not have even platforms
to test on. This change should be straight forward, but there is
not any mechanism to prevent regression. Microsoft needs to support
the c++ standard!

Not sure all instances were found and there is no compiler flag
to throw on this.

* Galeri Xpetra: Add anisotrpic diffusion problem

* MueLu: fix agg export for multiple dofs per node

* muelu:  changes needed to handle Kokkos::complex in Cuda builds

* Turn off 3 Zoltan tests that fail  due to a bug in spectrummpi

See #8798 for the discussion

* Geminga CUDA nightly: Enable complex

* Add short reason for the disablement and note the issue for more details

* Galeri: Fix issue in boundary conditions

* MueLu RefMaxwell: Pass corrected nullspace to coarse (1,1) hierarchy

* Xpetra: Add EpetraInverseOperator

Its 'apply' calls 'ApplyInverse' instead of 'Apply'

* TrilinosCouplings: Fix scaling of CurlCurl in Maxwell example

* tpetra:  adding test of branch Tpetra_UVM_Removal for SAKE

* STK: Snapshot 04-21-21 11:17 (#9039)

* add a few more timers

* make default for GmresSingleReduce as single-reduce MGS and no Newton basis

* make "delayed normalization" default for single-reduce MGS

* Intrepid2: fix 8801 (second attempt) (#9044)

* Intrepid2: resolve uninitialized variable warnings.

* Intrepid2: move lambda implementation into a functor to work around apparent CUDA compiler bug.  PR #9044, fix for #8801 (second attempt).

* When compiling with IBM Clang 11 + Cuda, Zoltan2 MJ captures 'this' inside lambdas

This patch
* Marks 1 function static, which was class const (but used not class variables).
* addresses this-> being used inside a nested team lambda (which clang doesn't like)
  To remove this, I moved sEpsilon as parameter to the function being called and
  was able to mark the function static as well (removing its use of this->sEpsilon)
  This entails creating a funciton-locally copy of sEpsilon so that it may be
  captured by the default [=] capture.

Both changes entail adding a `using` statement within the function so that the
now static class functions can be called (e.g., AlgMJ< ... >::

* Snapshot of kokkos.git from commit 04b8196e0e3bfc4cee4047dbbbb13fc227730fe8

From repository at git@github.com:kokkos/kokkos.git

At commit:
commit 04b8196e0e3bfc4cee4047dbbbb13fc227730fe8
Merge: 1fb0c284 ffc35a82
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date:   Mon Apr 26 00:14:56 2021 -0600

    Merge branch 'release-candidate-3.4.0' for 3.4.00

    Part of Kokkos C++ Performance Portability Programming EcoSystem 3.4

* Snapshot of kokkos-kernels.git from commit 3eb6a9298b58f224b876b6e29cda4491cddc53c5

From repository at git@github.com:kokkos/kokkos-kernels.git

At commit:
commit 3eb6a9298b58f224b876b6e29cda4491cddc53c5
Merge: fe439b21 dd0d4ef8
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date:   Mon Apr 26 00:16:08 2021 -0600

    Merge branch 'release-candidate-3.4.0' for 3.4.00

    Part of Kokkos C++ Performance Portability Programming EcoSystem 3.4

* Turn on Intrepid2 per #8310

* MueLu: Add test for aggregation export

* Ifpack2 Relaxation&Chebyshev: Use offsets in more cases

* Framework: update messaging on issue autocloser bot

* Framework: Update autocloser throttle limit to 70

* MueLu: removed unused lines from agg export test

* Tpetra MultiVector and BlockMultiVector refactor to remove UVM requirement (#8821)

* Tpetra: add new user-friendly MV view access

Also add new "owningView_" DualView member that refers to
the actual original DV (not a subview of anything else). This
is the DualView to sync in order to maintain consistency regardless
of how MultiVectors alias each other.

4 new view accessor functions: getLocalView[Host|Device][Non]Const()

- Respect constness
- Manage syncs and modifies for the user
- Prevent taking out a view in one space while any view in the other
space is live.
- Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just
have the reference count checking added (no sync/modify). This has no
effect for HostSpace or CudaUVMSpace since those host mirrors match the
device views.

* Tpetra - fix MV test 14.

* Tpetra - fix item 17

* Tpetra - fix item 20

* Tpetra - fix item 23

* Tpetra - fix item 28

* Tpetra - fix item 29

* Tpetra - fix item 35

* Tpetra - workaround for item 30

* Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd)

* Tpetra: fix MV [un]pack to respect host/device refcounts

* fix nonconst in Bug7745

* Tpetra: stashing

* Tpetra - issue 354 fix

* Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views

* Tpetra: remove static_assert, fix getLocalView() ret type

Remove bad static_assert that tripped for Cuda/CudaUVMSpace build.
Correct MultiVector::getLocalView() return type to be exactly consistent with
DualView::view().

* tpetra:  fixed error in MultiVector pack that caused failures with UVM=ON

* tpetra:  Fix for FEMultivector -- rather than take the subview of a
DualView and create a new vector with it, use the MultiVector
constructor that gets "offset" views of a vector (in which
@brian-kelley has the owningView_ working correctly).
While I was at it, I added a swap of the owningView_ to the MultiVector
swap() function.

* Tpetra: Fixing ImportExport/Issue3968:  The tests uses sync_to* without changing the modify flags, which mucks up our internal tracking

* tpetra:  fix to work without UVM

* tpetra:  changed getLocalViewHost/Device to new Const/NonConst versions
as appropriate. #8591
Did not change getLocalView as the Const/NonConst versions of
getLocalView do not exist yet
Did not change MV_reduce_strided to avoid creating conflicts for
@brian-kelley

* tpetra: change getLocalViewHost to appropriate Const/NonConst version #8591

* Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions

* Tpetra: More getLocalView mods

* Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions

* Tpetra: Fixing scaleBlockDiagonal signature as per Brian

* Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF

* Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging

* Tpetra - missing getlocal view "device"

* Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly

Make WithLocalAccess use these tags instead of internal Details:: ones.
These will also be used for the new MultiVector view access interface.

* moving from getLocalView... to getLocalView...(Tpetra::Accesspattern)

* Tpetra - get1dview logic change

* Tpetra, WIP: using new tagged view access

* Tpetra: use new interface for all MV getLocalView

* tpetra:  removed unneeded include file

* Tpetra: Tags!

* Tpetra: Tags!

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* tpetra: copied implementation of getLocalViewHost and getLocalViewDevice
from templated getLocalView, as the getLocalView version does not work.
This commit may be temporary, but it allows us to make progress on other
bugs while someone figures out the template-fu.
Sorry for the debugging statements; we'll get rid of those eventually.

* adding localview tests

* tpetra:  getLocalView<template> now works.
cleaned up my obnoxious print statements
kept Host and Device implementations that do NOT use getLocalView.

* tpetra:  added Tpetra::Access to many getLocalView<> instances
Tests still pass with UVM=ON.

* Tpetra: Removing the dreaded parantheses from the Access tags

* Manually intercept UVM allocations, throw exception

Effectively makes it impossible for any UVM allocations to
exist (except for Stokhos, which calls cudaMallocManaged directly)

* Tpetra: Deprecate old getLocalView functions

* Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON

* tpetra:  changed getLocalView to use access tags and getLocalViewDevice

* tpetra:  added access tags to getLocalView(); fixed scope of some pointers

* xpetra:  fixes to allow compilation

* WIP: deprecate getLocalBlock and start adding tagged overloads

* Tpetra: rewrite allReduceView to work with non-UVM

allReduceView had one bug and one sub-optimal thing:
- Tried to make a view copy with both layout and device different -
  Kokkos can't do that in a single deep_copy
- If a LayoutStride -> contiguous copy needed to be made, it always used
  LayoutLeft. If one of the input/output views was LayoutStride and the
  other was LayoutRight, they would both be copied to LayoutLeft. Now, use
  LayoutRight in this case.

Some utilities to help manage layouts and MPI + Kokkos views in general
are in the new file temporaryViewUtils.hpp: layout unification,
making a contiguous view, and making an MPI-safe view.
In the future these can be used to clean up idot and
iallreduce without losing efficiency.

* Tpetra:  Block MultiVector correctly uses getLocalView; removed stored pointer

* fix host device type for const_little_host_vec_type

* tpetra:  clean up of BlockMultiVector fixes

* Tpetra:  deprecated held pointer mvData_

* tpetra:  removed modifies without syncs; fixed MueLu tests

* Tpetra - removing sync in ScaleAndAssign test

* Tpetra - unit test is okay without modify and sync flags

* Tpetra - test passes without modify and sync operations

* Tpetra - remove unnecessary sync modify clear state flags

* Tpetra - remove multi vector sync/modify/ things

* Tpetra - remove sync modify things in other places

* Tpetra: remove withLocalAccess, for_each, transform

The new MV::getLocalView interface is a simpler substitute for these.

* Issue 8391. Switched to C++17 standard for GCC 8.3 build.

* FROSch: Convert enum NullSpaceType to scoped enum

By converting the enum to an enum class NullSpaceType, one is forced to
use the enum class and cannot replace it with integers anymore. This
guarantees, that the expressive enum class is used in implementations
rather than the implicitly encoded integers.

* Patch in KokkosKernels #872

(fix #8727, TeamPolicy team size too large in sort_crs_*)
Adds the KokkosKernels unit test that replicated this issue.

* MueLu: Adding Aggregate size percentiles to AggregateQuality

* Moved Tpetra CRS GS into Ifpack2 Relaxation

* Moved BlockCrs GS functionality into Relaxation

* Enabled new local GS code for CRS

* Reduce redundant code in CRS (GS/SGS use same fn)

* Using refactored block CRS local apply, unify GS/SGS

* More refactoring to get rid of redundant functions

* Added required syncs/modifies for vectors

* Removed unneeded !constantStride paths

* Use cached MV to replace getColumnMapMV from CrsMatrix

* Ifpack2: remove unneeded includes

* Ifpack2: undo some find-and-replace in comments

Undoing some "Node" -> "node_type"

* MueLu: undo CMake change, should be its own PR

* MueLu: in configure, print out missing ETI setting

During configure, MueLu prints out the type combinations to ETI.
Add <complex, int, long long> to this, since it was missing.

* tpetra:  treat WriteOnly of subviews as ReadOnly.

* Ifpack2: in RBILUK, use tagged BMV::getLocalBlock

* Tpetra: add comment with caveat

on BMV::getLocalBlock(i, j, WriteOnly)

* tpetra: separated BugTests.cpp into separate test files so that we can
disable them separately (since they exercise different classes).

* Ifpack2: update BMV getLocalBlock calls

to use tagged access, and not use manual sync/modify (which has been
removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass.

* more test changes

* mv localview tests

* wrapped up 6 tests for new behaviors

* tpetra:  scoping fix for Bug7234.cpp;
more output from getLocalView* when error occurs, as in parallel runs,
throw messages weren't always printed (e.g., from doExport when only
3/4 processors failed)

* Tpetra: add MV::aliases(const MV& other)

This allows a user to see if two MVs overlap, without actually getting
the local views and possibly hitting the reference count checker.

* Ifpack2: const correctness, use new getLocalView

- Throughout Ifpack2, remove manual sync/modify and calls to deprecated
  getLocalView. Use tagged getLocalView instead.
- In BlockRelaxation and the Containers, change interfaces to use const
  on views and multivectors that aren't actually modified

* Tpetra: fix one MV LocalView test, comment out another

We will make sure fix is OK, then uncomment and fix the other

* tpetra:  enable some Tpetra tests without UVM

* tpetra:  fix test for non-Cuda builds

* Ifpack2: fix more constness of apply vectors

* Kokkos: allow CudaUVMSpace::allocate again

Roll back change that made CudaUVMSpace::allocate throw
when UVM was not the default memory space for Cuda.

* tpetra:  changes needed to build with DEPRECATED_CODE=OFF #8821

* fix remaining test

* Tpetra - fix for nox failure

* Thyra: added missing fences to euclidean apply operations used
in MvTimesMatAddMv; the fences resolve test failures with
CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo

Tpetra: the fences above provide a more surgical fix to the test
errors seen in #8821; this commit removes fences from
getLocalView*(ReadOnly).  @kyungjoo-kim

Belos: preventive fence added with @hkthorn's blessing
to mimic those in Thyra.

* tpetra: added fence between device kernels and retrieving blocks on host #8821

* Ifpack2: Minor fix

* DualView: make fencing behavior in sync consistent

sync<Device>() does extra exec space fences if the dev/host memory
spaces are the same. This was missing in sync_host/sync_device, so
this adds it there. Makes all Ifpack2 tests for UVM without launch
blocking.

* tpetra:  exercise the Teuchos-based interfaces, too

* changed access control from WriteOnly to OverwriteAll because semantics mean things

* WIP: fixing idot for MV dualview refactor

And some udpates to ifpack2 and amesos2 about that.
Working around Kokkos issue #3850 where the templated getLocalView was
used.

* WIP: idot/iallreduce cleanup

* Tpetra: finish idot/iallreduce refactor

* Fixed iallreduce test for non-uvm device

* Belos: use new Tpetra MV view interface

* Cleanup

* Remove extra dualview sync fences

* Ifpack2 passes without launch blocking

except RBILUK.

* Ifpack2: add temporary fence in RBILUK for BlockCrs

Later it should be possible to replace this fence with a refactored
DualView interface to BlockCrs.

* Tpetra: add a global reduce to a test so it will fail when only one proc is failing

* Tpetra: fix some typos in a Map unit test

* Tpetra: remove deprecated sync/modify calls from a unit test

* Ifpack2: fix impl_scalar/scalar mismatch

* Tpetra: remove/update remaining mentions of Gauss-Seidel

* Tpetra: fix iallreduce for builds without MPI

* Ifpack2: revert commenting out try/catch

Was causing unused var warning

* Ifpack2: Fixing vector mode mistake

* tpetra, ifpack2:  fixing several access mode errors

* Tpetra: use new MV view interface in Bug8794 test

* Amesos2: revert using tagged Tpetra MV getLocalView

for some reason, using ReadOnly tag to access MV view in
TpetraMultivecAdapter caused solve solution to not get copied back to
the Tpetra multivector. This is surprising because the views were just
used as the source for a Kokkos deep copy, and this caused
BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews
are trivial, and all kernels are synchronous)

* Ifpack2: add back tag clobbered by merge

* kokkos:  patch from kokkos/kokkos#3857

* comment out all the instances of TPETRA_DEPRECATED (#9023)

* MueLu: add fence for recent intrepid2 changes

Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking.

* Tpetra: restore MV_reduce_strided test.

Key: use the MV (map, dualview, orig_dualview) constructor instead of the
(map, dualview) constructor. If $dualview is noncontiguous, the first one
lets you pass orig_dualview as the contiguous super-view containing
dualview, and orig_dualview can be sync'd without problems.

Also modify TempView::toLayout() to test span_is_contiguous, rather than
assuming that (Layout != LayoutStride) implies contiguous.

* tpetra:  Removed deprecated sync_device calls

* Tpetra: Remove some MultiVector that were checking modification state (#9032)

* Tpetra: Deprecate need_sync* in MultiVector

* Tpetra: for now, we won't deprecate need_sync_host/device

* tpetra:  removed instantiations of removed tests

* Tpetra: don't use CudaSpace in nonblocking collectives

OpenMPI does not support Cuda device buffers for nonblocking collectives
like MPI_Iallreduce, even with a Cuda-aware installation.

* Fix old typo in Ifpack2_UnitTestBlockRelaxation

* Fix access tag: OverwriteAll -> ReadWrite

Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later.

* Undo bad DualView merge

Co-authored-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov>
Co-authored-by: Chris Siefert <csiefer@sandia.gov>
Co-authored-by: Geoff Danielson <gcdanie@sandia.gov>
Co-authored-by: Timothy A. Smith <tasmit@sandia.gov>
Co-authored-by: James M. Willenbring <jmwille@sandia.gov>
Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de>
Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>

* ascicgpu031: Testing updates

* MueLu: agg export does not play well with kokkos_aggregates

* Ctest: Adding Belos to email script

* Ctest: Adding Belos to email script

* Ctest: Adding Belos to email script

* Setting Tpetra Deprecated Code = ON #9067

* Ifpack & Ifpack2: Fix tiny bug in L1 method

* KokkosKernels: Fix bug in Serial specialization of spmv

Will only hit spmvs with a beta of exactly -1 using the Serial backend.

* Ifpack2: Add single kernel for diagonal extraction, L1 and small entry fix

* Disable tests in the UVM Off build

* MueLu: correct for variation due to roundoff in Convex Hulls

* Ifpack2 Relaxation: Add missing typedefs

Co-authored-by: Curtis C. Ober <ccober@sandia.gov>
Co-authored-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: iyamaza <iyamaza@sandia.gov>
Co-authored-by: Christian Glusa <caglusa@sandia.gov>
Co-authored-by: Samuel Browne <sebrown@sandia.gov>
Co-authored-by: Evan Harvey <57234914+e10harvey@users.noreply.github.com>
Co-authored-by: Evan Harvey <eharvey@sandia.gov>
Co-authored-by: Nathan Ellingwood <ndellin@sandia.gov>
Co-authored-by: trilinos-autotester <trilinos@sandia.gov>
Co-authored-by: Heidi K. Thornquist <hkthorn@sandia.gov>
Co-authored-by: Christian Glusa <cgcgcg@users.noreply.github.com>
Co-authored-by: Jonathan Hu <jhu@sandia.gov>
Co-authored-by: gsjaardema <gsjaardema@gmail.com>
Co-authored-by: Chris Siefert <csiefer@sandia.gov>
Co-authored-by: Roscoe A. Bartlett <rabartl@sandia.gov>
Co-authored-by: Peter Ohm <pohm@sandia.gov>
Co-authored-by: Paul Wolfenbarger <prwolfe@sandia.gov>
Co-authored-by: Alan Williams <william@sandia.gov>
Co-authored-by: iyamazaki <ic.yamazaki@gmail.com>
Co-authored-by: Nate Roberts <nvrober@sandia.gov>
Co-authored-by: James J. Elliott <jjellio@sandia.gov>
Co-authored-by: Jennifer Loe <jloe@sandia.gov>
Co-authored-by: Henry Swantner <HRSwant@sandia.gov>
Co-authored-by: Christian Trott <crtrott@sandia.gov>
Co-authored-by: William McLendon <wcmclen@sandia.gov>
Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov>
Co-authored-by: Geoff Danielson <gcdanie@sandia.gov>
Co-authored-by: Timothy A. Smith <tasmit@sandia.gov>
Co-authored-by: James M. Willenbring <jmwille@sandia.gov>
Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de>
Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>
Co-authored-by: James Elliott <jjellio@users.noreply.github.com>
jrobcary pushed a commit to Tech-XCorp/Trilinos that referenced this issue May 4, 2021
…ement (trilinos#8821)

* Tpetra: add new user-friendly MV view access

Also add new "owningView_" DualView member that refers to
the actual original DV (not a subview of anything else). This
is the DualView to sync in order to maintain consistency regardless
of how MultiVectors alias each other.

4 new view accessor functions: getLocalView[Host|Device][Non]Const()

- Respect constness
- Manage syncs and modifies for the user
- Prevent taking out a view in one space while any view in the other
space is live.
- Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just
have the reference count checking added (no sync/modify). This has no
effect for HostSpace or CudaUVMSpace since those host mirrors match the
device views.

* Tpetra - fix MV test 14.

* Tpetra - fix item 17

* Tpetra - fix item 20

* Tpetra - fix item 23

* Tpetra - fix item 28

* Tpetra - fix item 29

* Tpetra - fix item 35

* Tpetra - workaround for item 30

* Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd)

* Tpetra: fix MV [un]pack to respect host/device refcounts

* fix nonconst in Bug7745

* Tpetra: stashing

* Tpetra - issue 354 fix

* Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views

* Tpetra: remove static_assert, fix getLocalView() ret type

Remove bad static_assert that tripped for Cuda/CudaUVMSpace build.
Correct MultiVector::getLocalView() return type to be exactly consistent with
DualView::view().

* tpetra:  fixed error in MultiVector pack that caused failures with UVM=ON

* tpetra:  Fix for FEMultivector -- rather than take the subview of a
DualView and create a new vector with it, use the MultiVector
constructor that gets "offset" views of a vector (in which
@brian-kelley has the owningView_ working correctly).
While I was at it, I added a swap of the owningView_ to the MultiVector
swap() function.

* Tpetra: Fixing ImportExport/Issue3968:  The tests uses sync_to* without changing the modify flags, which mucks up our internal tracking

* tpetra:  fix to work without UVM

* tpetra:  changed getLocalViewHost/Device to new Const/NonConst versions
as appropriate. trilinos#8591
Did not change getLocalView as the Const/NonConst versions of
getLocalView do not exist yet
Did not change MV_reduce_strided to avoid creating conflicts for
@brian-kelley

* tpetra: change getLocalViewHost to appropriate Const/NonConst version trilinos#8591

* Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions

* Tpetra: More getLocalView mods

* Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions

* Tpetra: Fixing scaleBlockDiagonal signature as per Brian

* Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF

* Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging

* Tpetra - missing getlocal view "device"

* Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly

Make WithLocalAccess use these tags instead of internal Details:: ones.
These will also be used for the new MultiVector view access interface.

* moving from getLocalView... to getLocalView...(Tpetra::Accesspattern)

* Tpetra - get1dview logic change

* Tpetra, WIP: using new tagged view access

* Tpetra: use new interface for all MV getLocalView

* tpetra:  removed unneeded include file

* Tpetra: Tags!

* Tpetra: Tags!

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* tpetra: copied implementation of getLocalViewHost and getLocalViewDevice
from templated getLocalView, as the getLocalView version does not work.
This commit may be temporary, but it allows us to make progress on other
bugs while someone figures out the template-fu.
Sorry for the debugging statements; we'll get rid of those eventually.

* adding localview tests

* tpetra:  getLocalView<template> now works.
cleaned up my obnoxious print statements
kept Host and Device implementations that do NOT use getLocalView.

* tpetra:  added Tpetra::Access to many getLocalView<> instances
Tests still pass with UVM=ON.

* Tpetra: Removing the dreaded parantheses from the Access tags

* Manually intercept UVM allocations, throw exception

Effectively makes it impossible for any UVM allocations to
exist (except for Stokhos, which calls cudaMallocManaged directly)

* Tpetra: Deprecate old getLocalView functions

* Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON

* tpetra:  changed getLocalView to use access tags and getLocalViewDevice

* tpetra:  added access tags to getLocalView(); fixed scope of some pointers

* xpetra:  fixes to allow compilation

* WIP: deprecate getLocalBlock and start adding tagged overloads

* Tpetra: rewrite allReduceView to work with non-UVM

allReduceView had one bug and one sub-optimal thing:
- Tried to make a view copy with both layout and device different -
  Kokkos can't do that in a single deep_copy
- If a LayoutStride -> contiguous copy needed to be made, it always used
  LayoutLeft. If one of the input/output views was LayoutStride and the
  other was LayoutRight, they would both be copied to LayoutLeft. Now, use
  LayoutRight in this case.

Some utilities to help manage layouts and MPI + Kokkos views in general
are in the new file temporaryViewUtils.hpp: layout unification,
making a contiguous view, and making an MPI-safe view.
In the future these can be used to clean up idot and
iallreduce without losing efficiency.

* Tpetra:  Block MultiVector correctly uses getLocalView; removed stored pointer

* fix host device type for const_little_host_vec_type

* tpetra:  clean up of BlockMultiVector fixes

* Tpetra:  deprecated held pointer mvData_

* tpetra:  removed modifies without syncs; fixed MueLu tests

* Tpetra - removing sync in ScaleAndAssign test

* Tpetra - unit test is okay without modify and sync flags

* Tpetra - test passes without modify and sync operations

* Tpetra - remove unnecessary sync modify clear state flags

* Tpetra - remove multi vector sync/modify/ things

* Tpetra - remove sync modify things in other places

* Tpetra: remove withLocalAccess, for_each, transform

The new MV::getLocalView interface is a simpler substitute for these.

* Issue 8391. Switched to C++17 standard for GCC 8.3 build.

* FROSch: Convert enum NullSpaceType to scoped enum

By converting the enum to an enum class NullSpaceType, one is forced to
use the enum class and cannot replace it with integers anymore. This
guarantees, that the expressive enum class is used in implementations
rather than the implicitly encoded integers.

* Patch in KokkosKernels trilinos#872

(fix trilinos#8727, TeamPolicy team size too large in sort_crs_*)
Adds the KokkosKernels unit test that replicated this issue.

* MueLu: Adding Aggregate size percentiles to AggregateQuality

* Moved Tpetra CRS GS into Ifpack2 Relaxation

* Moved BlockCrs GS functionality into Relaxation

* Enabled new local GS code for CRS

* Reduce redundant code in CRS (GS/SGS use same fn)

* Using refactored block CRS local apply, unify GS/SGS

* More refactoring to get rid of redundant functions

* Added required syncs/modifies for vectors

* Removed unneeded !constantStride paths

* Use cached MV to replace getColumnMapMV from CrsMatrix

* Ifpack2: remove unneeded includes

* Ifpack2: undo some find-and-replace in comments

Undoing some "Node" -> "node_type"

* MueLu: undo CMake change, should be its own PR

* MueLu: in configure, print out missing ETI setting

During configure, MueLu prints out the type combinations to ETI.
Add <complex, int, long long> to this, since it was missing.

* tpetra:  treat WriteOnly of subviews as ReadOnly.

* Ifpack2: in RBILUK, use tagged BMV::getLocalBlock

* Tpetra: add comment with caveat

on BMV::getLocalBlock(i, j, WriteOnly)

* tpetra: separated BugTests.cpp into separate test files so that we can
disable them separately (since they exercise different classes).

* Ifpack2: update BMV getLocalBlock calls

to use tagged access, and not use manual sync/modify (which has been
removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass.

* more test changes

* mv localview tests

* wrapped up 6 tests for new behaviors

* tpetra:  scoping fix for Bug7234.cpp;
more output from getLocalView* when error occurs, as in parallel runs,
throw messages weren't always printed (e.g., from doExport when only
3/4 processors failed)

* Tpetra: add MV::aliases(const MV& other)

This allows a user to see if two MVs overlap, without actually getting
the local views and possibly hitting the reference count checker.

* Ifpack2: const correctness, use new getLocalView

- Throughout Ifpack2, remove manual sync/modify and calls to deprecated
  getLocalView. Use tagged getLocalView instead.
- In BlockRelaxation and the Containers, change interfaces to use const
  on views and multivectors that aren't actually modified

* Tpetra: fix one MV LocalView test, comment out another

We will make sure fix is OK, then uncomment and fix the other

* tpetra:  enable some Tpetra tests without UVM

* tpetra:  fix test for non-Cuda builds

* Ifpack2: fix more constness of apply vectors

* Kokkos: allow CudaUVMSpace::allocate again

Roll back change that made CudaUVMSpace::allocate throw
when UVM was not the default memory space for Cuda.

* tpetra:  changes needed to build with DEPRECATED_CODE=OFF trilinos#8821

* fix remaining test

* Tpetra - fix for nox failure

* Thyra: added missing fences to euclidean apply operations used
in MvTimesMatAddMv; the fences resolve test failures with
CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo

Tpetra: the fences above provide a more surgical fix to the test
errors seen in trilinos#8821; this commit removes fences from
getLocalView*(ReadOnly).  @kyungjoo-kim

Belos: preventive fence added with @hkthorn's blessing
to mimic those in Thyra.

* tpetra: added fence between device kernels and retrieving blocks on host trilinos#8821

* Ifpack2: Minor fix

* DualView: make fencing behavior in sync consistent

sync<Device>() does extra exec space fences if the dev/host memory
spaces are the same. This was missing in sync_host/sync_device, so
this adds it there. Makes all Ifpack2 tests for UVM without launch
blocking.

* tpetra:  exercise the Teuchos-based interfaces, too

* changed access control from WriteOnly to OverwriteAll because semantics mean things

* WIP: fixing idot for MV dualview refactor

And some udpates to ifpack2 and amesos2 about that.
Working around Kokkos issue trilinos#3850 where the templated getLocalView was
used.

* WIP: idot/iallreduce cleanup

* Tpetra: finish idot/iallreduce refactor

* Fixed iallreduce test for non-uvm device

* Belos: use new Tpetra MV view interface

* Cleanup

* Remove extra dualview sync fences

* Ifpack2 passes without launch blocking

except RBILUK.

* Ifpack2: add temporary fence in RBILUK for BlockCrs

Later it should be possible to replace this fence with a refactored
DualView interface to BlockCrs.

* Tpetra: add a global reduce to a test so it will fail when only one proc is failing

* Tpetra: fix some typos in a Map unit test

* Tpetra: remove deprecated sync/modify calls from a unit test

* Ifpack2: fix impl_scalar/scalar mismatch

* Tpetra: remove/update remaining mentions of Gauss-Seidel

* Tpetra: fix iallreduce for builds without MPI

* Ifpack2: revert commenting out try/catch

Was causing unused var warning

* Ifpack2: Fixing vector mode mistake

* tpetra, ifpack2:  fixing several access mode errors

* Tpetra: use new MV view interface in Bug8794 test

* Amesos2: revert using tagged Tpetra MV getLocalView

for some reason, using ReadOnly tag to access MV view in
TpetraMultivecAdapter caused solve solution to not get copied back to
the Tpetra multivector. This is surprising because the views were just
used as the source for a Kokkos deep copy, and this caused
BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews
are trivial, and all kernels are synchronous)

* Ifpack2: add back tag clobbered by merge

* kokkos:  patch from kokkos/kokkos#3857

* comment out all the instances of TPETRA_DEPRECATED (trilinos#9023)

* MueLu: add fence for recent intrepid2 changes

Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking.

* Tpetra: restore MV_reduce_strided test.

Key: use the MV (map, dualview, orig_dualview) constructor instead of the
(map, dualview) constructor. If $dualview is noncontiguous, the first one
lets you pass orig_dualview as the contiguous super-view containing
dualview, and orig_dualview can be sync'd without problems.

Also modify TempView::toLayout() to test span_is_contiguous, rather than
assuming that (Layout != LayoutStride) implies contiguous.

* tpetra:  Removed deprecated sync_device calls

* Tpetra: Remove some MultiVector that were checking modification state (trilinos#9032)

* Tpetra: Deprecate need_sync* in MultiVector

* Tpetra: for now, we won't deprecate need_sync_host/device

* tpetra:  removed instantiations of removed tests

* Tpetra: don't use CudaSpace in nonblocking collectives

OpenMPI does not support Cuda device buffers for nonblocking collectives
like MPI_Iallreduce, even with a Cuda-aware installation.

* Fix old typo in Ifpack2_UnitTestBlockRelaxation

* Fix access tag: OverwriteAll -> ReadWrite

Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later.

* Undo bad DualView merge

Co-authored-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov>
Co-authored-by: Chris Siefert <csiefer@sandia.gov>
Co-authored-by: Geoff Danielson <gcdanie@sandia.gov>
Co-authored-by: Timothy A. Smith <tasmit@sandia.gov>
Co-authored-by: James M. Willenbring <jmwille@sandia.gov>
Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de>
Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>
@kddevin kddevin closed this as completed Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AT: WIP Causes the PR autotester to not test the PR. (Remove to allow testing to occur.) pkg: Tpetra UVM removal
Projects
None yet
Development

No branches or pull requests

6 participants