-
Notifications
You must be signed in to change notification settings - Fork 575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tpetra: track MultiVector DualView refactoring #8591
Comments
@brian-kelley would you put your branch name here so that I can fork from yours ? |
Branch for working on MultiVector DualView management refactor: TpetraDualViewRefactor. |
Oh.... the branch is from Trilinos repo. I looked for it from brian's forked repo. Now I got it. |
in order not to duplicate effort,
Item 30. This test a workflow 1) a user creates a device view, 2) tpetra wraps the device view in the dual view structure, 3) check data consistency when it modifies the user created device local view. This workflow is the thing that we do not want users to do. When a user creates a device view and hand it to Tpetra, it breaks the Tpetra dual view logic as the user hold the local object (reference count is unmatched when Tpetra is created). If we want to give this option to users, we need to allocate the dual view and make deep copy and users local device view lose the link to Tpetra. Soft copy of device(or host) view to dual view is not working if we want to keep the transactional behavior on the tpetra dual view context. Tpetra team reaches work-around for the item 30. Leave the interface for wrapping a device view interface (soft copy to tpetra dual view device view or host view) which allows reference counting does not match when tpetra object is constructed. However, the testing scenario described in item 30 is the anti-pattern that we want to prevent it from users. So, we just do not test the part of the anti-pattern test. |
Issue 364 is done. |
as appropriate. #8591 Did not change getLocalView as the Const/NonConst versions of getLocalView do not exist yet Did not change MV_reduce_strided to avoid creating conflicts for @brian-kelley
OK. So I removed every reference to getLocalViewHost() and getLocalViewDevice() in Tpetra and the tests and replaced them with the appropriate const and non-const version. Complications:
|
List of things to do before merging:
Eventually, but not necessarily for PR merge
|
Thoughts on ReadOnly/WriteOnly/ReadWrite access:
@kddevin will change existing Access struct in withLocalAccess |
…s:develop' (dde88a7). * trilinos-develop: (24 commits) MueLu: Gold Files Phalanx: add queries to DAG Manager MueLu: More BlockDiagonal mods MueLu: Adding first cut at a block-diagonal coarsening MiniTensor: Fix inconsistent KOKKOS_INLINE_FUNCTION markings Piro: close trilinos#8641 Fix latest fix -- discarded const Turn Anasazi on for EMPIRE Fix compilation error on intel-17 Automatic snapshot commit from seacas at 729e904fa5 MueLu: RegionMG cleaning up test example dirver MueLu: fix typos in documentation MueLu CoalesceDrop: Fix FPE in cut-based distance Laplacian MueLu Hierarchy: Skip forced deletion of level vectors MueLu ParameterListInterpreter: Switch processing of coarse solve and smoothers MueLu: Add unit test for StratimikosSmoother tpetra: removed use of device_type; the test runs on host. trilinos#8591 MueLu: rename input to InterfaceAggFact MueLu: fix SchurComplement for parallel runs MueLu: add test for dof-based InterfaceAggFact ...
…s:develop' (dde88a7). * trilinos-develop: (24 commits) MueLu: Gold Files Phalanx: add queries to DAG Manager MueLu: More BlockDiagonal mods MueLu: Adding first cut at a block-diagonal coarsening MiniTensor: Fix inconsistent KOKKOS_INLINE_FUNCTION markings Piro: close trilinos#8641 Fix latest fix -- discarded const Turn Anasazi on for EMPIRE Fix compilation error on intel-17 Automatic snapshot commit from seacas at 729e904fa5 MueLu: RegionMG cleaning up test example dirver MueLu: fix typos in documentation MueLu CoalesceDrop: Fix FPE in cut-based distance Laplacian MueLu Hierarchy: Skip forced deletion of level vectors MueLu ParameterListInterpreter: Switch processing of coarse solve and smoothers MueLu: Add unit test for StratimikosSmoother tpetra: removed use of device_type; the test runs on host. trilinos#8591 MueLu: rename input to InterfaceAggFact MueLu: fix SchurComplement for parallel runs MueLu: add test for dof-based InterfaceAggFact ...
…develop' (1015f63). * potential-trilinos-develop: (37 commits) Framework: add LDFLAGS to config.ini file for Intel 17 build MueLu: Gold Files Framework: intel 19 build with CMake 3.17.x or greater Framework: intel 17 build with CMake 3.17.x Framework: testing - disable cmake_cxx_flags for intel 17 build Framework: testing cmake updates and intel compiler error Framework: testing cmake updates and intel compiler error Framework: testing cmake updates and intel compiler error MueLu: More BlockDiagonal mods MueLu: Adding first cut at a block-diagonal coarsening MiniTensor: Fix inconsistent KOKKOS_INLINE_FUNCTION markings PANZER: speed up CubeHexMeshFactory by populating edge blocks using a vector of edges PANZER: speed up CubeHexMeshFactory by populating face blocks using a vector of faces PANZER: speed up CubeHexMeshFactory by reducing the number of calls to beginModification()/endModification() PANZER: add a disable_subcells test to check the "Build Subcells: false" case Piro: close trilinos#8641 Fix latest fix -- discarded const Turn Anasazi on for EMPIRE Fix compilation error on intel-17 Automatic snapshot commit from seacas at 729e904fa5 MueLu: RegionMG cleaning up test example dirver MueLu: fix typos in documentation tpetra: removed use of device_type; the test runs on host. trilinos#8591 ...
@brian-kelley @csiefer2 I got all tests except the mv-reduce-strided test passing today. |
I removed the sync, modify things on the multivector but I leave the interface there. This interface can be removed later after other items are fixed ie., local access interface is eliminated. |
@kyungjoo-kim @kddevin I just checked in the removal of withLocalAccess, for_each and transform. The one issue I still see in Ifpack2 is that RBILUK is still trying to call removed BlockMultiVector functions like sync_host. But the things that used withLocalAccess (Chebyshev, Relaxation) built and passed before I rebased on that commit. (I mean with UVM) |
I got rid of all the deprecated MV/BlockMV calls in Ifpack2, they're all using the new interface. There were 2 tests in MultiVector_LocalViewTests that failed with CudaUVMSpace: HostDeviceView and DeviceHostView. I think the issue is that they were expecting
and that fixed HostDeviceView. DeviceHostView I commented out but the same fix should work there. |
Ifpack2 failures with UVM, without launch blocking:
|
@kddevin With the last commit, all tests in Tpetra, Belos, Ifpack2, MueLu, and Teko pass with UVM and without launch blocking. |
@brian-kelley Does the last commit mean that the fencing execution space in sync_host and sync_device ? If so, that does not make sense. When I and @crtrott talked, the execution fence in @crtrott any idea ? |
Did someone say "double fencing"? @csiefer2 might have something to say about that. |
@jhux2 Luckily, this is a 0 vs. 1 fences being different, not a 1 vs. 2 fences. @kyungjoo-kim You were right, even though the fencing was not consistent before, just adding them to sync_host and sync_device wasn't the correct solution. Have a look at kokkos/kokkos#3850 . I think our templated |
That's not the cause of all the ifpack2 failures, though. |
@brian-kelley That is indeed a sneaky issue. |
I'm seeing some new failures without CUDA_LAUNCH_BLOCKING since we rebased on develop.
Update 4-13: only these two are failing for me now (cuda-unaware MPI, uvm, no launch block)
Update 4-16: The MueLu intrepid2 tests are passing for me after the last fix. |
…ement (#8821) * Tpetra: add new user-friendly MV view access Also add new "owningView_" DualView member that refers to the actual original DV (not a subview of anything else). This is the DualView to sync in order to maintain consistency regardless of how MultiVectors alias each other. 4 new view accessor functions: getLocalView[Host|Device][Non]Const() - Respect constness - Manage syncs and modifies for the user - Prevent taking out a view in one space while any view in the other space is live. - Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just have the reference count checking added (no sync/modify). This has no effect for HostSpace or CudaUVMSpace since those host mirrors match the device views. * Tpetra - fix MV test 14. * Tpetra - fix item 17 * Tpetra - fix item 20 * Tpetra - fix item 23 * Tpetra - fix item 28 * Tpetra - fix item 29 * Tpetra - fix item 35 * Tpetra - workaround for item 30 * Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd) * Tpetra: fix MV [un]pack to respect host/device refcounts * fix nonconst in Bug7745 * Tpetra: stashing * Tpetra - issue 354 fix * Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views * Tpetra: remove static_assert, fix getLocalView() ret type Remove bad static_assert that tripped for Cuda/CudaUVMSpace build. Correct MultiVector::getLocalView() return type to be exactly consistent with DualView::view(). * tpetra: fixed error in MultiVector pack that caused failures with UVM=ON * tpetra: Fix for FEMultivector -- rather than take the subview of a DualView and create a new vector with it, use the MultiVector constructor that gets "offset" views of a vector (in which @brian-kelley has the owningView_ working correctly). While I was at it, I added a swap of the owningView_ to the MultiVector swap() function. * Tpetra: Fixing ImportExport/Issue3968: The tests uses sync_to* without changing the modify flags, which mucks up our internal tracking * tpetra: fix to work without UVM * tpetra: changed getLocalViewHost/Device to new Const/NonConst versions as appropriate. #8591 Did not change getLocalView as the Const/NonConst versions of getLocalView do not exist yet Did not change MV_reduce_strided to avoid creating conflicts for @brian-kelley * tpetra: change getLocalViewHost to appropriate Const/NonConst version #8591 * Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions * Tpetra: More getLocalView mods * Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions * Tpetra: Fixing scaleBlockDiagonal signature as per Brian * Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF * Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging * Tpetra - missing getlocal view "device" * Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly Make WithLocalAccess use these tags instead of internal Details:: ones. These will also be used for the new MultiVector view access interface. * moving from getLocalView... to getLocalView...(Tpetra::Accesspattern) * Tpetra - get1dview logic change * Tpetra, WIP: using new tagged view access * Tpetra: use new interface for all MV getLocalView * tpetra: removed unneeded include file * Tpetra: Tags! * Tpetra: Tags! * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing tests * Tpetra: Fixing tests * Tpetra: Fixing tests * tpetra: copied implementation of getLocalViewHost and getLocalViewDevice from templated getLocalView, as the getLocalView version does not work. This commit may be temporary, but it allows us to make progress on other bugs while someone figures out the template-fu. Sorry for the debugging statements; we'll get rid of those eventually. * adding localview tests * tpetra: getLocalView<template> now works. cleaned up my obnoxious print statements kept Host and Device implementations that do NOT use getLocalView. * tpetra: added Tpetra::Access to many getLocalView<> instances Tests still pass with UVM=ON. * Tpetra: Removing the dreaded parantheses from the Access tags * Manually intercept UVM allocations, throw exception Effectively makes it impossible for any UVM allocations to exist (except for Stokhos, which calls cudaMallocManaged directly) * Tpetra: Deprecate old getLocalView functions * Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON * tpetra: changed getLocalView to use access tags and getLocalViewDevice * tpetra: added access tags to getLocalView(); fixed scope of some pointers * xpetra: fixes to allow compilation * WIP: deprecate getLocalBlock and start adding tagged overloads * Tpetra: rewrite allReduceView to work with non-UVM allReduceView had one bug and one sub-optimal thing: - Tried to make a view copy with both layout and device different - Kokkos can't do that in a single deep_copy - If a LayoutStride -> contiguous copy needed to be made, it always used LayoutLeft. If one of the input/output views was LayoutStride and the other was LayoutRight, they would both be copied to LayoutLeft. Now, use LayoutRight in this case. Some utilities to help manage layouts and MPI + Kokkos views in general are in the new file temporaryViewUtils.hpp: layout unification, making a contiguous view, and making an MPI-safe view. In the future these can be used to clean up idot and iallreduce without losing efficiency. * Tpetra: Block MultiVector correctly uses getLocalView; removed stored pointer * fix host device type for const_little_host_vec_type * tpetra: clean up of BlockMultiVector fixes * Tpetra: deprecated held pointer mvData_ * tpetra: removed modifies without syncs; fixed MueLu tests * Tpetra - removing sync in ScaleAndAssign test * Tpetra - unit test is okay without modify and sync flags * Tpetra - test passes without modify and sync operations * Tpetra - remove unnecessary sync modify clear state flags * Tpetra - remove multi vector sync/modify/ things * Tpetra - remove sync modify things in other places * Tpetra: remove withLocalAccess, for_each, transform The new MV::getLocalView interface is a simpler substitute for these. * Issue 8391. Switched to C++17 standard for GCC 8.3 build. * FROSch: Convert enum NullSpaceType to scoped enum By converting the enum to an enum class NullSpaceType, one is forced to use the enum class and cannot replace it with integers anymore. This guarantees, that the expressive enum class is used in implementations rather than the implicitly encoded integers. * Patch in KokkosKernels #872 (fix #8727, TeamPolicy team size too large in sort_crs_*) Adds the KokkosKernels unit test that replicated this issue. * MueLu: Adding Aggregate size percentiles to AggregateQuality * Moved Tpetra CRS GS into Ifpack2 Relaxation * Moved BlockCrs GS functionality into Relaxation * Enabled new local GS code for CRS * Reduce redundant code in CRS (GS/SGS use same fn) * Using refactored block CRS local apply, unify GS/SGS * More refactoring to get rid of redundant functions * Added required syncs/modifies for vectors * Removed unneeded !constantStride paths * Use cached MV to replace getColumnMapMV from CrsMatrix * Ifpack2: remove unneeded includes * Ifpack2: undo some find-and-replace in comments Undoing some "Node" -> "node_type" * MueLu: undo CMake change, should be its own PR * MueLu: in configure, print out missing ETI setting During configure, MueLu prints out the type combinations to ETI. Add <complex, int, long long> to this, since it was missing. * tpetra: treat WriteOnly of subviews as ReadOnly. * Ifpack2: in RBILUK, use tagged BMV::getLocalBlock * Tpetra: add comment with caveat on BMV::getLocalBlock(i, j, WriteOnly) * tpetra: separated BugTests.cpp into separate test files so that we can disable them separately (since they exercise different classes). * Ifpack2: update BMV getLocalBlock calls to use tagged access, and not use manual sync/modify (which has been removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass. * more test changes * mv localview tests * wrapped up 6 tests for new behaviors * tpetra: scoping fix for Bug7234.cpp; more output from getLocalView* when error occurs, as in parallel runs, throw messages weren't always printed (e.g., from doExport when only 3/4 processors failed) * Tpetra: add MV::aliases(const MV& other) This allows a user to see if two MVs overlap, without actually getting the local views and possibly hitting the reference count checker. * Ifpack2: const correctness, use new getLocalView - Throughout Ifpack2, remove manual sync/modify and calls to deprecated getLocalView. Use tagged getLocalView instead. - In BlockRelaxation and the Containers, change interfaces to use const on views and multivectors that aren't actually modified * Tpetra: fix one MV LocalView test, comment out another We will make sure fix is OK, then uncomment and fix the other * tpetra: enable some Tpetra tests without UVM * tpetra: fix test for non-Cuda builds * Ifpack2: fix more constness of apply vectors * Kokkos: allow CudaUVMSpace::allocate again Roll back change that made CudaUVMSpace::allocate throw when UVM was not the default memory space for Cuda. * tpetra: changes needed to build with DEPRECATED_CODE=OFF #8821 * fix remaining test * Tpetra - fix for nox failure * Thyra: added missing fences to euclidean apply operations used in MvTimesMatAddMv; the fences resolve test failures with CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo Tpetra: the fences above provide a more surgical fix to the test errors seen in #8821; this commit removes fences from getLocalView*(ReadOnly). @kyungjoo-kim Belos: preventive fence added with @hkthorn's blessing to mimic those in Thyra. * tpetra: added fence between device kernels and retrieving blocks on host #8821 * Ifpack2: Minor fix * DualView: make fencing behavior in sync consistent sync<Device>() does extra exec space fences if the dev/host memory spaces are the same. This was missing in sync_host/sync_device, so this adds it there. Makes all Ifpack2 tests for UVM without launch blocking. * tpetra: exercise the Teuchos-based interfaces, too * changed access control from WriteOnly to OverwriteAll because semantics mean things * WIP: fixing idot for MV dualview refactor And some udpates to ifpack2 and amesos2 about that. Working around Kokkos issue #3850 where the templated getLocalView was used. * WIP: idot/iallreduce cleanup * Tpetra: finish idot/iallreduce refactor * Fixed iallreduce test for non-uvm device * Belos: use new Tpetra MV view interface * Cleanup * Remove extra dualview sync fences * Ifpack2 passes without launch blocking except RBILUK. * Ifpack2: add temporary fence in RBILUK for BlockCrs Later it should be possible to replace this fence with a refactored DualView interface to BlockCrs. * Tpetra: add a global reduce to a test so it will fail when only one proc is failing * Tpetra: fix some typos in a Map unit test * Tpetra: remove deprecated sync/modify calls from a unit test * Ifpack2: fix impl_scalar/scalar mismatch * Tpetra: remove/update remaining mentions of Gauss-Seidel * Tpetra: fix iallreduce for builds without MPI * Ifpack2: revert commenting out try/catch Was causing unused var warning * Ifpack2: Fixing vector mode mistake * tpetra, ifpack2: fixing several access mode errors * Tpetra: use new MV view interface in Bug8794 test * Amesos2: revert using tagged Tpetra MV getLocalView for some reason, using ReadOnly tag to access MV view in TpetraMultivecAdapter caused solve solution to not get copied back to the Tpetra multivector. This is surprising because the views were just used as the source for a Kokkos deep copy, and this caused BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews are trivial, and all kernels are synchronous) * Ifpack2: add back tag clobbered by merge * kokkos: patch from kokkos/kokkos#3857 * comment out all the instances of TPETRA_DEPRECATED (#9023) * MueLu: add fence for recent intrepid2 changes Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking. * Tpetra: restore MV_reduce_strided test. Key: use the MV (map, dualview, orig_dualview) constructor instead of the (map, dualview) constructor. If $dualview is noncontiguous, the first one lets you pass orig_dualview as the contiguous super-view containing dualview, and orig_dualview can be sync'd without problems. Also modify TempView::toLayout() to test span_is_contiguous, rather than assuming that (Layout != LayoutStride) implies contiguous. * tpetra: Removed deprecated sync_device calls * Tpetra: Remove some MultiVector that were checking modification state (#9032) * Tpetra: Deprecate need_sync* in MultiVector * Tpetra: for now, we won't deprecate need_sync_host/device * tpetra: removed instantiations of removed tests * Tpetra: don't use CudaSpace in nonblocking collectives OpenMPI does not support Cuda device buffers for nonblocking collectives like MPI_Iallreduce, even with a Cuda-aware installation. * Fix old typo in Ifpack2_UnitTestBlockRelaxation * Fix access tag: OverwriteAll -> ReadWrite Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later. * Undo bad DualView merge Co-authored-by: Brian Kelley <bmkelle@sandia.gov> Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov> Co-authored-by: Chris Siefert <csiefer@sandia.gov> Co-authored-by: Geoff Danielson <gcdanie@sandia.gov> Co-authored-by: Timothy A. Smith <tasmit@sandia.gov> Co-authored-by: James M. Willenbring <jmwille@sandia.gov> Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de> Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>
* Tempus: Remove ParameterList from IntegratorBasic Remove all the internal uses of ParameterList from IntegratorBasic. This means moving the variables in the IntegratorBasic ParameterList to member data. Integrator will not longer inherit from Teuchos::ParameterListAcceptor. However IntegratorBasic can still be built from a ParameterList, and will still provide a valid ParameterList. * To break up these changes, created a copy of IntegratorBasic (i.e., IntegratorBasicOld) for the sensitivity analysis integrators - IntegratorAdjointSensitivity - IntegratorForwardSensitivity - IntegratorPseudoTransientAdjointSensitivity - IntegratorPseudoTransientForwardSensitivity so these can be upgraded in another PR. * IntegratorBasic is no longer inherited from ParameterListAcceptor. - Removed setParameterList. - IntegratorBasic constructors using ParameterLists have moved to nonmember constructors, e.g., . integratorBasic(pl, model) --> createIntegratorBasic(pl, model) . IntegratorBasic(pl, model) --> createIntegratorBasic(pl, model) - Member data ParameterLists are removed. - Kept getValidParameters(), which now returns a ParameterList with the current values. Still matches ParameterListAcceptor signature. * Ensured that ParameterList names were correctly set, so getValidParameters() could be used to create nested ParameterLists, e.g., IntegratorBasic->Stepper->Solver. * Made getValidParametersBasic() a member functions of Stepper class. * Simplified setting the Stepper to just setStepper(stepper). * Added method to set model on stepper in IntegratorBasic, i.e., setModel(model). * The integrator observer is no longer a composite observer. It is simply a base class observer. * All internal IntegratorBasic references to member ParameterLists are changed to member data. * Added member data for the integrator name and type. Name is a label that used for identification, e.g., 'My Integrator Basic'. Type defines the derived class being used, e.g., 'Integrator Basic'. * Added a shallow copy for the SolutionHistory. * Tempus: Remove ParameterList from Internals of IntegratorBasic. * Changed Piro and ROL to use IntegratorBasicOld. Will move to IntegratorBasic in future PR. * Added documentation on StepperName and StepperType to help distinguish between them. * setStepperType() is now a protected function of the Stepper class, which should help distinguish it against StepperName. * IMEX_RK and IMEX_RK_Partition now requires the stepperType in the default constructor to completely build it. These should be changed to have a base class IMEX_RK and derived classes for each stepperType (similarly for IMEX_RK_Partition). * Fixed several misuses of stepperName and stepperType in source code and in unit tests. * Fixed some usage of Stepper aliases. * Tpetra: add backup of scripts for perf testing eclipse, vortex and stria env/build scripts. These are used on SRN Jenkins and Watchr. * cherry pick Kokkos-kernels PR #921 Two-stage GS: add damping factors #921 * expose new options for two-stage GS from Ifpack2 * describe the two-stage parameters more in comments * MueLu: Enable reuse of Ifpack2 smoothers * Add openmpi 4.0.5 toolchain for VAN1 User Support Ticket(s) or Story Referenced: SPAR-969 * Add ctest drivers for new toolchains * Correct an ordering issue and add tests * ATDM/van1-tx2: Disable build stats * re-basing muelu gold files with new two-stage gs parameters * Replace VerifyExecutionCanAccessMemorySpace usage Teuchos, Tpetra, Sacado, Stokhos: replace usage of deprecated VerifyExecutionCanAccessMemorySpace with SpaceAccessibility for compatibility with Kokkos. See kokkos/kokkos#3813 for relevant changes. * blake atdm environment: update cmake to 3.19.3 * Ifpack2 Hypre: Fix link errors when multiple node types are used * Fixes mismatched new/delete and memory leaks in reused solver objects Integrating the templated Basker solver directly into Xyce to perform custom linear solves for harmonic balance (HB) analysis, some memory issues were noticed. First there is a mismatch in the new and delete used to create and destroy the pinv object, respectively. If the same solver object is reused, the L and U factors are leaked. Also some internal workspace is not properly cleaned up. Due to clarity the internal workspace objects have been refactored and the L and U factors are deleted if the solver object is called to perform a numeric factorization when one already exists. This addresses the memory issues that have been observed through valgrind. * casting inner-damping factor to complex only if scalar-type is complex. * MueLu CreateOperator test: Set verbosity so that default values are ignored * MueLu: Update gold files * Revert "re-basing muelu gold files with new two-stage gs parameters" This reverts commit feea4d8. * trilinos_couplings: Replace VerifyExecutionCanAccessMemorySpace replace usage of deprecated VerifyExecutionCanAccessMemorySpace with SpaceAccessibility for compatibility with Kokkos. See kokkos/kokkos#3813 for relevant changes. * SEACAS: Fix warnings and memory leaks Automatic snapshot commit from seacas at 29acd7f151 Origin repo remote tracking branch: 'origin/master' Origin repo remote repo URL: 'origin = https://github.com/gsjaardema/seacas' At commit: commit 29acd7f1510bf729084274fe0cead3ef5e815dd8 Author: Greg Sjaardema <gdsjaar@sandia.gov> Date: Wed Apr 14 10:29:28 2021 -0600 Summary: sync back sierra-build changes [ci skip] EXODIFF: Eliminate edge/face block memory leaks PLT: Support for flang/f18 APREPRO: Better support for array memory leak management Fix compilation warnings on nvidia IOSS: Eliminate long compile when sanitizer enabled IOSS: Eliminate compiler warning APREPRO: Eliminate array memory leaks * Ctest: Fixing emailer for ascicgpu031 * Automatic snapshot commit from tribits at 18aed92 Origin repo remote tracking branch: 'github/master' Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git' At commit: commit 18aed92550ebc8e8e0f7da2a5d38cd4eaa192e1f Author: Roscoe A. Bartlett <rabartl@sandia.gov> Date: Thu Apr 15 09:45:52 2021 -0600 Summary: Allow TRIBITS_ADD_ADVANCED_TEST_MAX_NUM_TEST_BLOCKS to be changed (#136) * Tempus: Replace Logical Operators The ROL team has received requests from Windows users to remove the alternative logical operators 'and' and 'or' in favor of && and ||. The C++ standard includes 'and' and 'or', of course. However, the MS compiler only supports them in the -permissive mode, which apparently isn't allowed in many companies, due to the quality control niceties that the non-permissive (standard) mode provides. Trilinos is not supporting Windows and do not have even platforms to test on. This change should be straight forward, but there is not any mechanism to prevent regression. Microsoft needs to support the c++ standard! Not sure all instances were found and there is no compiler flag to throw on this. * Galeri Xpetra: Add anisotrpic diffusion problem * MueLu: fix agg export for multiple dofs per node * muelu: changes needed to handle Kokkos::complex in Cuda builds * Turn off 3 Zoltan tests that fail due to a bug in spectrummpi See #8798 for the discussion * Geminga CUDA nightly: Enable complex * Add short reason for the disablement and note the issue for more details * Galeri: Fix issue in boundary conditions * MueLu RefMaxwell: Pass corrected nullspace to coarse (1,1) hierarchy * Xpetra: Add EpetraInverseOperator Its 'apply' calls 'ApplyInverse' instead of 'Apply' * TrilinosCouplings: Fix scaling of CurlCurl in Maxwell example * tpetra: adding test of branch Tpetra_UVM_Removal for SAKE * STK: Snapshot 04-21-21 11:17 (#9039) * add a few more timers * make default for GmresSingleReduce as single-reduce MGS and no Newton basis * make "delayed normalization" default for single-reduce MGS * Intrepid2: fix 8801 (second attempt) (#9044) * Intrepid2: resolve uninitialized variable warnings. * Intrepid2: move lambda implementation into a functor to work around apparent CUDA compiler bug. PR #9044, fix for #8801 (second attempt). * When compiling with IBM Clang 11 + Cuda, Zoltan2 MJ captures 'this' inside lambdas This patch * Marks 1 function static, which was class const (but used not class variables). * addresses this-> being used inside a nested team lambda (which clang doesn't like) To remove this, I moved sEpsilon as parameter to the function being called and was able to mark the function static as well (removing its use of this->sEpsilon) This entails creating a funciton-locally copy of sEpsilon so that it may be captured by the default [=] capture. Both changes entail adding a `using` statement within the function so that the now static class functions can be called (e.g., AlgMJ< ... >:: * Snapshot of kokkos.git from commit 04b8196e0e3bfc4cee4047dbbbb13fc227730fe8 From repository at git@github.com:kokkos/kokkos.git At commit: commit 04b8196e0e3bfc4cee4047dbbbb13fc227730fe8 Merge: 1fb0c284 ffc35a82 Author: Nathan Ellingwood <ndellin@sandia.gov> Date: Mon Apr 26 00:14:56 2021 -0600 Merge branch 'release-candidate-3.4.0' for 3.4.00 Part of Kokkos C++ Performance Portability Programming EcoSystem 3.4 * Snapshot of kokkos-kernels.git from commit 3eb6a9298b58f224b876b6e29cda4491cddc53c5 From repository at git@github.com:kokkos/kokkos-kernels.git At commit: commit 3eb6a9298b58f224b876b6e29cda4491cddc53c5 Merge: fe439b21 dd0d4ef8 Author: Nathan Ellingwood <ndellin@sandia.gov> Date: Mon Apr 26 00:16:08 2021 -0600 Merge branch 'release-candidate-3.4.0' for 3.4.00 Part of Kokkos C++ Performance Portability Programming EcoSystem 3.4 * Turn on Intrepid2 per #8310 * MueLu: Add test for aggregation export * Ifpack2 Relaxation&Chebyshev: Use offsets in more cases * Framework: update messaging on issue autocloser bot * Framework: Update autocloser throttle limit to 70 * MueLu: removed unused lines from agg export test * Tpetra MultiVector and BlockMultiVector refactor to remove UVM requirement (#8821) * Tpetra: add new user-friendly MV view access Also add new "owningView_" DualView member that refers to the actual original DV (not a subview of anything else). This is the DualView to sync in order to maintain consistency regardless of how MultiVectors alias each other. 4 new view accessor functions: getLocalView[Host|Device][Non]Const() - Respect constness - Manage syncs and modifies for the user - Prevent taking out a view in one space while any view in the other space is live. - Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just have the reference count checking added (no sync/modify). This has no effect for HostSpace or CudaUVMSpace since those host mirrors match the device views. * Tpetra - fix MV test 14. * Tpetra - fix item 17 * Tpetra - fix item 20 * Tpetra - fix item 23 * Tpetra - fix item 28 * Tpetra - fix item 29 * Tpetra - fix item 35 * Tpetra - workaround for item 30 * Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd) * Tpetra: fix MV [un]pack to respect host/device refcounts * fix nonconst in Bug7745 * Tpetra: stashing * Tpetra - issue 354 fix * Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views * Tpetra: remove static_assert, fix getLocalView() ret type Remove bad static_assert that tripped for Cuda/CudaUVMSpace build. Correct MultiVector::getLocalView() return type to be exactly consistent with DualView::view(). * tpetra: fixed error in MultiVector pack that caused failures with UVM=ON * tpetra: Fix for FEMultivector -- rather than take the subview of a DualView and create a new vector with it, use the MultiVector constructor that gets "offset" views of a vector (in which @brian-kelley has the owningView_ working correctly). While I was at it, I added a swap of the owningView_ to the MultiVector swap() function. * Tpetra: Fixing ImportExport/Issue3968: The tests uses sync_to* without changing the modify flags, which mucks up our internal tracking * tpetra: fix to work without UVM * tpetra: changed getLocalViewHost/Device to new Const/NonConst versions as appropriate. #8591 Did not change getLocalView as the Const/NonConst versions of getLocalView do not exist yet Did not change MV_reduce_strided to avoid creating conflicts for @brian-kelley * tpetra: change getLocalViewHost to appropriate Const/NonConst version #8591 * Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions * Tpetra: More getLocalView mods * Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions * Tpetra: Fixing scaleBlockDiagonal signature as per Brian * Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF * Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging * Tpetra - missing getlocal view "device" * Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly Make WithLocalAccess use these tags instead of internal Details:: ones. These will also be used for the new MultiVector view access interface. * moving from getLocalView... to getLocalView...(Tpetra::Accesspattern) * Tpetra - get1dview logic change * Tpetra, WIP: using new tagged view access * Tpetra: use new interface for all MV getLocalView * tpetra: removed unneeded include file * Tpetra: Tags! * Tpetra: Tags! * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing tests * Tpetra: Fixing tests * Tpetra: Fixing tests * tpetra: copied implementation of getLocalViewHost and getLocalViewDevice from templated getLocalView, as the getLocalView version does not work. This commit may be temporary, but it allows us to make progress on other bugs while someone figures out the template-fu. Sorry for the debugging statements; we'll get rid of those eventually. * adding localview tests * tpetra: getLocalView<template> now works. cleaned up my obnoxious print statements kept Host and Device implementations that do NOT use getLocalView. * tpetra: added Tpetra::Access to many getLocalView<> instances Tests still pass with UVM=ON. * Tpetra: Removing the dreaded parantheses from the Access tags * Manually intercept UVM allocations, throw exception Effectively makes it impossible for any UVM allocations to exist (except for Stokhos, which calls cudaMallocManaged directly) * Tpetra: Deprecate old getLocalView functions * Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON * tpetra: changed getLocalView to use access tags and getLocalViewDevice * tpetra: added access tags to getLocalView(); fixed scope of some pointers * xpetra: fixes to allow compilation * WIP: deprecate getLocalBlock and start adding tagged overloads * Tpetra: rewrite allReduceView to work with non-UVM allReduceView had one bug and one sub-optimal thing: - Tried to make a view copy with both layout and device different - Kokkos can't do that in a single deep_copy - If a LayoutStride -> contiguous copy needed to be made, it always used LayoutLeft. If one of the input/output views was LayoutStride and the other was LayoutRight, they would both be copied to LayoutLeft. Now, use LayoutRight in this case. Some utilities to help manage layouts and MPI + Kokkos views in general are in the new file temporaryViewUtils.hpp: layout unification, making a contiguous view, and making an MPI-safe view. In the future these can be used to clean up idot and iallreduce without losing efficiency. * Tpetra: Block MultiVector correctly uses getLocalView; removed stored pointer * fix host device type for const_little_host_vec_type * tpetra: clean up of BlockMultiVector fixes * Tpetra: deprecated held pointer mvData_ * tpetra: removed modifies without syncs; fixed MueLu tests * Tpetra - removing sync in ScaleAndAssign test * Tpetra - unit test is okay without modify and sync flags * Tpetra - test passes without modify and sync operations * Tpetra - remove unnecessary sync modify clear state flags * Tpetra - remove multi vector sync/modify/ things * Tpetra - remove sync modify things in other places * Tpetra: remove withLocalAccess, for_each, transform The new MV::getLocalView interface is a simpler substitute for these. * Issue 8391. Switched to C++17 standard for GCC 8.3 build. * FROSch: Convert enum NullSpaceType to scoped enum By converting the enum to an enum class NullSpaceType, one is forced to use the enum class and cannot replace it with integers anymore. This guarantees, that the expressive enum class is used in implementations rather than the implicitly encoded integers. * Patch in KokkosKernels #872 (fix #8727, TeamPolicy team size too large in sort_crs_*) Adds the KokkosKernels unit test that replicated this issue. * MueLu: Adding Aggregate size percentiles to AggregateQuality * Moved Tpetra CRS GS into Ifpack2 Relaxation * Moved BlockCrs GS functionality into Relaxation * Enabled new local GS code for CRS * Reduce redundant code in CRS (GS/SGS use same fn) * Using refactored block CRS local apply, unify GS/SGS * More refactoring to get rid of redundant functions * Added required syncs/modifies for vectors * Removed unneeded !constantStride paths * Use cached MV to replace getColumnMapMV from CrsMatrix * Ifpack2: remove unneeded includes * Ifpack2: undo some find-and-replace in comments Undoing some "Node" -> "node_type" * MueLu: undo CMake change, should be its own PR * MueLu: in configure, print out missing ETI setting During configure, MueLu prints out the type combinations to ETI. Add <complex, int, long long> to this, since it was missing. * tpetra: treat WriteOnly of subviews as ReadOnly. * Ifpack2: in RBILUK, use tagged BMV::getLocalBlock * Tpetra: add comment with caveat on BMV::getLocalBlock(i, j, WriteOnly) * tpetra: separated BugTests.cpp into separate test files so that we can disable them separately (since they exercise different classes). * Ifpack2: update BMV getLocalBlock calls to use tagged access, and not use manual sync/modify (which has been removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass. * more test changes * mv localview tests * wrapped up 6 tests for new behaviors * tpetra: scoping fix for Bug7234.cpp; more output from getLocalView* when error occurs, as in parallel runs, throw messages weren't always printed (e.g., from doExport when only 3/4 processors failed) * Tpetra: add MV::aliases(const MV& other) This allows a user to see if two MVs overlap, without actually getting the local views and possibly hitting the reference count checker. * Ifpack2: const correctness, use new getLocalView - Throughout Ifpack2, remove manual sync/modify and calls to deprecated getLocalView. Use tagged getLocalView instead. - In BlockRelaxation and the Containers, change interfaces to use const on views and multivectors that aren't actually modified * Tpetra: fix one MV LocalView test, comment out another We will make sure fix is OK, then uncomment and fix the other * tpetra: enable some Tpetra tests without UVM * tpetra: fix test for non-Cuda builds * Ifpack2: fix more constness of apply vectors * Kokkos: allow CudaUVMSpace::allocate again Roll back change that made CudaUVMSpace::allocate throw when UVM was not the default memory space for Cuda. * tpetra: changes needed to build with DEPRECATED_CODE=OFF #8821 * fix remaining test * Tpetra - fix for nox failure * Thyra: added missing fences to euclidean apply operations used in MvTimesMatAddMv; the fences resolve test failures with CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo Tpetra: the fences above provide a more surgical fix to the test errors seen in #8821; this commit removes fences from getLocalView*(ReadOnly). @kyungjoo-kim Belos: preventive fence added with @hkthorn's blessing to mimic those in Thyra. * tpetra: added fence between device kernels and retrieving blocks on host #8821 * Ifpack2: Minor fix * DualView: make fencing behavior in sync consistent sync<Device>() does extra exec space fences if the dev/host memory spaces are the same. This was missing in sync_host/sync_device, so this adds it there. Makes all Ifpack2 tests for UVM without launch blocking. * tpetra: exercise the Teuchos-based interfaces, too * changed access control from WriteOnly to OverwriteAll because semantics mean things * WIP: fixing idot for MV dualview refactor And some udpates to ifpack2 and amesos2 about that. Working around Kokkos issue #3850 where the templated getLocalView was used. * WIP: idot/iallreduce cleanup * Tpetra: finish idot/iallreduce refactor * Fixed iallreduce test for non-uvm device * Belos: use new Tpetra MV view interface * Cleanup * Remove extra dualview sync fences * Ifpack2 passes without launch blocking except RBILUK. * Ifpack2: add temporary fence in RBILUK for BlockCrs Later it should be possible to replace this fence with a refactored DualView interface to BlockCrs. * Tpetra: add a global reduce to a test so it will fail when only one proc is failing * Tpetra: fix some typos in a Map unit test * Tpetra: remove deprecated sync/modify calls from a unit test * Ifpack2: fix impl_scalar/scalar mismatch * Tpetra: remove/update remaining mentions of Gauss-Seidel * Tpetra: fix iallreduce for builds without MPI * Ifpack2: revert commenting out try/catch Was causing unused var warning * Ifpack2: Fixing vector mode mistake * tpetra, ifpack2: fixing several access mode errors * Tpetra: use new MV view interface in Bug8794 test * Amesos2: revert using tagged Tpetra MV getLocalView for some reason, using ReadOnly tag to access MV view in TpetraMultivecAdapter caused solve solution to not get copied back to the Tpetra multivector. This is surprising because the views were just used as the source for a Kokkos deep copy, and this caused BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews are trivial, and all kernels are synchronous) * Ifpack2: add back tag clobbered by merge * kokkos: patch from kokkos/kokkos#3857 * comment out all the instances of TPETRA_DEPRECATED (#9023) * MueLu: add fence for recent intrepid2 changes Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking. * Tpetra: restore MV_reduce_strided test. Key: use the MV (map, dualview, orig_dualview) constructor instead of the (map, dualview) constructor. If $dualview is noncontiguous, the first one lets you pass orig_dualview as the contiguous super-view containing dualview, and orig_dualview can be sync'd without problems. Also modify TempView::toLayout() to test span_is_contiguous, rather than assuming that (Layout != LayoutStride) implies contiguous. * tpetra: Removed deprecated sync_device calls * Tpetra: Remove some MultiVector that were checking modification state (#9032) * Tpetra: Deprecate need_sync* in MultiVector * Tpetra: for now, we won't deprecate need_sync_host/device * tpetra: removed instantiations of removed tests * Tpetra: don't use CudaSpace in nonblocking collectives OpenMPI does not support Cuda device buffers for nonblocking collectives like MPI_Iallreduce, even with a Cuda-aware installation. * Fix old typo in Ifpack2_UnitTestBlockRelaxation * Fix access tag: OverwriteAll -> ReadWrite Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later. * Undo bad DualView merge Co-authored-by: Brian Kelley <bmkelle@sandia.gov> Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov> Co-authored-by: Chris Siefert <csiefer@sandia.gov> Co-authored-by: Geoff Danielson <gcdanie@sandia.gov> Co-authored-by: Timothy A. Smith <tasmit@sandia.gov> Co-authored-by: James M. Willenbring <jmwille@sandia.gov> Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de> Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com> * ascicgpu031: Testing updates * MueLu: agg export does not play well with kokkos_aggregates * Ctest: Adding Belos to email script * Ctest: Adding Belos to email script * Ctest: Adding Belos to email script * Setting Tpetra Deprecated Code = ON #9067 * Ifpack & Ifpack2: Fix tiny bug in L1 method * KokkosKernels: Fix bug in Serial specialization of spmv Will only hit spmvs with a beta of exactly -1 using the Serial backend. * Ifpack2: Add single kernel for diagonal extraction, L1 and small entry fix * Disable tests in the UVM Off build * MueLu: correct for variation due to roundoff in Convex Hulls * Ifpack2 Relaxation: Add missing typedefs Co-authored-by: Curtis C. Ober <ccober@sandia.gov> Co-authored-by: Brian Kelley <bmkelle@sandia.gov> Co-authored-by: iyamaza <iyamaza@sandia.gov> Co-authored-by: Christian Glusa <caglusa@sandia.gov> Co-authored-by: Samuel Browne <sebrown@sandia.gov> Co-authored-by: Evan Harvey <57234914+e10harvey@users.noreply.github.com> Co-authored-by: Evan Harvey <eharvey@sandia.gov> Co-authored-by: Nathan Ellingwood <ndellin@sandia.gov> Co-authored-by: trilinos-autotester <trilinos@sandia.gov> Co-authored-by: Heidi K. Thornquist <hkthorn@sandia.gov> Co-authored-by: Christian Glusa <cgcgcg@users.noreply.github.com> Co-authored-by: Jonathan Hu <jhu@sandia.gov> Co-authored-by: gsjaardema <gsjaardema@gmail.com> Co-authored-by: Chris Siefert <csiefer@sandia.gov> Co-authored-by: Roscoe A. Bartlett <rabartl@sandia.gov> Co-authored-by: Peter Ohm <pohm@sandia.gov> Co-authored-by: Paul Wolfenbarger <prwolfe@sandia.gov> Co-authored-by: Alan Williams <william@sandia.gov> Co-authored-by: iyamazaki <ic.yamazaki@gmail.com> Co-authored-by: Nate Roberts <nvrober@sandia.gov> Co-authored-by: James J. Elliott <jjellio@sandia.gov> Co-authored-by: Jennifer Loe <jloe@sandia.gov> Co-authored-by: Henry Swantner <HRSwant@sandia.gov> Co-authored-by: Christian Trott <crtrott@sandia.gov> Co-authored-by: William McLendon <wcmclen@sandia.gov> Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov> Co-authored-by: Geoff Danielson <gcdanie@sandia.gov> Co-authored-by: Timothy A. Smith <tasmit@sandia.gov> Co-authored-by: James M. Willenbring <jmwille@sandia.gov> Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de> Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com> Co-authored-by: James Elliott <jjellio@users.noreply.github.com>
…ement (trilinos#8821) * Tpetra: add new user-friendly MV view access Also add new "owningView_" DualView member that refers to the actual original DV (not a subview of anything else). This is the DualView to sync in order to maintain consistency regardless of how MultiVectors alias each other. 4 new view accessor functions: getLocalView[Host|Device][Non]Const() - Respect constness - Manage syncs and modifies for the user - Prevent taking out a view in one space while any view in the other space is live. - Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just have the reference count checking added (no sync/modify). This has no effect for HostSpace or CudaUVMSpace since those host mirrors match the device views. * Tpetra - fix MV test 14. * Tpetra - fix item 17 * Tpetra - fix item 20 * Tpetra - fix item 23 * Tpetra - fix item 28 * Tpetra - fix item 29 * Tpetra - fix item 35 * Tpetra - workaround for item 30 * Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd) * Tpetra: fix MV [un]pack to respect host/device refcounts * fix nonconst in Bug7745 * Tpetra: stashing * Tpetra - issue 354 fix * Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views * Tpetra: remove static_assert, fix getLocalView() ret type Remove bad static_assert that tripped for Cuda/CudaUVMSpace build. Correct MultiVector::getLocalView() return type to be exactly consistent with DualView::view(). * tpetra: fixed error in MultiVector pack that caused failures with UVM=ON * tpetra: Fix for FEMultivector -- rather than take the subview of a DualView and create a new vector with it, use the MultiVector constructor that gets "offset" views of a vector (in which @brian-kelley has the owningView_ working correctly). While I was at it, I added a swap of the owningView_ to the MultiVector swap() function. * Tpetra: Fixing ImportExport/Issue3968: The tests uses sync_to* without changing the modify flags, which mucks up our internal tracking * tpetra: fix to work without UVM * tpetra: changed getLocalViewHost/Device to new Const/NonConst versions as appropriate. trilinos#8591 Did not change getLocalView as the Const/NonConst versions of getLocalView do not exist yet Did not change MV_reduce_strided to avoid creating conflicts for @brian-kelley * tpetra: change getLocalViewHost to appropriate Const/NonConst version trilinos#8591 * Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions * Tpetra: More getLocalView mods * Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions * Tpetra: Fixing scaleBlockDiagonal signature as per Brian * Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF * Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging * Tpetra - missing getlocal view "device" * Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly Make WithLocalAccess use these tags instead of internal Details:: ones. These will also be used for the new MultiVector view access interface. * moving from getLocalView... to getLocalView...(Tpetra::Accesspattern) * Tpetra - get1dview logic change * Tpetra, WIP: using new tagged view access * Tpetra: use new interface for all MV getLocalView * tpetra: removed unneeded include file * Tpetra: Tags! * Tpetra: Tags! * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing more tests * Tpetra: Fixing tests * Tpetra: Fixing tests * Tpetra: Fixing tests * tpetra: copied implementation of getLocalViewHost and getLocalViewDevice from templated getLocalView, as the getLocalView version does not work. This commit may be temporary, but it allows us to make progress on other bugs while someone figures out the template-fu. Sorry for the debugging statements; we'll get rid of those eventually. * adding localview tests * tpetra: getLocalView<template> now works. cleaned up my obnoxious print statements kept Host and Device implementations that do NOT use getLocalView. * tpetra: added Tpetra::Access to many getLocalView<> instances Tests still pass with UVM=ON. * Tpetra: Removing the dreaded parantheses from the Access tags * Manually intercept UVM allocations, throw exception Effectively makes it impossible for any UVM allocations to exist (except for Stokhos, which calls cudaMallocManaged directly) * Tpetra: Deprecate old getLocalView functions * Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON * tpetra: changed getLocalView to use access tags and getLocalViewDevice * tpetra: added access tags to getLocalView(); fixed scope of some pointers * xpetra: fixes to allow compilation * WIP: deprecate getLocalBlock and start adding tagged overloads * Tpetra: rewrite allReduceView to work with non-UVM allReduceView had one bug and one sub-optimal thing: - Tried to make a view copy with both layout and device different - Kokkos can't do that in a single deep_copy - If a LayoutStride -> contiguous copy needed to be made, it always used LayoutLeft. If one of the input/output views was LayoutStride and the other was LayoutRight, they would both be copied to LayoutLeft. Now, use LayoutRight in this case. Some utilities to help manage layouts and MPI + Kokkos views in general are in the new file temporaryViewUtils.hpp: layout unification, making a contiguous view, and making an MPI-safe view. In the future these can be used to clean up idot and iallreduce without losing efficiency. * Tpetra: Block MultiVector correctly uses getLocalView; removed stored pointer * fix host device type for const_little_host_vec_type * tpetra: clean up of BlockMultiVector fixes * Tpetra: deprecated held pointer mvData_ * tpetra: removed modifies without syncs; fixed MueLu tests * Tpetra - removing sync in ScaleAndAssign test * Tpetra - unit test is okay without modify and sync flags * Tpetra - test passes without modify and sync operations * Tpetra - remove unnecessary sync modify clear state flags * Tpetra - remove multi vector sync/modify/ things * Tpetra - remove sync modify things in other places * Tpetra: remove withLocalAccess, for_each, transform The new MV::getLocalView interface is a simpler substitute for these. * Issue 8391. Switched to C++17 standard for GCC 8.3 build. * FROSch: Convert enum NullSpaceType to scoped enum By converting the enum to an enum class NullSpaceType, one is forced to use the enum class and cannot replace it with integers anymore. This guarantees, that the expressive enum class is used in implementations rather than the implicitly encoded integers. * Patch in KokkosKernels trilinos#872 (fix trilinos#8727, TeamPolicy team size too large in sort_crs_*) Adds the KokkosKernels unit test that replicated this issue. * MueLu: Adding Aggregate size percentiles to AggregateQuality * Moved Tpetra CRS GS into Ifpack2 Relaxation * Moved BlockCrs GS functionality into Relaxation * Enabled new local GS code for CRS * Reduce redundant code in CRS (GS/SGS use same fn) * Using refactored block CRS local apply, unify GS/SGS * More refactoring to get rid of redundant functions * Added required syncs/modifies for vectors * Removed unneeded !constantStride paths * Use cached MV to replace getColumnMapMV from CrsMatrix * Ifpack2: remove unneeded includes * Ifpack2: undo some find-and-replace in comments Undoing some "Node" -> "node_type" * MueLu: undo CMake change, should be its own PR * MueLu: in configure, print out missing ETI setting During configure, MueLu prints out the type combinations to ETI. Add <complex, int, long long> to this, since it was missing. * tpetra: treat WriteOnly of subviews as ReadOnly. * Ifpack2: in RBILUK, use tagged BMV::getLocalBlock * Tpetra: add comment with caveat on BMV::getLocalBlock(i, j, WriteOnly) * tpetra: separated BugTests.cpp into separate test files so that we can disable them separately (since they exercise different classes). * Ifpack2: update BMV getLocalBlock calls to use tagged access, and not use manual sync/modify (which has been removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass. * more test changes * mv localview tests * wrapped up 6 tests for new behaviors * tpetra: scoping fix for Bug7234.cpp; more output from getLocalView* when error occurs, as in parallel runs, throw messages weren't always printed (e.g., from doExport when only 3/4 processors failed) * Tpetra: add MV::aliases(const MV& other) This allows a user to see if two MVs overlap, without actually getting the local views and possibly hitting the reference count checker. * Ifpack2: const correctness, use new getLocalView - Throughout Ifpack2, remove manual sync/modify and calls to deprecated getLocalView. Use tagged getLocalView instead. - In BlockRelaxation and the Containers, change interfaces to use const on views and multivectors that aren't actually modified * Tpetra: fix one MV LocalView test, comment out another We will make sure fix is OK, then uncomment and fix the other * tpetra: enable some Tpetra tests without UVM * tpetra: fix test for non-Cuda builds * Ifpack2: fix more constness of apply vectors * Kokkos: allow CudaUVMSpace::allocate again Roll back change that made CudaUVMSpace::allocate throw when UVM was not the default memory space for Cuda. * tpetra: changes needed to build with DEPRECATED_CODE=OFF trilinos#8821 * fix remaining test * Tpetra - fix for nox failure * Thyra: added missing fences to euclidean apply operations used in MvTimesMatAddMv; the fences resolve test failures with CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo Tpetra: the fences above provide a more surgical fix to the test errors seen in trilinos#8821; this commit removes fences from getLocalView*(ReadOnly). @kyungjoo-kim Belos: preventive fence added with @hkthorn's blessing to mimic those in Thyra. * tpetra: added fence between device kernels and retrieving blocks on host trilinos#8821 * Ifpack2: Minor fix * DualView: make fencing behavior in sync consistent sync<Device>() does extra exec space fences if the dev/host memory spaces are the same. This was missing in sync_host/sync_device, so this adds it there. Makes all Ifpack2 tests for UVM without launch blocking. * tpetra: exercise the Teuchos-based interfaces, too * changed access control from WriteOnly to OverwriteAll because semantics mean things * WIP: fixing idot for MV dualview refactor And some udpates to ifpack2 and amesos2 about that. Working around Kokkos issue trilinos#3850 where the templated getLocalView was used. * WIP: idot/iallreduce cleanup * Tpetra: finish idot/iallreduce refactor * Fixed iallreduce test for non-uvm device * Belos: use new Tpetra MV view interface * Cleanup * Remove extra dualview sync fences * Ifpack2 passes without launch blocking except RBILUK. * Ifpack2: add temporary fence in RBILUK for BlockCrs Later it should be possible to replace this fence with a refactored DualView interface to BlockCrs. * Tpetra: add a global reduce to a test so it will fail when only one proc is failing * Tpetra: fix some typos in a Map unit test * Tpetra: remove deprecated sync/modify calls from a unit test * Ifpack2: fix impl_scalar/scalar mismatch * Tpetra: remove/update remaining mentions of Gauss-Seidel * Tpetra: fix iallreduce for builds without MPI * Ifpack2: revert commenting out try/catch Was causing unused var warning * Ifpack2: Fixing vector mode mistake * tpetra, ifpack2: fixing several access mode errors * Tpetra: use new MV view interface in Bug8794 test * Amesos2: revert using tagged Tpetra MV getLocalView for some reason, using ReadOnly tag to access MV view in TpetraMultivecAdapter caused solve solution to not get copied back to the Tpetra multivector. This is surprising because the views were just used as the source for a Kokkos deep copy, and this caused BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews are trivial, and all kernels are synchronous) * Ifpack2: add back tag clobbered by merge * kokkos: patch from kokkos/kokkos#3857 * comment out all the instances of TPETRA_DEPRECATED (trilinos#9023) * MueLu: add fence for recent intrepid2 changes Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking. * Tpetra: restore MV_reduce_strided test. Key: use the MV (map, dualview, orig_dualview) constructor instead of the (map, dualview) constructor. If $dualview is noncontiguous, the first one lets you pass orig_dualview as the contiguous super-view containing dualview, and orig_dualview can be sync'd without problems. Also modify TempView::toLayout() to test span_is_contiguous, rather than assuming that (Layout != LayoutStride) implies contiguous. * tpetra: Removed deprecated sync_device calls * Tpetra: Remove some MultiVector that were checking modification state (trilinos#9032) * Tpetra: Deprecate need_sync* in MultiVector * Tpetra: for now, we won't deprecate need_sync_host/device * tpetra: removed instantiations of removed tests * Tpetra: don't use CudaSpace in nonblocking collectives OpenMPI does not support Cuda device buffers for nonblocking collectives like MPI_Iallreduce, even with a Cuda-aware installation. * Fix old typo in Ifpack2_UnitTestBlockRelaxation * Fix access tag: OverwriteAll -> ReadWrite Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later. * Undo bad DualView merge Co-authored-by: Brian Kelley <bmkelle@sandia.gov> Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov> Co-authored-by: Chris Siefert <csiefer@sandia.gov> Co-authored-by: Geoff Danielson <gcdanie@sandia.gov> Co-authored-by: Timothy A. Smith <tasmit@sandia.gov> Co-authored-by: James M. Willenbring <jmwille@sandia.gov> Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de> Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>
Work Tracking Issue
@trilinos/tpetra
Branch for working on MultiVector DualView management refactor: TpetraDualViewRefactor.
Updated 2-3-21:
I replaced the new functions we just added (e.g. getLocalViewDeviceConst, getLocalViewHostNonConst) with permament tagged versions:
Non-templated versions like
getLocalViewHost(tag)
too.All getLocalView calls in MultiVector and core/test/MultiVector have been replaced with this new interface, but this caused new failures in tests.
MultiVector
FEMultiVector
BlockMultiVector
Import/Export
The text was updated successfully, but these errors were encountered: