Support for single memory space and other fixes #248

Merged
adayton1 merged 8 commits into develop from feature/dayton8/miscellaneous
Mar 9, 2024

adayton1
Member

@adayton1 adayton1 commented Mar 9, 2024

  • Add support for single memory space
  • Windows shared library build fixes
  • Support for unsigned and 64-bit integers in algorithms and scans
  • Optimizations to reduce host side memory touches
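
For readers unfamiliar with why integer width matters in a scan, here is a minimal illustrative sketch, not CARE's actual implementation. It assumes a hypothetical host-only helper named `exclusive_scan_sketch` that is templated on the integer type, so the same code handles `unsigned int` and 64-bit counts whose running total would overflow a 32-bit `int`.

```cpp
#include <cassert>      // only needed for the checks in the usage note
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of CARE's API): serial exclusive prefix sum.
// offsets[i] holds initial + counts[0] + ... + counts[i-1].
template <typename T>
std::vector<T> exclusive_scan_sketch(const std::vector<T>& counts, T initial = T{0})
{
   static_assert(std::is_integral<T>::value,
                 "this sketch only handles integer types");

   std::vector<T> offsets(counts.size());
   T running = initial;

   for (std::size_t i = 0; i < counts.size(); ++i) {
      offsets[i] = running;   // value before adding the current element
      running += counts[i];   // accumulate in T, so a 64-bit T keeps large totals
   }

   return offsets;
}
```

With `T = std::uint64_t`, counts of 3000000000, 3000000000, and 1 give offsets 0, 3000000000, and 6000000000; the last value does not fit in 32 bits, which is the kind of case a wider index type is meant to cover.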

@adayton1
Member Author

adayton1 commented Mar 9, 2024

I'm leaving out the GPU sim mode fixes until after the release (I think some updates to CHAI are needed as well).

Collaborator

@liu15 liu15 left a comment

Thanks for upstreaming all of this!

@adayton1 adayton1 merged commit 597c42d into develop Mar 9, 2024
12 checks passed
@adayton1 adayton1 deleted the feature/dayton8/miscellaneous branch March 9, 2024 01:20
adayton1 added a commit that referenced this pull request Mar 11, 2024
* Add care::min and care::max, remove CARE_MIN and CARE_MAX

* Remove reference

* I guess I should write in C++

* Use ref in signature

* Revert "Use ref in signature"

This reverts commit a94f14e.

* Fixes for loop fuser

* Add abs

* Add FUSIBLE_LOOP_COUNTS_TO_OFFSETS_SCAN_PHASE

* Fix macros for loop fuser disabled

* Use refs

* Change signatures to match STL

* Remove conditional dependence on GPU_ACTIVE

* Add Loop Fuser support for host_device_map

* Remove header

* Add test

* Make input array const

* Add non-const overload of copy_n

* Add explicit instantiations for care::copy_n

* Fix use of CHAI_GPU_SIM_MODE (only affects GPU_SIM mode with gpu compiler)

* Fix for v0.7.0 refactoring

* Remove conditional dependence on OPENMP_ACTIVE

* Add PAUSE/RESUME capability for LoopFuser

* Fix warning during resume

* Missed one

* Windows fix

* Add missing macro

* Host device map fixes: implement host iterator_at and unify map loops; reset size in host clear(); fix loop fuser map macros for host; remove START argument for MAP macros

* Remove mutable

* Fix bounds checking for slices

* Fix comment for loop fuser

* Introduce CARE_PARALLEL_DEVICE

* Fix typo

* Use perfect forwarding for lambdas

* Add check for stale host data view

* Undo change from another branch

* Clarify description of CMake option

* Remove obsolete function

* Simplify host_device_ptr

* set up host_device_ptr to be either _host_device_ptr or host_device_race_detection_ptr

* refactor race detection to use an accessor pattern

* fix build in care

* care builds again after initial stab at actual race detection.

* add accessor.

* add race detection test to TestForAll

* debugging test case.

* basic test case behaves appropriately

* miscellaneous race condition fixes and false positive workarounds.

* Adds race detection instrumentation to CARE, addresses several race conditions throughout the code.

* have default accessor be configurable, previous DefaultAccessor is now NoOpAccessor

* configure behavior of setting the thread id in a CARE_CHECKED_PARALLEL_LOOP

* support non-default Accessor for sortArray and uniqArray

* race condition test only if loop configured to support it.

* build fixes after adding uniqArray support.

* address reviewer concerns in care.

* Update CARE to C++14

* Remove illegal template argument

* Fix warnings about unused parameter

* Fix unused variable warning

* Fix compiler warning

* Try to fix ambiguous test

* Attempt to fix std being passed multiple times with different values

* Use c++11 supported version of remove_const

* Add missing template argument for GPU algorithms

* Pick specific CMake version

* Allow changes to blt submodule

* Update to BLT v0.5.2

* Remove gcc 4.9.3 build from CI

* Accessor fixes for GPU builds, use blt::hip instead of hip target in builds, make loop fuser flush length configurable.

* Update LoopFuser.cpp

* Clean up gitignore

* Update CAMP to v2022.10.1

* Update Umpire to v2022.10.0

* Update RAJA to v2022.10.5

* Update CHAI to v2022.10.0

* Default to c++14

* Update to CMake 3.20

* Fix up some changed options in tpls

* Fix up some changed options in tpls

* Update CARE spack spec to build with c++14

* Add cpp14 variant to care spack package

* Update submodule build configurations

* Rename blueos host config to be more generic

* Fixes for build using submodules on rzansel

* Remove gcc 4.9.3 implicit link directories

* Get rid of ambiguous operator== overload and just rely on ManagedArray operator==

* Remove accidentally committed file

* Add default array views

* Build fixes

* Simplify benchmark

* Update toss3 clang host config

* Invert layout

* Decide on convention for array views

* Update benchmark

* Update host configs

* Slight wording clarifications

* split out new capabilities into the right library locations.

* address clang-query failures.

* build fix, reallocate the correct reference.

* add a concept of a ZERO_COPY memory space and use it instead of PINNED in the SortFuser.

* address reviewer comments.

* build fixes after addressing reviewer comments.

* respond to more reviewer comments.

* fix typo.

* add GPU_MEMORY_IS_ACCESSIBLE_ON_GPU CMake option.

* Fix sort fuser seg fault

* Fix use of CARE_ENABLE_GPU_SIMULATION_MODE

* Win32 Build Fix

* Move toss3 host config to more specific location

* Add toss4 host config

* Move blueos host config to more specific location

* plugins

* Delete BenchmarkPlugin.cpp

* undoing accidental changes

* fix RAJAPlugin.cpp

* resolved some issues

* small tweak

* fixed formatting

* trying to revert 2 files and deleted RAJAPlugin class

* replaced calls to RAJAPlugin

* fixing build errors hopefully

* fix vector error

* Allow failure for builds

* new class for plugin data

* manually set chai execution space for 2d

* Update to latest radiuss-spack-configs

* Update to latest uberenv

* Make sure radiuss-ci is at latest

* Update to BLT v0.5.3

* Update uberenv configuration

* Allow remaining builds to fail

* Update BLT version

* Start with one job on Ruby

* fixed plugindata class

* Update build_and_test.sh script

* Add debug option to uberenv config

* Update spack packages

* Remove custom build on ruby for now

* Add missing comma to uberenv config file

* Spack now requires a repo.yaml file

* Rearrange to match expected spack package layout

* remove unnecessary includes

* Point to newer default pipelines

* Update CARE spack package

* Fix syntax errors

* Add debug prints

* output more info

* Try again

* Use specific core counts from RAJA

* Clean up some debugging junk

* More clean up

* Try to get spack build logs as artifacts

* fixed formatting and small issues

* Move backward to uberenv v1.0.0

* Remove debug info and override CUDA architecture in lassen job

* registration function

* Increase time limit on ruby and allow failure on one lassen job

* Consistent time limits

* Shared allocation needs to be longer than job allocation

* fixed reg function

* Increase CI time limit for Ruby

* resolved comments

* Delete BenchmarkForall.cpp

* removed streams stuff

* restored original BenchmarkForall.cpp

* Care->CARE

* resolved more comments

* resolved comments and errors

* fixes profileplugin registration

* added comments and formatting

* fixed HIP warning

* Revert to BLT v0.5.2

* Disallow BLT v0.5.3 in CARE spack package

* tweaked plugindata description

* actionmap alias

* forgot one actionmap

* move actionmap inside care namespace

* fix indentation

* try to fix printf errors

* fixed warnings in debugplugin and forall

* maybe silence profileplugin warnings

* forall unused parameter warning

* Pass gcc-toolchain in all cases

* Make BenchmarkForall more repeatable (#229)

* Make BenchmarkForall more repeatable

* Update to BLT v0.5.3 (#231)

* Invalidate key and value arrays after sort on CPU (#234)

* Use compiler generated functions in host_ptr and local_ptr (#236)

* Use compiler generated special member functions in host_ptr
* Use compiler generated special member functions in local_ptr

* Use public instead of private member (#237)

Fix after #236

* Fix bug in KeyValueSorter (#238)

* [woptim] Update radiuss-shared-ci to new release (with radiuss-spack-configs) (#235)

* Update radiuss-shared-ci to new release (with radiuss-spack-configs)

* Fix old naming

* Update job override

* Effectively use ibm clang

* Do not prevent the mirroring of release tags

---------

Co-authored-by: Alan Dayton <6393677+adayton1@users.noreply.github.com>

* Revamp CMake (#244)

* Bump minimum required CMake version to 3.18
* Improve dependency handling
* Improve exported targets
* Modularize CMake code

* [Woptim] Shared ci 2023.12.0 (#245)

Update Shared CI to 2023-12-0
Add poodle machine
Tweak allocations
Update build_and_test script (sync with RAJA).

---------

Co-authored-by: Adrien M. BERNEDE <51493078+adrienbernede@users.noreply.github.com>

* Update dependencies (#246)

* Update BLT to v0.6.1
* Update CAMP to v2024.02.0
* Update Umpire to v2024.02.0
* Update RAJA to v2024.02.0
* Update CHAI to v2024.02.0
* Update Spack to develop-2024-01-21
* Update radiuss-spack-configs to c585417
* Use local spack package for CARE and then radiuss-spack-configs spack packages for dependencies
* Fix up CARE install
* Use new way of exporting BLT logic
* Update local CARE spack package

* Update copyright (#247)

* Update copyright to 2024
* Clean up license text
* Remove outdated CMake config file

* Add ArrayDup from raw pointer (#242)

* Add ArrayDup from raw pointer

---------

Co-authored-by: Alan Dayton <dayton8@llnl.gov>

* Support for single memory space and other fixes (#248)

* Add support for single memory space
* Windows shared library build fixes
* Support for unsigned and 64-bit integers in algorithms and scans
* Optimizations to reduce host side memory touches

* Update version number and readme

* Update release notes

---------

Co-authored-by: Benjamin T. Liu <liu15@llnl.gov>
Co-authored-by: Peter B. Robinson <robinson96@llnl.gov>
Co-authored-by: Neela Kausik <kausik1@llnl.gov>
Co-authored-by: neelakausik <110871421+neelakausik@users.noreply.github.com>
Co-authored-by: Ben Liu <38140930+liu15@users.noreply.github.com>
Co-authored-by: Adrien Bernede <51493078+adrienbernede@users.noreply.github.com>