Offload reduction operations to accelerator devices #12318
base: main
Commits on Nov 7, 2023
-
Initial draft of CUDA device support for ops
Signed-off-by: Joseph Schuchart <jschuchart@leconte.icl.utk.edu>
Commit 35ff1da
-
First working version of CUDA op support
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Commit b7e6f89
-
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Commit 164388a
-
Fix minor bugs to get osu_allreduce working
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Commit d8110ac
-
cuMemAllocAsync is supported since CUDA 11.2.0
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Commit f609127
-
coll/base/allreduce: Condition device allocation on op/dtype support
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Commit 8ae3dac
-
Make sure the device op callbacks are zero-initialized
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Commit 655948f
-
Be more graceful when creating a context and stream
Signed-off-by: Joseph Schuchart <jschuchart@xsdk.icl.utk.edu>
Commit 7cdc828
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit bdb16a1
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 5934f43
-
Add CUDA stream-based allocator and memory pools
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit c2c3d0e
-
Don't memset the CUDA op component, we need the version
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 5df449c
-
Set the memory pool release threshold
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 812d068
-
Implement device-compatible allocator to cache coll temporaries
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit a688c84
-
Fix devicebucket allocator for larger sizes
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit bbd362d
-
Fix the RDMA fallback protocol selection.
If the target process is unable to execute an RDMA operation, it instructs the origin to change the communication protocol. When this happens, the origin must be informed to cancel all pending RDMA operations and release the rdma_frag.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Commit 1fd6636
-
Stream-based reduction and ddt copy and 3buff cuda kernels, adopted for allreduce recursive doubling
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit f2f0f2d
-
Remove extra copies from allreduce redscat and ring
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 8f5b503
-
Allow ops and memcpy on managed memory from the host
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 1c68d17
-
reduce_local: add support for device memory
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 70dde0f
-
Draft of ompi_op_select_device
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit e603bcc
-
Second draft of ompi_op_select_device
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 60dd446
-
Fix undefined symbols in cuda op component
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit c485ecf
-
Fix off-by-one error in device-bucket allocator
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 793863c
-
Heuristic to select op device based on element count
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit d2e8677
-
init op_rocm, not compilable yet
Signed-off-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Commit cd7e578
-
implemented funcs in accelerator_rocm modules
Signed-off-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Commit 2ccaa87
-
add -I include path to Makefile
Signed-off-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Commit a6f1cce
-
added rocm codes into test example
Signed-off-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Commit ce0b88d
-
Signed-off-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Commit ad420fe
-
Make headers in reduce_local better parsable
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit c3c3287
-
CUDA: disable internal memory pool (seems broken)
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 9674aae
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 628c0f1
-
Reduce_local: set hip device during init
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 251dac4
-
CUDA accelerator: fix compiler warnings
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 7589d17
-
Device op: pass device to lower-level op to avoid recurring queries
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit ead6847
-
CUDA/ROCm: Fix vectorized ops and rocm integration
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit ee31b60
-
Reduce_local: use OPAL defines to detect device support
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 9ab499a
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit dbd855d
-
Reduce: add vectors to cuda implementation
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 02120c9
-
Allreduce: cleanup and minor fixes
Replace ompi_op_reduce with ompi_op_reduce_stream(..., NULL) to avoid repeated checking for locality in ompi_op_reduce.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 7cdbe24
-
Add MCA op_[cuda|rocm]_max_num_[blocks|threads]
These variables allow users to limit the maximum number of blocks and threads per block in the reduction kernels. The implementation will fall back to the device limit if that limit is lower than the configured value.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
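The fallback described in this commit message can be read as a simple clamp. The following is a minimal sketch of that reading, not the code from the commit: the helper name `effective_launch_limit` and its parameters are hypothetical, and both values are assumed to be positive.

```c
#include <assert.h>

/* Hypothetical helper (not from the PR): choose the limit a reduction
 * kernel launch would actually use.
 *   user_max   - value of an MCA variable such as op_cuda_max_num_threads
 *   device_max - limit reported by the device
 * Assumption: both values are positive. */
static int effective_launch_limit(int user_max, int device_max)
{
    assert(user_max > 0 && device_max > 0);
    /* Fall back to the device limit when it is lower than the user's value. */
    return (device_max < user_max) ? device_max : user_max;
}
```

Under this reading, a user value of 1024 on a device that allows 512 yields 512, while a user value of 256 on a device that allows 1024 stays at 256. Such a variable would presumably be set through mpirun's `--mca` option, for example `--mca op_cuda_max_num_threads 256`.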
Commit c7fe5f6
-
Fix the generation of "unsigned char" ops.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Commit 42bd424
-
We need CXX17 for the CUDA ops.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Commit 8e3d042
-
ROCM: add vectorization of some basic ops
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 7524f99
-
Device allocators: correctly handle non-zero ID single accelerator
The accelerator component may report the availability of a single accelerator whose ID is not zero.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit cfe8a5a
-
CUDA op: consistently name unsigned_long functions as ulong
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 3bc7676
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 9c1da7e
-
Reduce_local test: correctly test for OPAL_CUDA_SUPPORT and OPAL_ROCM_SUPPORT
These macros are defined to either 1 or 0.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
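Because the macros are always defined, testing them with `#ifdef` succeeds even when support is disabled, while `#if` evaluates the value. The sketch below illustrates the difference. It defines `OPAL_CUDA_SUPPORT` locally as 0 as a stand-in for the configure-generated value, and the function names are hypothetical rather than taken from the test.

```c
#include <assert.h>

/* Stand-in for the configure-generated definition (assumption):
 * pretend CUDA support was disabled at build time. */
#ifndef OPAL_CUDA_SUPPORT
#define OPAL_CUDA_SUPPORT 0
#endif

/* Misleading check: true whenever the macro exists, even if it is 0. */
static int cuda_enabled_via_ifdef(void)
{
#ifdef OPAL_CUDA_SUPPORT
    return 1;
#else
    return 0;
#endif
}

/* Value-based check: true only if the macro evaluates to non-zero. */
static int cuda_enabled_via_if(void)
{
#if OPAL_CUDA_SUPPORT
    return 1;
#else
    return 0;
#endif
}

/* Sanity check: the value-based test must agree with the macro itself. */
static void check_detection(void)
{
    assert(cuda_enabled_via_if() == (OPAL_CUDA_SUPPORT ? 1 : 0));
}
```

With the macro defined to 0, the `#ifdef` variant reports support while the `#if` variant does not, which is the kind of mismatch the commit title refers to.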
Commit a20f671
-
More unsigned_long -> ulong fixes in CUDA and ROCm op
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 97338db
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 541b8a0
-
Reduce_local: access only host-side memory in error message
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 8cb2feb
-
Make sure CUDA accelerator is initialized before querying number of devices
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 2996ba0
-
Accelerator: provide peak bandwidth estimate
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 246003f
-
accelerator/rocm: regular memory behaves like unified memory
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 6601484
-
ROCM: add missing FUNC_FUNC_FN macro
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit d0fe9a2
-
opal_datatype_accelerator_memcpy: determine device copy type
We know where source and target buffers are located, so pass the right transfer direction to the accelerator memcpy call.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
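The idea in this commit message, deriving the copy direction from the known buffer locations, could be sketched as follows. This is an illustration only: the enum `xfer_dir_t`, its values, and `pick_transfer_direction` are hypothetical names, not the identifiers used by the accelerator framework or by this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical direction tags; the real framework defines its own. */
typedef enum {
    XFER_HOST_TO_HOST,
    XFER_HOST_TO_DEVICE,
    XFER_DEVICE_TO_HOST,
    XFER_DEVICE_TO_DEVICE
} xfer_dir_t;

/* Map the (already known) locations of source and target buffers
 * to an explicit transfer direction. */
static xfer_dir_t pick_transfer_direction(bool src_on_device, bool dst_on_device)
{
    if (src_on_device) {
        return dst_on_device ? XFER_DEVICE_TO_DEVICE : XFER_DEVICE_TO_HOST;
    }
    return dst_on_device ? XFER_HOST_TO_DEVICE : XFER_HOST_TO_HOST;
}
```

Passing an explicit direction, rather than an unspecified one, spares the underlying runtime from having to work out the buffer locations again.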
Commit 63b64a0
-
accelerator rocm: fix global memcpy stream variable
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 5a29e13
-
Thread base: fix missing include file
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 5c7c7a1
-
Accelerator: Remove debug output
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 76f00c4
-
Allreduce: don't copy inputs if data can be accessed from the host
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 56bcfee
-
Be more careful when releasing temporary receive buffers
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit a1f089e
-
Remove debug output and dead code
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 33616e6
-
Bump max devicebucket allocator max size to 1GB
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 9da8b54
-
accelerator/cuda: fix error message
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 93ded5e
-
CUDA: Select compute capability 52 by default
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 182e6fa
-
Squash const correctness warnings
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit e5eb45f
-
Squash warnings about mismatched function pointer types
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 14a5372
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 1f63809
-
Replace fprintf with show_help
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 3d9f33a
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit c878c4f
-
Clean up cuda and rocm op codes
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 1c6667d
-
Minor tweak to CUDA op configury
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit 7bb4b95
Commits on Nov 8, 2023
-
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Commit d1382c3