Releases: oneapi-src/oneCCL
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.14
What's New:
- Optimized key-value store support to scale up to 3000 nodes
- New APIs for Allgather, Broadcast, and group API calls
- Performance optimizations for Allgather, Allreduce, and Reduce-Scatter for scaleup and scaleout
- Performance optimizations for single-node CPU execution
- Optimizations to reuse Level Zero events
- Changed the default IPC exchange mechanism to pidfd
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13.1
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13
What's New:
- Optimizations to limit the memory consumed by oneCCL
- Optimizations to limit the number of file descriptors that oneCCL keeps open
- Aligned in-place support for the Allgatherv and Reduce-Scatter collectives with NCCL behavior:
- In particular, the Allgatherv collective is in place when:
- send_buff == recv_buff + rank_offset, where rank_offset = sum(recv_counts[i]) for all i < rank
- Reduce-Scatter is in place when recv_buff == send_buff + rank * recv_count
- When the environment variable CCL_WORKER_AFFINITY is used, oneCCL enforces that the length of the affinity list equals the number of workers
- Bug fixes.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.12
What's New
- Performance improvements for scaleup across all message sizes for Allreduce, Allgather, and Reduce-Scatter
- These optimizations also cover the small message sizes that appear in inference applications
- Performance improvements for scaleout for Allreduce, Reduce, Allgather, and Reduce-Scatter
- Optimized memory usage of oneCCL.
- Support for PMIx 4.2.6.
- Bug fixes.
Removals
- oneCCL 2021.12 removes support for PMIx 4.2.2
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11.2
This update provides bug fixes to maintain driver compatibility for Intel® Data Center GPU Max Series.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11.1
This update addresses stability issues with distributed training and inference workloads on Intel® Data Center GPU Max Series.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11
- Added point-to-point blocking communication operations for send and receive
- Performance optimizations for Reduce-Scatter
- Improved profiling with the Intel® Instrumentation and Tracing Technology (ITT) profiling level
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.10
- Improved scaling efficiency of the scaleup algorithms for Reduce-Scatter
- Optimized performance of oneCCL scaleup collectives by utilizing the embedded Intel® Data Streaming Accelerator in 4th Generation Intel® Xeon® Scalable Processors (formerly code-named Sapphire Rapids)
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.9
- Optimizations across the board, including improved scaling efficiency of the scaleup algorithms for Alltoall and Allgather
- Added collective selection for the scaleout algorithm for device (GPU) buffers
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.8
- Provides optimized performance for the Intel® Data Center GPU Max Series utilizing oneCCL
- Enables support for the Allreduce, Allgather, Reduce, and Alltoall collectives between GPUs on the same node