AWS OFI NCCL v1.8.1
This is a bugfix release that requires Libfabric v1.18.0 or later and supports NCCL v2.19.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).
Bug Fixes:
- Fixed an issue with the ID pool's reference counting and allocation.
- Improved error propagation for failed NCCL requests, allowing applications to fail early instead of blocking on requests that can never be completed.
The plugin has been tested with the following Libfabric providers, using the tests bundled in the source code and the nccl-tests suite:
- efa
Checksum (sha512) for the release tarball:
4ee21380176d5a76e4af0233ac44d1d46f92fd34941ecfaa104b7567a16cc84503c0abe59e540d36d79675bb3cc443979ed319f39582e301814d0653ea184508 aws-ofi-nccl-1.8.1-aws.tar.gz
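To confirm that a downloaded tarball matches the published checksum, you can compute its SHA-512 digest and compare it with the value above. The sketch below uses only Python's standard `hashlib`; the helper names `sha512_of_file`, `verify`, and `EXPECTED_SHA512` are illustrative and not part of the plugin or its tooling.

```python
import hashlib

# Expected SHA-512 digest for aws-ofi-nccl-1.8.1-aws.tar.gz, copied from these release notes.
# EXPECTED_SHA512, sha512_of_file and verify are hypothetical helper names for this sketch.
EXPECTED_SHA512 = (
    "4ee21380176d5a76e4af0233ac44d1d46f92fd34941ecfaa104b7567a16cc845"
    "03c0abe59e540d36d79675bb3cc443979ed319f39582e301814d0653ea184508"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks
    so a large tarball is never loaded into memory all at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA512):
    """Return True if the file's SHA-512 digest matches `expected`
    (compared case-insensitively, ignoring surrounding whitespace)."""
    return sha512_of_file(path) == expected.strip().lower()
```

For example, `verify("aws-ofi-nccl-1.8.1-aws.tar.gz")` should return `True` for an intact download; a `False` result means the file is corrupted or is not the published release artifact.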