Releases: NVIDIA-Merlin/distributed-embeddings
v23.06.00 (v1.0)
What’s Changed
New Features
- Added support for row-slicing as a parallel strategy
- Added support for data-parallel as a parallel strategy
- Allow mixing and matching data-parallel, table-parallel, row-slicing, and column-slicing strategies. Refer to the User Guide for more details.
- Added IntegerLookup layer that supports on-the-fly vocabulary building, on both CPU and GPU
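To make the difference between row-slicing and column-slicing concrete, here is a minimal plain-Python sketch. It does not use the distributed-embeddings API; the helper names (`row_slice`, `column_slice`, `lookup_row_sliced`, `lookup_column_sliced`) are hypothetical, and each "worker" is just a list entry. It assumes contiguous, evenly sized slices, which may differ from how the library actually partitions tables.

```python
# Illustrative only: plain-Python stand-ins, not the distributed-embeddings API.

def row_slice(table, num_workers):
    """Split a table (list of rows) into contiguous row blocks, one per worker."""
    rows_per_worker = -(-len(table) // num_workers)  # ceiling division
    return [table[i * rows_per_worker:(i + 1) * rows_per_worker]
            for i in range(num_workers)]


def column_slice(table, num_workers):
    """Split every embedding vector into contiguous column blocks, one per worker."""
    width = len(table[0])
    cols_per_worker = -(-width // num_workers)  # ceiling division
    return [[row[i * cols_per_worker:(i + 1) * cols_per_worker] for row in table]
            for i in range(num_workers)]


def lookup_row_sliced(shards, idx):
    """Only the worker that owns row `idx` takes part in the lookup."""
    rows_per_worker = len(shards[0])
    worker, local_idx = divmod(idx, rows_per_worker)
    return shards[worker][local_idx]


def lookup_column_sliced(shards, idx):
    """Every worker returns its part of the vector; the parts are concatenated."""
    vector = []
    for shard in shards:
        vector.extend(shard[idx])
    return vector


# A toy table: 4 rows (vocabulary entries), embedding width 4.
table = [[10 * r + c for c in range(4)] for r in range(4)]
row_shards = row_slice(table, 2)
col_shards = column_slice(table, 2)

# Both strategies reconstruct the same embedding vector.
assert lookup_row_sliced(row_shards, 2) == table[2]
assert lookup_column_sliced(col_shards, 2) == table[2]
```

In this sketch, row-slicing splits the vocabulary across workers, so a lookup touches one worker, while column-slicing splits the embedding width, so every worker contributes part of each vector.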
Breaking Changes
- Added NVIDIA cuCollections as submodule for GPU hash map support
- Now supports TensorFlow 2.12. Note that this change breaks the build with TF 2.09 and earlier.
Improvements
- Improved package import
Bug Fixes
- Fixed input offset overflow caused by automatic table concatenation
- Fixed potential graph mismatch problems in broadcast
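As an illustration of how concatenating tables can overflow an input offset, the sketch below shifts each table's ids by the number of rows in the tables before it. The release note does not say which integer type was involved; 32-bit signed indices are an assumption made here, and the helper names are hypothetical rather than taken from the library.

```python
# Illustrative only. Assumption: lookup indices are stored as signed 32-bit integers.
INT32_MAX = 2**31 - 1


def concat_offsets(table_rows):
    """Return the starting row of each table inside the concatenated table."""
    offsets, total = [], 0
    for rows in table_rows:
        offsets.append(total)
        total += rows
    return offsets


def fits_in_int32(table_rows):
    """True if the largest shifted index is still representable as int32."""
    return sum(table_rows) - 1 <= INT32_MAX


def wrap_int32(value):
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


# Two large tables that each fit in int32, but not once concatenated.
table_rows = [1_500_000_000, 1_000_000_000]
offsets = concat_offsets(table_rows)

last_id_in_second_table = table_rows[1] - 1
shifted = offsets[1] + last_id_in_second_table

# The shifted index exceeds INT32_MAX and would wrap to a negative value.
assert shifted > INT32_MAX
assert wrap_int32(shifted) < 0
assert not fits_in_int32(table_rows)
```

A check like `fits_in_int32` could decide whether a group of tables is safe to concatenate, or whether wider indices are needed.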
Full Changelog: v23.03.00...v23.06.00
v23.03.00
What’s Changed
New Features
- NVIDIA Hopper™ architecture families support (compute capability 9.0).
- Added support for the Keras Model.fit API.
- Added support for Horovod callbacks when using hybrid data/model parallelism.
Breaking Changes
- Now supports TensorFlow 2.12. Note that this change breaks the build with TF 2.09 and earlier.
- Now requires Horovod version 0.27 or later.
Improvements
- Improved unit tests
Full Changelog: v0.3...v23.03.00
v0.3
What’s Changed
New Features
- CUDA 12 support
- Automatic concatenation of multiple embedding tables for greatly improved speed
- Support model parallel with user-defined custom Keras layer through the DistributedEmbedding wrapper
Improvements
- Support cases where number of workers is greater than number of tables.
- For corner cases where different slices of a table are placed onto the same worker, they are now merged into a single slice.
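The slice-merging behavior can be pictured with a small sketch. It assumes a placement is a list of hypothetical `(table_id, worker, width)` column slices, and that slices of one table on one worker can be treated as a single wider slice. The library's internal representation is not described in these notes and may differ.

```python
# Illustrative only: the (table_id, worker, width) layout is a made-up representation.

def merge_slices(placements):
    """Collapse slices of the same table on the same worker into one slice."""
    merged = {}  # dicts keep insertion order, so first-seen order is preserved
    for table_id, worker, width in placements:
        key = (table_id, worker)
        merged[key] = merged.get(key, 0) + width
    return [(table_id, worker, width)
            for (table_id, worker), width in merged.items()]


# Table 0 has two 8-wide slices on worker 0 and one 16-wide slice on worker 1.
placements = [(0, 0, 8), (0, 0, 8), (0, 1, 16), (1, 1, 4)]
merged = merge_slices(placements)

# The two slices of table 0 on worker 0 become a single 16-wide slice.
assert merged == [(0, 0, 16), (0, 1, 16), (1, 1, 4)]
```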
Breaking Changes
- Moved submodule from CUB to NVIDIA Thrust for better compatibility
Bug Fixes
- Better error handling in set_weight() when weights are not initialized
- Better error handling when global batch size is not divisible by number of workers
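The batch-size constraint mentioned above can be expressed as a simple pre-flight check. This is a minimal sketch with a hypothetical helper name, `local_batch_size`; it is not the library's own validation code or error message.

```python
# Illustrative only: a hypothetical helper, not part of distributed-embeddings.

def local_batch_size(global_batch_size, num_workers):
    """Return the per-worker batch size, or raise if the split is uneven."""
    if num_workers <= 0:
        raise ValueError("num_workers must be a positive integer")
    if global_batch_size % num_workers != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by number of workers {num_workers}"
        )
    return global_batch_size // num_workers


# 1024 samples over 8 workers gives 128 samples per worker.
assert local_batch_size(1024, 8) == 128
```

Failing early with a clear message is usually easier to debug than a shape mismatch that surfaces later inside a collective operation.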
Full Changelog: v0.2...v0.3
v0.2
What’s Changed
Breaking Changes
- Added new dependency NVIDIA CUB as submodule
New Features
- SparseTensor is supported as embedding input, in addition to Dense and Ragged Tensor.
- Added support and an example for the Keras model.fit() API through a custom train_step() function
Improvements
- Improved embedding lookup speed when input is multi-hot with combiner.
- Improved embedding lookup speed when input is one-hot, regardless of its combiner and format (Tensor, SparseTensor, or RaggedTensor)
- Added support for data-parallel input, CPU embedding, and the TF native embedding API as options in the benchmark
Bug Fixes
- Fixed build with TensorFlow 2.10+
- Fixed a bug where the batch dimension could be None at an early stage in graph mode
Full Changelog: v0.1...v0.2