Releases: NVIDIA-Merlin/distributed-embeddings

v23.06.00 (v1.0)

13 Jul 06:14

What’s Changed

New Features

  • Added support for row-slicing as a parallel strategy
  • Added support for data-parallel as a parallel strategy
  • Allowed mixing and matching of data-parallel, table-parallel, row-slicing, and column-slicing strategies. Refer to the User Guide for more details.
  • Added IntegerLookup layer that supports on-the-fly vocabulary building, on both CPU and GPU
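
As a rough illustration of how row-slicing and column-slicing differ, the sketch below splits a tiny table both ways using plain Python lists. The helper names (`row_slice`, `column_slice`, `lookup_row_sliced`, `lookup_column_sliced`) are hypothetical and are not part of the distributed-embeddings API; the library's actual placement logic is described in the User Guide.

```python
# Illustrative only: plain Python lists stand in for embedding tables.
# None of these helpers belong to the distributed-embeddings API.

def row_slice(table, num_workers):
    """Split a table by rows into contiguous shards, one per worker."""
    rows_per_worker = -(-len(table) // num_workers)  # ceiling division
    return [table[i * rows_per_worker:(i + 1) * rows_per_worker]
            for i in range(num_workers)]

def column_slice(table, num_workers):
    """Split every embedding vector by columns, one slice per worker."""
    width = len(table[0])
    cols_per_worker = -(-width // num_workers)  # ceiling division
    return [[row[i * cols_per_worker:(i + 1) * cols_per_worker] for row in table]
            for i in range(num_workers)]

def lookup_row_sliced(shards, index):
    """With row-slicing, exactly one worker owns the requested row."""
    rows_per_worker = len(shards[0])
    worker, local_index = divmod(index, rows_per_worker)
    return shards[worker][local_index]

def lookup_column_sliced(shards, index):
    """With column-slicing, every worker returns part of the vector."""
    vector = []
    for shard in shards:
        vector.extend(shard[index])  # concatenate the partial vectors
    return vector

# A 6-row table with embedding width 4, split across 2 workers.
table = [[float(r * 10 + c) for c in range(4)] for r in range(6)]
row_shards = row_slice(table, 2)
col_shards = column_slice(table, 2)
```

In both cases a lookup reconstructs the same vector as the unsliced table; what changes is which workers hold the data and how much each one must communicate.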

Breaking Changes

  • Added NVIDIA cuCollections as a submodule for GPU hash map support
  • Now supports TensorFlow 2.12. Note that this change breaks the build with TF 2.9 and earlier.

Improvements

  • Improved package import

Bug Fixes

  • Fixed input offset overflow caused by automatic table concatenation
  • Fixed potential graph mismatch problems in broadcast
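
The release notes do not describe the overflow fix itself. As a hedged sketch of how such an overflow can arise in general: when several tables are concatenated into one, each table's input indices are shifted by the total row count of the tables before it, and the shifted value may no longer fit in a signed 32-bit integer. The helper names and row counts below are hypothetical.

```python
INT32_MAX = 2**31 - 1

def concat_offsets(table_sizes):
    """Row offset of each table inside the concatenated table."""
    offsets, total = [], 0
    for size in table_sizes:
        offsets.append(total)
        total += size
    return offsets

def needs_int64(table_sizes):
    """True if the largest shifted index does not fit in a signed 32-bit integer."""
    return sum(table_sizes) - 1 > INT32_MAX

# Hypothetical row counts: each table fits in int32 on its own,
# but indices into the second table overflow once the offset is added.
sizes = [1_500_000_000, 1_000_000_000]
offsets = concat_offsets(sizes)
```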

Full Changelog: v23.03.00...v23.06.00

v23.03.00

19 Apr 13:13

What’s Changed

New Features

  • Added support for the NVIDIA Hopper™ architecture family (compute capability 9.0).
  • Added support for the Keras Model.fit() API.
  • Added support for Horovod callbacks in the hybrid data/model parallel case.

Breaking Changes

  • Now supports TensorFlow 2.12. Note that this change breaks the build with TF 2.9 and earlier.
  • Now requires Horovod version 0.27 or later.

Improvements

  • Improved unit tests

Bug Fixes

  • Use tf.shape for graph mode support by @edknv in #6

New Contributors

  • @edknv made their first contribution in #6

Full Changelog: v0.3...v23.03.00

v0.3

13 Feb 06:13

What’s Changed

New Features

  • CUDA 12 support
  • Automatic concatenation of multiple embedding tables for greatly improved speed
  • Support for model parallelism with user-defined custom Keras layers through the DistributedEmbedding wrapper

Improvements

  • Support cases where the number of workers is greater than the number of tables.
  • In corner cases where different slices of a table are placed on the same worker, they are now merged into a single slice.

Breaking Changes

  • Moved submodule from CUB to NVIDIA Thrust for better compatibility

Bug Fixes

  • Better error handling in set_weight() when weights are not initialized
  • Better error handling when the global batch size is not divisible by the number of workers
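
A minimal sketch of the kind of check the last item refers to, assuming the global batch is split evenly across workers. The function name `per_worker_batch_size` and the error messages are hypothetical, not the library's own.

```python
def per_worker_batch_size(global_batch_size, num_workers):
    """Return the local batch size, failing early on an uneven split."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if global_batch_size % num_workers != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by the number of workers {num_workers}"
        )
    return global_batch_size // num_workers
```

Failing with a descriptive message at setup time is usually easier to debug than a shape mismatch raised later inside a collective operation.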

Full Changelog: v0.2...v0.3

v0.2

09 Feb 08:06

What’s Changed

New Features

  • SparseTensor is supported as embedding input, in addition to dense Tensor and RaggedTensor.
  • Added support and an example for the Keras model.fit() API through a custom train_step() function

Improvements

  • Improved embedding lookup speed when the input is multi-hot with a combiner.
  • Improved embedding lookup speed when the input is one-hot, regardless of its combiner and format (Tensor, SparseTensor, or RaggedTensor).
  • Added support for data-parallel input, CPU embedding, and the TF native embedding API as options in the benchmark
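
For readers unfamiliar with the terms, the sketch below shows what a multi-hot lookup with a combiner computes, using plain Python lists. It is a conceptual illustration only; `lookup_combine` is a hypothetical name and the code says nothing about how the library implements or accelerates the operation.

```python
def lookup_combine(table, ids, combiner="sum"):
    """Look up several rows (multi-hot input) and reduce them to one vector."""
    vectors = [table[i] for i in ids]
    summed = [sum(column) for column in zip(*vectors)]  # element-wise sum
    if combiner == "sum":
        return summed
    if combiner == "mean":
        return [value / len(vectors) for value in summed]
    raise ValueError(f"unsupported combiner: {combiner}")

table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

multi_hot_sum = lookup_combine(table, [0, 2], "sum")
multi_hot_mean = lookup_combine(table, [0, 2], "mean")
one_hot = lookup_combine(table, [1], "mean")
```

With a one-hot input there is a single row to reduce, so the result is that row whatever the combiner; this is why the one-hot case can be handled the same way regardless of combiner.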

Bug Fixes

  • Fixed build with TensorFlow 2.10+
  • Fixed a bug where the batch dimension could be None at an early stage in graph mode

Full Changelog: v0.1...v0.2

v0.1

09 Feb 07:40
Pre-release

Initial release