distributed-embeddings is a library for building large embedding-based models (e.g. recommenders) in TensorFlow 2. It provides a scalable model-parallel wrapper that automatically distributes embedding tables across multiple GPUs, as well as efficient embedding operations that cover and extend TensorFlow's embedding functionality.
distributed_embeddings.dist_model_parallel
is a tool that enables model-parallel training by changing only three lines of your script. It can also be combined with data parallelism to form hybrid-parallel training. Users can easily experiment with embeddings beyond a single GPU's memory capacity, without writing complex code to handle cross-worker communication.
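To illustrate the idea behind distributing tables across workers, here is a minimal sketch in plain Python. It is not the library's API or its actual placement algorithm; `assign_tables` is a hypothetical helper showing one simple strategy (greedy load balancing by table memory footprint) for table-wise model parallelism.

```python
def assign_tables(table_sizes, num_workers):
    """Greedily assign each embedding table to the worker with the
    smallest total memory load so far.

    table_sizes: list of (num_rows, embedding_dim) pairs.
    Returns a list giving the worker index for each table.
    """
    loads = [0] * num_workers          # accumulated elements per worker
    assignment = []
    for rows, dim in table_sizes:
        worker = loads.index(min(loads))  # least-loaded worker
        assignment.append(worker)
        loads[worker] += rows * dim
    return assignment

# Four tables of very different sizes, two workers:
tables = [(1000, 16), (50000, 32), (200, 8), (80000, 64)]
print(assign_tables(tables, 2))  # → [0, 1, 0, 0]
```

In the real library, this kind of placement decision (and the resulting cross-GPU communication during lookup) is handled automatically by the model-parallel wrapper, which is why only a few lines of user code need to change.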
distributed_embeddings.Embedding
combines the functionality of tf.keras.layers.Embedding
and tf.nn.embedding_lookup_sparse
under a unified Keras layer API. The backend is designed for high GPU efficiency.
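To make the "unified" behavior concrete, here is a plain-Python sketch of the two lookup modes such a layer covers: a dense lookup (one id per slot, as in tf.keras.layers.Embedding) and a ragged lookup with a combiner (several ids reduced to one vector, as in tf.nn.embedding_lookup_sparse). The function name and signature are illustrative, not the library's API, and the real layer runs on GPU tensors rather than Python lists.

```python
def embedding_lookup(table, ids, combiner=None):
    """Dense lookup when `ids` is a flat list of row indices; ragged
    lookup plus reduction when `ids` is a list of lists and a
    combiner ("sum" or "mean") is given."""
    if combiner is None:
        # Dense mode: one embedding vector per id.
        return [table[i] for i in ids]
    out = []
    for row in ids:
        vecs = [table[i] for i in row]
        # Element-wise sum of the looked-up vectors.
        reduced = [sum(col) for col in zip(*vecs)]
        if combiner == "mean":
            reduced = [v / len(row) for v in reduced]
        out.append(reduced)
    return out

table = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(embedding_lookup(table, [2, 0]))              # → [[4.0, 5.0], [0.0, 1.0]]
print(embedding_lookup(table, [[0, 1], [2]], "sum"))  # → [[2.0, 4.0], [4.0, 5.0]]
```

Exposing both modes behind one layer means a model can mix single-id and multi-id (multi-hot) categorical features without switching APIs.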
See the User Guide for more details.
Python 3, CUDA 11 or newer, TensorFlow 2
You can build inside the 22.03 or later NGC TF2 container image:
docker pull nvcr.io/nvidia/tensorflow:22.03-tf2-py3
After cloning this repository, run:
make pip_pkg && pip install artifacts/*.whl
Test installation with:
python -c "import distributed_embeddings"
You can also run the Synthetic and DLRM examples.
If you'd like to contribute to the library directly, see CONTRIBUTING.md. We're particularly interested in contributions to, or feature requests for, our feature engineering and preprocessing operations. To further advance our Merlin roadmap, we encourage you to share the details of your recommender system pipeline in this survey.
If you're interested in learning more about how distributed-embeddings works, see the documentation.