
[RMP] Performant large embedding table support #733

Open
3 of 16 tasks
EvenOldridge opened this issue May 5, 2022 · 5 comments
@EvenOldridge
Member

EvenOldridge commented May 5, 2022

Problem:

Goal:

New Functionality

  • Models
    • ...
  • Transformers4Rec
    • ...
  • NVTabular
    • ...
  • Systems
    • ...

Constraints:

Architectural considerations:
NA

Starting Point:

Model Parallel Support

Feature engineering that reduces embedding size

  • Mixed Dimension Embeddings
  • Frequency Capping
  • Frequency Hashing
  • Bloom Embeddings
  • TT-Rec
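The hashing-based items in the list above (Frequency Hashing, Bloom Embeddings) can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the Merlin/Models API; `NUM_BUCKETS`, `hash_bucket`, and the other names are made up for the sketch:

```python
import numpy as np

# Sketch of the hashing trick for large vocabularies: instead of one row
# per category ID, IDs are hashed into a much smaller table, trading a
# controlled collision rate for memory savings. Bloom embeddings extend
# this with multiple hash seeds whose looked-up vectors are combined,
# making collisions between two IDs far less damaging.

rng = np.random.default_rng(0)

NUM_BUCKETS = 1_000   # compressed table size (assumption for the sketch)
EMBED_DIM = 16
NUM_HASHES = 3        # number of hash functions for Bloom embeddings

table = rng.normal(scale=0.05, size=(NUM_BUCKETS, EMBED_DIM)).astype(np.float32)

def hash_bucket(category_id: int, seed: int) -> int:
    # Simple deterministic hash for illustration; a real system would
    # use a stronger hash family.
    return hash((seed, category_id)) % NUM_BUCKETS

def frequency_hash_lookup(category_id: int) -> np.ndarray:
    # Plain hashing trick: one bucket per ID.
    return table[hash_bucket(category_id, seed=0)]

def bloom_lookup(category_id: int) -> np.ndarray:
    # Bloom embeddings: sum the rows selected by several hash functions,
    # giving each ID a near-unique combined representation.
    rows = [table[hash_bucket(category_id, seed=s)] for s in range(NUM_HASHES)]
    return np.sum(rows, axis=0)

# A vocabulary of e.g. 10M IDs fits in a 1k-row table, at the cost of collisions.
vec = bloom_lookup(9_999_999)
print(vec.shape)  # (16,)
```

The other items (Mixed Dimension Embeddings, TT-Rec) change the table's shape or factorization rather than the lookup path, so they are not shown here.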

Reduced Precision Support

  • Sparse Row-wise Optimizers (Facebook Research DLRM)
  • Reduced Precision Optimizers
  • Reduced Embedding Precision
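Reduced embedding precision can be sketched as storing the table in float16 while upcasting lookups to float32, so only the memory footprint changes, not the compute precision downstream. A hypothetical NumPy sketch (names are illustrative, not a Merlin API):

```python
import numpy as np

# Sketch of reduced embedding precision: the table is stored in float16
# (halving memory relative to float32) while lookups are upcast so the
# rest of the model computes at full precision.

rng = np.random.default_rng(0)

VOCAB, DIM = 100_000, 32
table_fp32 = rng.normal(scale=0.05, size=(VOCAB, DIM)).astype(np.float32)
table_fp16 = table_fp32.astype(np.float16)  # 50% of the fp32 footprint

def lookup(ids: np.ndarray) -> np.ndarray:
    # Upcast at lookup time; downstream layers see float32 activations.
    return table_fp16[ids].astype(np.float32)

emb = lookup(np.array([1, 42, 99_999]))
print(table_fp16.nbytes / table_fp32.nbytes)  # 0.5
print(emb.dtype)  # float32
```

Reduced-precision optimizers apply the same idea to the optimizer's per-row state (e.g. momentum/variance accumulators), which for sparse row-wise optimizers can be a single scalar per embedding row.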

Not storing user embeddings

  • Represent user as item embedding aggregations (YouTube DNN)
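The idea above, as described in the YouTube DNN paper, is that no per-user row is trained or stored; the user vector is computed on the fly from the item table. A hypothetical NumPy sketch (mean-pooling is one common aggregation choice; the names are made up):

```python
import numpy as np

# Sketch of representing a user without a stored user embedding:
# the user vector is the mean of the embeddings of the items the user
# interacted with, so only the item table needs to exist.

rng = np.random.default_rng(0)

NUM_ITEMS, DIM = 50_000, 32
item_table = rng.normal(scale=0.05, size=(NUM_ITEMS, DIM)).astype(np.float32)

def user_embedding(history_item_ids: np.ndarray) -> np.ndarray:
    # Mean-pool the item embeddings from the user's interaction history;
    # no per-user parameters are trained or stored.
    return item_table[history_item_ids].mean(axis=0)

u = user_embedding(np.array([3, 17, 42]))
print(u.shape)  # (32,)
```

This removes the user table entirely, which is often the largest embedding table in a recommender.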

Inference Support

  • Hierarchical Parameter Server Support

Serving

Example

@karlhigley
Contributor

This looks good! Two questions:

  • How does this relate to Refactor InputBlock models#282 (currently slated to be completed at the end of [RMP] Make losses, metrics, masking, and negative sampling configurable from model.compile() #271)? Asking because @marcromeyn said the input block refactor "would also enable kickstarting the work of integrating model-parallelism for large-embedding tables (for instance through the HugeCTR SOK.)" Wondering to what extent the input block changes depend on the rest of the Models API changes, and if we can pull the input block work forward somehow to unblock whichever parts of model parallel support depend on it.
  • Are there further methods for not storing user embeddings planned here? If aggregating item embeddings is the main/only one, we might want to capture that in [RMP] Add YouTube DNN ranking model to Merlin Models #279 instead. This looks like a ton of useful stuff that we haven't really captured anywhere before, but that one piece we can probably tackle as part of the YouTube DNN work.

@bschifferer
Contributor

As success criteria, we need benchmarks for each of the points above:

  • How does throughput change? (E.g. TF Keras vs. SOK vs. TFDE vs. reduced-precision optimizer vs. reduced-precision embedding)
  • What is the AUC/performance of the model? (E.g. TF Keras vs. SOK vs. TFDE vs. reduced-precision optimizer vs. reduced-precision embedding)

Customers ask us these questions, and we need to be able to answer them if we provide the functionality. Only by running the experiments can we ensure that the implementation is correct.

@EvenOldridge EvenOldridge removed this from the Merlin 22.12 milestone Nov 2, 2022
@viswa-nvidia viswa-nvidia assigned marcromeyn and unassigned benfred Nov 8, 2022
@viswa-nvidia

@marcromeyn , please define this ticket and also create another ticket for SOK

@viswa-nvidia viswa-nvidia transferred this issue from NVIDIA-Merlin/models Nov 15, 2022
@viswa-nvidia viswa-nvidia added this to the Merlin 23.02 milestone Nov 15, 2022
@viswa-nvidia viswa-nvidia removed the epic label Dec 15, 2022
@viswa-nvidia

@EvenOldridge , please help to define this ticket

@viswa-nvidia

@edknv , please check with HCTR team and confirm milestone

@edknv edknv modified the milestones: Merlin 23.03, Merlin 23.04 Mar 21, 2023