
[RMP] Quick-start for ranking models training pipeline #827

Open
18 of 19 tasks
gabrielspmoreira opened this issue Feb 22, 2023 · 1 comment

gabrielspmoreira commented Feb 22, 2023

Problem:

Merlin provides documentation and a number of example notebooks on how to use tools like NVTabular, Dataloader and Merlin Models. To build a pipeline for training and evaluation, a Data Scientist currently needs to work through that material, copy and paste code snippets demonstrating the API, and glue that code together into scripts for experimentation and benchmarking.
It might also not be clear to users which advanced API options of Merlin Models can be mapped to hyperparameters and potentially improve model accuracy.

Goal:

This RMP provides a Quick-start for building ranking models training pipelines.
It addresses the ranking-models part of the larger RMP NVIDIA-Merlin/models#732, in particular steps 4-7 of the Data Scientist journey when experimenting with Merlin Models.

The Quick-start for ranking is composed of:

Template scripts

  • Generic template script for preprocessing
  • Generic template script for training ranking models, exposing their main hyperparameters.
    It includes support for ranking models such as DLRM, DCN-v2, DeepFM and Wide&Deep, and multi-task learning (MTL) models such as MMOE, CGC and PLE.
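The basic feature encoding the preprocessing template performs can be sketched in plain pandas. This is only an illustration of the idea (integer-encoding categoricals and standardizing continuous features); the column names and data are hypothetical, and the real template uses Merlin/NVTabular ops rather than this code:

```python
import pandas as pd

def preprocess(df: pd.DataFrame, cat_cols, cont_cols) -> pd.DataFrame:
    """Basic feature encoding: integer-encode categorical columns
    ("categorify") and standardize continuous columns."""
    out = df.copy()
    for col in cat_cols:
        # Map each category to a contiguous integer id, reserving 0 for unknowns
        out[col] = out[col].astype("category").cat.codes + 1
    for col in cont_cols:
        # Standardize to zero mean and unit variance
        mean, std = out[col].mean(), out[col].std()
        out[col] = (out[col] - mean) / std
    return out

# Hypothetical interaction data
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u1"],
    "item_id": ["i9", "i9", "i3"],
    "price": [10.0, 20.0, 30.0],
})
encoded = preprocess(df, cat_cols=["user_id", "item_id"], cont_cols=["price"])
```

The template itself would expose which columns are categorical vs. continuous (typically via the dataset schema) instead of hard-coding them as above.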

Documentation

  • Documentation of the scripts command line arguments
  • Documentation of best practices learned from our experimentation:
    • Hyperparameter tuning: search space, most important hyperparameters, and best hyperparameters for the TenRec public dataset for STL and MTL models
    • Intuition on API options (building blocks, arguments) that can improve model accuracy

Constraints:

  • Preprocessing - The preprocessing template notebook will perform basic feature encoding for categorical variables (e.g. categorify) and continuous variables (e.g. standardization). Customers can extend the template with the advanced preprocessing ops demonstrated in our examples.
  • Training - The training and evaluation script for Merlin Models should be fully configurable, taking as input the parquet files and schema, plus a number of hyperparameters exposed via command-line arguments. The output of the script should be the evaluation metrics, logged as a CSV file and also to Weights & Biases.
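The training-script interface described above can be sketched with argparse. The flag names, model choices and the placeholder metric below are hypothetical stand-ins for the real script's arguments, and the Weights & Biases logging call is omitted:

```python
import argparse
import csv

def parse_args(argv=None):
    # Hypothetical subset of the hyperparameters such a script could expose
    p = argparse.ArgumentParser(description="Train and evaluate a ranking model")
    p.add_argument("--train-data", required=True, help="Path to training parquet files")
    p.add_argument("--model", default="dlrm",
                   choices=["dlrm", "dcn-v2", "deepfm", "wide-deep", "mmoe", "cgc", "ple"])
    p.add_argument("--lr", type=float, default=1e-3, help="Learning rate")
    p.add_argument("--embedding-dim", type=int, default=64)
    p.add_argument("--epochs", type=int, default=1)
    p.add_argument("--metrics-csv", default="metrics.csv",
                   help="Where to log evaluation metrics")
    return p.parse_args(argv)

def log_metrics(path, metrics):
    # Append one row of evaluation metrics as CSV, writing a header on first use
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(metrics))
        if f.tell() == 0:
            writer.writeheader()
        writer.writerow(metrics)

args = parse_args(["--train-data", "data/train", "--model", "dcn-v2", "--lr", "5e-4"])
# A real run would train and evaluate here; 0.0 is a placeholder metric value
log_metrics(args.metrics_csv, {"model": args.model, "auc": 0.0})
```

Appending rows rather than overwriting lets repeated runs of the script accumulate results in one CSV for later comparison.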

Starting Point:

The ranking training scripts we have developed for the MTL research project.

Tasks:

PR: NVIDIA-Merlin/models#988

TenRec dataset experiments

Documentation

  • Create documentation for Quick-Start for Ranking (CLI args, best practices and tutorials) #666
    • Create the main quick-start for ranking documentation
    • Create a document explaining all arguments for preprocessing.py and ranking_train_eval.py
    • Create a document with the best practices for preprocessing
    • Create a document with the best practices for building and training ranking models
    • Create a document with the best practices for hypertuning and with the results of the TenRec experiments (in progress)
    • Create a tutorial on using Quick-start scripts with W&B Sweeps for HPO
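A tutorial like the last item could center on a sweep configuration file. The sketch below uses the standard W&B Sweeps YAML format against the `ranking_train_eval.py` script mentioned above; the metric name, hyperparameter names and ranges are illustrative assumptions, not the script's actual arguments:

```yaml
# Hypothetical W&B sweep config for the ranking training script
program: ranking_train_eval.py
method: bayes
metric:
  name: auc          # assumes the script reports an "auc" metric to W&B
  goal: maximize
parameters:
  lr:
    distribution: log_uniform_values
    min: 0.0001
    max: 0.01
  embedding_dim:
    values: [32, 64, 128]
```

A sweep would then be launched with `wandb sweep <config.yaml>` followed by `wandb agent <sweep-id>`.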

Deployment and inference with Triton

Testing

Blog post


rnyak commented May 12, 2023

The tasks below are handled by #966.

#919
#912
#913
#914
