
Quick-start example for preprocessing, training and deploying ranking models #988

Closed
wants to merge 28 commits into from

Conversation

gabrielspmoreira
Member

@gabrielspmoreira gabrielspmoreira commented Feb 14, 2023

Fixes NVIDIA-Merlin/Merlin#916 , fixes NVIDIA-Merlin/Merlin#917 , fixes NVIDIA-Merlin/Merlin#918, fixes #680, fixes #681, fixes #666

Goals ⚽

This PR introduces a quick-start example for preprocessing, training, evaluating and deploying ranking models.
It is composed of a set of scripts and markdown documents. The example uses the TenRec dataset, but the scripts are generic and can be used with customers' own data, provided it has the right shape: positive and (optionally) negative user-item events with tabular features.

Implementation Details 🚧

  • preprocessing.py - Generic script for preprocessing a raw dataset (CSV or Parquet) with NVTabular. Its CLI arguments configure the input path and format, the categorical and continuous features, feature tagging (user_id, item_id, ...), filtering of interactions by min/max frequency for users or items, and the dataset split.
    Example command line for the TenRec dataset:

```shell
python preprocessing.py --input_data_format=csv --csv_na_values=\\N --input_data_path /data/QK-video.csv --output_path=$OUT_DATASET_PATH --categorical_features=user_id,item_id,video_category,gender,age --binary_classif_targets=click,follow,like,share --regression_targets=watching_times --to_int32=user_id,item_id --to_int16=watching_times --to_int8=gender,age,video_category,click,follow,like,share --user_id_feature=user_id --item_id_feature=item_id --min_user_freq 5 --persist_intermediate_files --dataset_split_strategy=random --random_split_eval_perc=0.2
```
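For illustration, the core of the `--min_user_freq` filter and the `--dataset_split_strategy=random` split can be sketched in plain pandas (this is a simplified sketch of the logic, not the script's actual NVTabular implementation; the column names and the values `min_user_freq=5` / `eval_perc=0.2` follow the example command above):

```python
import pandas as pd

def filter_min_user_freq(df: pd.DataFrame, min_user_freq: int) -> pd.DataFrame:
    """Keep only interactions from users with at least min_user_freq events."""
    counts = df.groupby("user_id")["user_id"].transform("size")
    return df[counts >= min_user_freq]

def random_split(df: pd.DataFrame, eval_perc: float, seed: int = 42):
    """Randomly hold out a fraction of interactions for evaluation."""
    eval_df = df.sample(frac=eval_perc, random_state=seed)
    train_df = df.drop(eval_df.index)
    return train_df, eval_df

# Toy interactions: user 1 has 5 events, user 2 has only 2
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2],
    "item_id": [10, 11, 12, 13, 14, 10, 15],
    "click":   [1, 0, 1, 0, 1, 1, 0],
})
filtered = filter_min_user_freq(df, min_user_freq=5)  # drops user 2's events
train_df, eval_df = random_split(filtered, eval_perc=0.2)
```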
  • ranking_train_eval.py - Generic script for training and evaluating ranking models. It takes the preprocessed dataset and schema produced by preprocessing.py as input. You can set many training and model hyperparameters to train both single-task learning models (MLP, DCN, DLRM, Wide&Deep, DeepFM) and multi-task learning models (e.g. MMOE, CGC, PLE).
```shell
python ranking_train_eval.py --train_path $OUT_DATASET_PATH/final_dataset/train --eval_path $OUT_DATASET_PATH/final_dataset/eval --output_path ./outputs/ --tasks=click --stl_positive_class_weight 4 --model dlrm --embeddings_dim 64 --l2_reg 1e-5 --embeddings_l2_reg 1e-6 --dropout 0.05 --mlp_layers 64,32 --lr 1e-4 --lr_decay_rate 0.99 --lr_decay_steps 100 --train_batch_size 4096 --eval_batch_size 4096 --epochs 1 --train_steps_per_epoch 10
```
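The `--stl_positive_class_weight 4` flag up-weights the positive class in the binary classification loss, which helps with imbalanced targets like click. A minimal numpy sketch of per-class-weighted binary cross-entropy (the weighting scheme shown is the standard one and is assumed here, not taken from the script's source):

```python
import numpy as np

def weighted_bce(y_true, y_pred, positive_class_weight=1.0):
    """Binary cross-entropy with positive examples scaled by positive_class_weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-7, 1 - 1e-7)
    # Weight of 1.0 for negatives, positive_class_weight for positives
    weights = np.where(y_true == 1, positive_class_weight, 1.0)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(weights * losses))

# With weight 4 (as in the example command), a missed positive costs 4x a missed negative
base = weighted_bce([1, 0], [0.5, 0.5])
weighted = weighted_bce([1, 0], [0.5, 0.5], positive_class_weight=4.0)
```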

Testing Details 🔍

  • The preprocessing and ranking training scripts will be added as integration tests.

Tasks

  • Implementation
  • Experimentation
  • Documentation
  • Deployment and inference with Triton
  • Testing

@github-actions

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-988

@gabrielspmoreira
Member Author

Closing this PR, as it was moved to Merlin Repo: NVIDIA-Merlin/Merlin#915
