
Add Encoder & Predictor #1112

Merged
merged 27 commits into from
Jun 22, 2023

Conversation

marcromeyn
Contributor

@marcromeyn marcromeyn commented May 25, 2023

Goals ⚽

This PR introduces the Encoder and Predictor classes to add batch-prediction capabilities to the PyTorch backend.

Implementation Details 🚧

Encoder

The Encoder is meant to be used for things like embedding extraction.

```python
>>> dataset = Dataset(...)
>>> model = mm.TwoTowerModel(dataset.schema)
# `selection=Tags.USER` ensures that only the sub-module(s) of the model
# that process features tagged as user are used during encoding.
# Additionally, it filters out all features that aren't tagged as user.
>>> user_encoder = Encoder(model[0], selection=Tags.USER)
# The index is used in the resulting DataFrame after encoding.
# Setting unique=True (the default) ensures that any rows that are
# duplicates with respect to the index are dropped, keeping only the
# first occurrence.
>>> user_embs = user_encoder(dataset, batch_size=128, index=Tags.USER_ID)
>>> print(user_embs.compute())
user_id       0       1       2  ...      38      39      40
0        0.1231  0.4132  0.5123  ...  0.9132  0.8123  0.1123
1        0.1521  0.5123  0.6312  ...  0.7321  0.6123  0.2213
...         ...     ...     ...  ...     ...     ...     ...
```
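The `unique=True` behavior described above amounts to an index-based drop-duplicates. A minimal pandas sketch of that semantics (the column name and data here are illustrative, not taken from the PR):

```python
import pandas as pd

# Hypothetical encoder output containing a duplicate user_id (illustrative data).
embs = pd.DataFrame({
    "user_id": [0, 1, 0],
    "dim_0": [0.1, 0.2, 0.3],
})

# unique=True corresponds to dropping rows that share the index column,
# keeping only the first occurrence of each index value.
deduped = embs.drop_duplicates(subset="user_id", keep="first")
print(deduped["user_id"].tolist())  # [0, 1]
```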

Predictor

The Predictor class, on the other hand, returns both the original input data and the corresponding predictions in the output DataFrame.

```python
>>> dataset = Dataset(...)
>>> model = mm.TwoTowerModel(dataset.schema)
>>> predictor = Predictor(model)
>>> predictions = predictor(dataset, batch_size=128)
>>> print(predictions.compute())
user_id  user_age  item_id  item_category  click  click_prediction
0        24        101      1              1      0.6312
1        35        102      2              0      0.7321
...      ...       ...      ...            ...    ...
```
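Conceptually, the Predictor output is the input batch with a prediction column appended. A minimal sketch of that contract (the column names, scores, and data below are illustrative, not from the PR):

```python
import pandas as pd

# Illustrative input batch (invented for this sketch).
batch = pd.DataFrame({"user_id": [0, 1], "item_id": [101, 102]})

# Stand-in model scores for the batch.
scores = [0.6312, 0.7321]

# The output keeps every input column and appends the prediction column,
# mirroring the click_prediction column in the example above.
out = batch.assign(click_prediction=scores)
print(list(out.columns))  # ['user_id', 'item_id', 'click_prediction']
```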

@marcromeyn marcromeyn added area/pytorch enhancement New feature or request labels May 25, 2023
@github-actions

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-1112

@marcromeyn marcromeyn force-pushed the torch/batch-predict branch from ce776e2 to 96b2545 Compare May 29, 2023 07:48
@marcromeyn marcromeyn self-assigned this May 29, 2023
@marcromeyn marcromeyn marked this pull request as ready for review May 29, 2023 11:59
merlin/models/torch/predict.py
@edknv
Contributor

edknv commented Jun 13, 2023

Some CPU tests seem to be failing (sample run), related to dask and/or dlpack, e.g.:

```
E           ValueError: Metadata inference failed in `encode_df`.
E
E           You have supplied a custom function and Dask is unable to
E           determine the type of output that that function returns.
E
E           To resolve this please provide a meta= keyword.
E           The docstring of the Dask function you ran should have more information.
E
E           Original error is below:
E           ------------------------
E           BufferError('DLPack only supports signed/unsigned integers, float and complex dtypes.')
```

@marcromeyn marcromeyn merged commit aa501f0 into main Jun 22, 2023
@marcromeyn marcromeyn deleted the torch/batch-predict branch June 22, 2023 11:45