Fixes support of sequential continuous features for sequential and non-sequential models #969

gabrielspmoreira · 2023-02-01T02:07:31Z

Fixes #953 , Closes #960

Goals ⚽

Add support to sequential continuous features on sequential models (e.g. Transformers, RNNs) and non-sequential models (e.g. averaging the sequential features and projecting with MLP).

Implementation Details 🚧

Fixed bugs when using concatenating continuous sequential columns with categorical sequential columns in sequential models (e.g. Transformers, RNNs)
Added support to average continuous sequential features by using SequenceAggregator, which nows support dict inputs and applied the combiner only on tensors whose shape is higher than the reduction axis (e.g. 3D tensors). This allows keeping 2D tensors (batch_size, 1) of continuous features as they are and applying the reduction only on sequential continuous features (batch_size, None, 1).
Removed SequenceAggregation enum, as it created difficulties for serialization/deserialization of custom objects. Now SequenceAgreggator just tasks str as input.

Here is an example on how to average both sequential categorical and continuous features for a non-sequential model (e.g. simple MLP):
P.s. The aggregation (average) will only apply for 3D features, keeping the other 2D features unchanged (e.g. user or session level features), so that they can be concatenated.

ml.InputBlockV2(
            schema,
            categorical=ml.Embeddings(
                schema.select_by_tag(Tags.CATEGORICAL), sequence_combiner="mean"
            ),
            continuous=ml.Continuous(post=SequenceAggregator("mean")),
        )

Other related fixes:

Included MLPBlock between InputBlockV2 and Transformer blocks in the units tests to serve as example for the users that usually the concatenated features dim does not match the input dim of Transformers (d_model).
Fixed compute_output_shape() of SequenceTransform derived classes
Fixed ProcessList to return dense tensors when in column schema the value_count.max == value_count.min
Changed maybe_deserialize_keras_objects() to expose the custom_objects argument
Updated compute_output_shape() of EmbeddingTable to deal correctly with 3D input tensors (batch_size, seq_length, 1)
Changed Continuous class to filter by default based on Tags.CONTINUOUS, if the filter is not provided

Testing Details 🔍

Added test (test_mlp_model_with_sequential_features_and_combiner) to demonstrate and test how to use sequential continuous and categorical features in non-sequential model by averaging.
Changed Transformers tests to include not only categorical sequential features but also continuous ones.

github-actions · 2023-02-01T02:14:33Z

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-969

merlin/models/tf/core/aggregation.py

merlin/models/tf/inputs/continuous.py

sararb

Thank you for the PR @gabrielspmoreira! All the changes sound good to me. I just left some minor comments.

sararb · 2023-02-03T07:44:34Z

merlin/models/tf/transforms/sequence.py

@@ -371,7 +398,6 @@ class SequenceTargetAsInput(SequenceTransform):
        so that the tensors sequences can be processed
    """

-    @tf.function


Do you know why we needed to decorate this call with @tf.function?

Good point @sararb . I end up removing it during debugging and did not returned it. Using Git Blame,
I found that @edknv in this PR annoted with @tf.function some functions to fix some dataloader race conditions with list columns with TF >= 2.10. I gonna return the @tf.function annotation.

merlin/models/tf/transforms/sequence.py

sararb · 2023-02-06T15:20:51Z

tests/unit/tf/blocks/test_mlp.py

+    predict_last = ml.SequencePredictLast(schema=schema.select_by_tag(Tags.SEQUENCE), target=target)
+
+    testing_utils.model_test(
+        model, loader, run_eagerly=run_eagerly, reload_model=False, fit_kwargs={"pre": predict_last}


Can we test with reload_model=True to ensure the model can be re-loaded for new training/eval iterations and to check that the input signatures are correctly saved for serving?

Sure. I changed model_test() to reload_model=True and the test still passes

…n-sequential models.

…mmented code

gabrielspmoreira · 2023-02-07T17:34:23Z

rerun tests

gabrielspmoreira requested review from sararb and rnyak February 1, 2023 02:07

gabrielspmoreira self-assigned this Feb 1, 2023

gabrielspmoreira added bug Something isn't working enhancement New feature or request labels Feb 1, 2023

gabrielspmoreira added this to the Merlin 23.02 milestone Feb 1, 2023

gabrielspmoreira marked this pull request as draft February 1, 2023 02:26

rnyak added the P0 label Feb 1, 2023

gabrielspmoreira force-pushed the tf/continuous_seq_feats_fix branch from 2d13ceb to 2e1c2a7 Compare February 2, 2023 17:07

gabrielspmoreira marked this pull request as ready for review February 2, 2023 17:08

rnyak reviewed Feb 3, 2023

View reviewed changes

merlin/models/tf/core/aggregation.py Outdated Show resolved Hide resolved

merlin/models/tf/inputs/continuous.py Show resolved Hide resolved

sararb approved these changes Feb 6, 2023

View reviewed changes

gabrielspmoreira added 4 commits February 6, 2023 12:41

Fixes to support sequential continuous features for sequential and no…

b9a7cd7

…n-sequential models.

Fixed tests

ee933b7

Fixed remaining tests

8d74073

Improving docstrings

95cc7a5

gabrielspmoreira force-pushed the tf/continuous_seq_feats_fix branch from 83e200f to 95cc7a5 Compare February 6, 2023 15:41

Adding back @tf.function to SequenceTargetAsInput.call(), removing co…

e617bf1

…mmented code

rnyak self-requested a review February 6, 2023 21:26

Minor docstring fix

13e16dd

gabrielspmoreira merged commit 63eb161 into main Feb 8, 2023

gabrielspmoreira mentioned this pull request Feb 21, 2023

[RMP] Tensorflow support for session based recommendations integration in Merlin NVIDIA-Merlin/Merlin#433

Closed

37 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes support of sequential continuous features for sequential and non-sequential models #969

Fixes support of sequential continuous features for sequential and non-sequential models #969

gabrielspmoreira commented Feb 1, 2023 •

edited

Loading

github-actions bot commented Feb 1, 2023

sararb left a comment

sararb Feb 3, 2023

gabrielspmoreira Feb 6, 2023

sararb Feb 6, 2023

gabrielspmoreira Feb 6, 2023

gabrielspmoreira commented Feb 7, 2023

Fixes support of sequential continuous features for sequential and non-sequential models #969

Fixes support of sequential continuous features for sequential and non-sequential models #969

Conversation

gabrielspmoreira commented Feb 1, 2023 • edited Loading

Goals ⚽

Implementation Details 🚧

Testing Details 🔍

github-actions bot commented Feb 1, 2023

Documentation preview

sararb left a comment

Choose a reason for hiding this comment

sararb Feb 3, 2023

Choose a reason for hiding this comment

gabrielspmoreira Feb 6, 2023

Choose a reason for hiding this comment

sararb Feb 6, 2023

Choose a reason for hiding this comment

gabrielspmoreira Feb 6, 2023

Choose a reason for hiding this comment

gabrielspmoreira commented Feb 7, 2023

gabrielspmoreira commented Feb 1, 2023 •

edited

Loading