New design of the transformer API #1022

Merged — 14 commits, Mar 21, 2023
Conversation

@sararb (Contributor) commented Mar 16, 2023

Fixes #918
Fixes #1024
Fixes #1025
Fixes #1026

  • This work was needed for the 2023 GTC tutorial; it fixes issues related to causal language modeling (CLM) and, at the same time, simplifies the high-level transformer API.

  • The new API for building a transformer-based model is defined as follows:

    import merlin.models.tf as mm
    from merlin.models.tf import BertBlock

    # seq_schema and target are derived from the dataset schema (not shown)
    transformer_input_dim = 48
    transformer_block = BertBlock(d_model=transformer_input_dim, n_head=8, n_layer=2)
    model = mm.Model(
        mm.InputBlockV2(...),
        transformer_block,
        mm.CategoricalOutput(...),
    )
    # Masked language modeling: mask items at random and use them as targets;
    # passing the transformer block lets the transform configure its masking.
    seq_mask_random = mm.SequenceMaskRandom(
        schema=seq_schema, target=target, masking_prob=0.3, transformer=transformer_block
    )
    model.compile(...)
    model.fit(..., pre=seq_mask_random)
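
The same pre= mechanism drives evaluation. For a model trained with SequencePredictNext (CLM), evaluating on the last item only would look roughly like the sketch below; the exact SequencePredictLast signature is assumed here, not taken from this description:

    # Hedged sketch: evaluate a CLM-trained model on the last item of each
    # sequence. The SequencePredictLast arguments are assumed.
    seq_predict_last = mm.SequencePredictLast(schema=seq_schema, target=target)
    model.evaluate(..., pre=seq_predict_last)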

Goals ⚽

This PR aims to address the following limitations in the current Transformer API:

  • 1. Inference for CausalLM is not supported: the model returns scores for all positions in the input sequence, but at inference time we are only interested in the last position (a minimal sketch of that selection follows this list).
  • 2. CLM only supports the SequencePredictNext transform: we cannot train or evaluate a CLM model on the last item of the sequence only (i.e., using SequencePredictLast).
  • 3. The API is specialized and complex: masking and inference are supported via specialized Merlin blocks (ReplaceMaskedEmbeddings, SequenceMaskLastInference), so the user must be familiar with all of these custom blocks to correctly define and train a transformer-based model with the CLM or MLM approach.
  • 4. The output of the model is a padded dense tensor of scores: because the HuggingFace transformer layer requires dense input, we convert the ragged inputs to dense tensors (0-padding to the maximum length seen in the given batch) right before calling the HF layer. The ops that follow this block are then applied to the padded dense tensor, so we compute logit scores for all positions, even the padded ones, which can be costly (e.g., in the weight-tying multiplication between the hidden representations and all item embeddings).
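
To make limitation 1 concrete: selecting the prediction of the last non-padded position amounts to a batched gather. The snippet below is an illustrative TensorFlow sketch with made-up shapes, not the PR's actual code:

    import tensorflow as tf

    # Dense scores for every position of each sequence: [batch, max_len, n_items]
    scores = tf.random.uniform((2, 4, 1000))
    lengths = tf.constant([3, 2])  # number of non-padded positions per sequence
    # Keep only the score of the last non-padded position: [batch, n_items]
    last_scores = tf.gather(scores, lengths - 1, batch_dims=1)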

Implementation Details 🚧

  • Add inference support for a transformer-based model trained with CLM, i.e., select the prediction score of the last non-padded position.
  • Support evaluating a transformer-based model trained with CLM (i.e., trained with SequencePredictNext) on the last item of the sequence only (i.e., using SequencePredictLast).
  • Simplify the transformer API by abstracting away the definition of the masking_pre and masking_post blocks, using the pre transform set in the fit() method instead.
  • Optimize the prediction generation of a transformer-based model: convert the dense tensor returned by the transformer layer to a tf.RaggedTensor, so that all the logic after the transformer block (MLP projections, softmax layer, ...) is applied only to actual (non-padded) positions.
  • Add masking support for SequencePredictNext and SequencePredictLast.
  • Add checks to ensure the SequenceTransform used in evaluate() is aligned with the masking_pre used to train the transformer model.
  • Implement the following specialized blocks (needed for CLM and MLM support):
    • SequenceCausalLastInference, which generates a mask indicating the last non-padded position of each input sequence at inference time.
    • ExtractMaskFromTargets, which takes over the logic previously defined in ReplaceMaskedEmbeddings that infers the mask information from the targets.
    • TransformerOutputToRagged, which converts the output of the transformer layer to a ragged tensor based on the mask information (see the sketch after this list).
  • Update the example notebooks.
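
As a rough illustration of the kind of conversion TransformerOutputToRagged performs (a hand-written sketch, not the block's actual implementation):

    import tensorflow as tf

    # Padded dense output of the transformer layer: [batch, max_len, d_model]
    hidden = tf.random.uniform((2, 4, 8))
    # Boolean mask of non-padded positions (sequence lengths 3 and 2)
    mask = tf.sequence_mask([3, 2], maxlen=4)
    # Drop the padded positions so downstream ops (MLP projections, softmax)
    # only run on real positions; result shape: [2, None, 8]
    ragged = tf.ragged.boolean_mask(hidden, mask)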

Testing Details 🔍

  • Update test_transformer_with_masked_language_modeling and test_transformer_with_causal_language_modeling to account for the changes.
  • Update the

@sararb sararb added this to the Merlin 23.03 milestone Mar 16, 2023
@sararb sararb self-assigned this Mar 16, 2023
@github-actions commented

Documentation preview: https://nvidia-merlin.github.io/models/review/pr-1022
@rnyak rnyak requested review from gabrielspmoreira and rnyak March 20, 2023 15:34
@gabrielspmoreira (Member) left a comment

Looks great to me @sararb.