
[BUG] CausalLanguageModeling does not mask the last input item #765

@sungho-ham

Bug description

With train_on_last_item_seq_only=True, the CLM masking does not mask the last item of the input sequence. As a result, the embedding of the label item is fed to the model instead of the mask embedding, leaking the target into the input.

I think the following code needs to be fixed:

mask_labels = item_ids != self.padding_idx
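
As a rough sketch of the idea (not the actual library code; padding_idx = 0 is assumed here), deriving the schema from the shifted next-item targets instead of the raw item_ids excludes the position that holds the label, which matches the expected behavior reported below:

import torch

padding_idx = 0
item_ids = torch.tensor([[1, 2, 0]])

# Next-item targets: shift the inputs left by one and pad the last position.
labels = torch.cat(
    [item_ids[:, 1:], torch.zeros((item_ids.shape[0], 1), dtype=item_ids.dtype)], dim=-1
)

# Current behavior: schema built from the raw inputs keeps the label item visible.
schema_current = item_ids != padding_idx   # tensor([[ True,  True, False]])

# Suggested direction: schema built from the shifted targets hides the label item.
schema_fixed = labels != padding_idx       # tensor([[ True, False, False]])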

Steps/Code to reproduce bug

import torch
from transformers4rec.torch import masking

item_ids = torch.tensor([[1, 2, 0]])
mask = masking.CausalLanguageModeling(hidden_size=10, train_on_last_item_seq_only=True)
masking_info = mask.compute_masked_targets(item_ids, training=True)
print(masking_info)
# MaskingInfo(schema=tensor([[ True,  True, False]]), targets=tensor([[2, 0, 0]]))

Expected behavior

MaskingInfo(schema=tensor([[ True,  False, False]]), targets=tensor([[2, 0, 0]]))

Environment details

  • Transformers4Rec version: 23.08.00

Additional context
