
the accuracy issue of left padding and right padding #29419

Closed
2 tasks
hijkzzz opened this issue Mar 4, 2024 · 2 comments
Comments


hijkzzz commented Mar 4, 2024

System Info

transformers v4.38.2
docker container: nvcr.io/nvidia/pytorch:23.12-py3

Who can help?

@ArthurZucker
@younesbelkada

Information

The outputs of left padding and right padding are inconsistent.

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# any llama2 model
modelname = "OpenLLMAI/Llama-2-7b-sft-model-ocra-500k"
model = AutoModelForCausalLM.from_pretrained(modelname).cuda()

# left pad
inputs={'input_ids': torch.tensor([[    1,  7251,   727, 29901, 29871],
        [    2,     2,     1, 29871, 29896]]).cuda(), 'attention_mask': torch.tensor([[1, 1, 1, 1, 1],
        [0, 0, 1, 1, 1]]).cuda()}
# right pad
inputs2={'input_ids': torch.tensor([[    1,  7251,   727, 29901, 29871],
        [    1, 29871, 29896,     2,     2]]).cuda(), 'attention_mask': torch.tensor([[1, 1, 1, 1, 1],
        [1, 1, 1, 0, 0]]).cuda()}

# baseline: compare logits at the three valid tokens of sequence 1
# (last 3 positions when left-padded, first 3 when right-padded)
output = model(**inputs)
output2 = model(**inputs2)

output2.logits[1][:3] - output.logits[1][-3:]
tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0020, -0.0010, -0.0001,  ...,  0.0006,  0.0013,  0.0007],
        [ 0.0025,  0.0040, -0.0005,  ...,  0.0025,  0.0015,  0.0008]],
       device='cuda:0', grad_fn=<SubBackward0>)

# fix: pass explicit position_ids derived from the attention mask
position_ids = inputs['attention_mask'].long().cumsum(-1) - 1
position_ids2 = inputs2['attention_mask'].long().cumsum(-1) - 1

output = model(**inputs, position_ids=position_ids)
output2 = model(**inputs2, position_ids=position_ids2)

output2.logits[1][:3] - output.logits[1][-3:]
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 9.3555e-04, -8.1062e-06, -8.5831e-05,  ...,  7.9441e-04,
          5.7936e-04,  4.6229e-04]], device='cuda:0', grad_fn=<SubBackward0>)
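As a sanity check, the `cumsum` trick above can be reproduced with plain NumPy (a sketch; here the padded slots are clamped to 0, whereas the snippet above leaves them at -1 — the assumption is that either choice is harmless because those positions are masked out of attention anyway):

```python
import numpy as np

# Same attention masks as in the reproduction above
left_mask = np.array([[1, 1, 1, 1, 1],
                      [0, 0, 1, 1, 1]])
right_mask = np.array([[1, 1, 1, 1, 1],
                       [1, 1, 1, 0, 0]])

def positions(mask):
    # Running count of real tokens, minus 1, gives 0-based positions
    pos = mask.cumsum(-1) - 1
    # Give padded slots a dummy position; they are masked out of attention
    pos[mask == 0] = 0
    return pos

print(positions(left_mask))   # valid tokens of row 1 get positions 0, 1, 2
print(positions(right_mask))  # valid tokens of row 1 also get 0, 1, 2
```

With explicit positions, the valid tokens of the second sequence receive the same positions (0, 1, 2) regardless of padding side, which is why the fixed version above agrees much more closely.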

Expected behavior

No numerical discrepancy between left-padded and right-padded outputs.


hijkzzz commented Mar 4, 2024

related issue: OpenRLHF/OpenRLHF#217


ArthurZucker commented Mar 4, 2024

This is a duplicate of #25921 and #25420, so I am going to close it; feel free to read this great comment: #25420 (comment)
TL;DR: small numerical differences are expected when you pad inputs.
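For context on why padding alone produces tiny differences: changing the padding changes which elements the GPU kernels reduce over, so sums inside the matmuls are accumulated in a different order, and floating-point addition is not associative. A minimal pure-Python illustration of order-dependent accumulation:

```python
a, b, c = 0.1, 0.2, 0.3

left_grouped = (a + b) + c   # one accumulation order
right_grouped = a + (b + c)  # the same sum, grouped differently

print(left_grouped)   # 0.6000000000000001
print(right_grouped)  # 0.6
print(left_grouped == right_grouped)  # False
```

The same effect, compounded across thousands of float16/float32 accumulations per layer, accounts for differences on the order of 1e-3 in the logits.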
