XLM-R model output changes with batch size #1605

Closed
ricardorei opened this issue Jan 9, 2020 · 1 comment

@ricardorei

🐛 Bug

When using XLM-R, the extracted representations for the same sentence change slightly depending on the batch size.

Code sample

from fairseq.models.roberta import XLMRModel
from torchnlp.encoders.text import stack_and_pad_tensors
import torch

torch.set_printoptions(precision=10)

def batch_encoder(samples, tokenizer):
    # Encode every sentence and right-pad the batch with the model's <pad> index.
    batch = [tokenizer.encode(sequence) for sequence in samples]
    return stack_and_pad_tensors(batch, tokenizer.task.source_dictionary.pad())

xlmr = XLMRModel.from_pretrained("pretrained/xlmr.base", checkpoint_file="model.pt")
xlmr.eval()  # disable dropout so each forward pass is deterministic

samples = [
    'the part of the regular expression within the forward slashes defines the pattern.', 
    'discards the current state and temporarily replaces it with the previous state.',
    'to convert a smooth point to a corner point without direction lines, click the smooth point.'
]

with torch.no_grad():
    big_batch_tokens, bb_lengths = batch_encoder(samples, xlmr)
    small_batch_tokens, sb_lengths = batch_encoder(samples[:2], xlmr)
    first_sample_tokens = xlmr.encode(samples[0])

    # Print the first five dimensions of the <s> embedding of the first sample,
    # extracted alone, in a batch of two, and in a batch of three.
    first_sample_last_layer = xlmr.extract_features(first_sample_tokens)
    print(first_sample_last_layer[:, 0, :][0][:5])

    small_batch_last_layer = xlmr.extract_features(tokens=small_batch_tokens)
    print(small_batch_last_layer[:, 0, :][0][:5])

    big_batch_last_layer = xlmr.extract_features(tokens=big_batch_tokens)
    print(big_batch_last_layer[:, 0, :][0][:5])

Expected behavior

The first five dimensions of the <s> embedding of the first sample should be identical across the three calls, but they start to diverge around the seventh decimal place:
tensor([ 0.0852593556,  0.1065494418,  0.0615975149, -0.0047241775, 0.0284897964])
tensor([ 0.0852593333,  0.1065494195,  0.0615975149, -0.0047241990, 0.0284897070])
tensor([ 0.0852593556,  0.1065494046,  0.0615975186, -0.0047241938, 0.0284897685])
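
For reference, the largest gap between the first two print-outs above can be quantified directly from the values shown (this snippet just re-enters the printed numbers; it is not part of the script):

import torch

a = torch.tensor([0.0852593556, 0.1065494418, 0.0615975149, -0.0047241775, 0.0284897964])
b = torch.tensor([0.0852593333, 0.1065494195, 0.0615975149, -0.0047241990, 0.0284897070])
print((a - b).abs().max())  # ≈ 9e-08, i.e. differences in the 7th-8th decimal place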

Additional context

If I average-pool over all the token embeddings, or max-pool them, these differences become even bigger.

Am I doing something wrong? Is this behaviour expected?
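
As a quick check (reusing xlmr, first_sample_tokens and big_batch_tokens from the snippet above; this is not part of the original script), the single-sample and batched features for the first sentence compare like this:

with torch.no_grad():
    single = xlmr.extract_features(first_sample_tokens)[:, 0, :]
    batched = xlmr.extract_features(tokens=big_batch_tokens)[0:1, 0, :]
    print(torch.equal(single, batched))                 # may be False: bit-level differences
    print(torch.allclose(single, batched, atol=1e-6))   # expected True: differences are ~1e-7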

@ngoyal2707
Contributor

This seems to be a floating-point math issue.
I get a similar range of differences when running on CPU, but on GPU the outputs are identical up to the 10th digit.
There is some discussion in a PyTorch thread: pytorch/pytorch#4914 (although that one is about floating-point issues on CUDA rather than on CPU).

On CUDA:

tensor([-0.0130963037,  0.0021208122,  0.0833869055,  0.0168007165,
        -0.0006483230], device='cuda:0')
tensor([-0.0130963037,  0.0021208122,  0.0833869055,  0.0168007165,
        -0.0006483230], device='cuda:0')
tensor([-0.0130963037,  0.0021208122,  0.0833869055,  0.0168007165,
        -0.0006483230], device='cuda:0')

On CPU:

tensor([-0.0130964424,  0.0021210182,  0.0833871067,  0.0168008748,
        -0.0006483837])
tensor([-0.0130964424,  0.0021210182,  0.0833871067,  0.0168008748,
        -0.0006483837])
tensor([-0.0130964629,  0.0021210436,  0.0833871216,  0.0168008823,
        -0.0006483909])
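
As a minimal, fairseq-independent illustration of the same effect (a toy nn.Linear standing in for the model, so none of the names below come from the issue), the same row computed alone and inside a larger batch can differ at the bit level on CPU while staying within float32 tolerance:

import torch

torch.manual_seed(0)
layer = torch.nn.Linear(256, 256)
x = torch.randn(8, 256)

with torch.no_grad():
    single = layer(x[:1])    # first row, computed as a batch of one
    batched = layer(x)[:1]   # same row, computed inside a batch of eight

print(torch.equal(single, batched))                # may be False on CPU (different GEMM paths)
print(torch.allclose(single, batched, atol=1e-6))  # True: any differences are ~1e-7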
