XLM-R model output changes with batch size #1605

Closed
ricardorei opened this issue Jan 9, 2020 · 1 comment

@ricardorei

🐛 Bug

When using XLM-R, the extracted representations for the same sentence change slightly depending on the batch size.

Code sample

from fairseq.models.roberta import XLMRModel
from torchnlp.encoders.text import stack_and_pad_tensors
import torch

torch.set_printoptions(precision=10)

def batch_encoder(samples, tokenizer):
    # Encode every sentence and right-pad the batch with the model's <pad> index.
    batch = [tokenizer.encode(sequence) for sequence in samples]
    return stack_and_pad_tensors(batch, tokenizer.task.source_dictionary.pad())

xlmr = XLMRModel.from_pretrained("pretrained/xlmr.base", checkpoint_file="model.pt")
xlmr.eval()  # disable dropout so each forward pass is deterministic

samples = [
    'the part of the regular expression within the forward slashes defines the pattern.', 
    'discards the current state and temporarily replaces it with the previous state.',
    'to convert a smooth point to a corner point without direction lines, click the smooth point.'
]

with torch.no_grad():
    big_batch_tokens, bb_lengths = batch_encoder(samples, xlmr)
    small_batch_tokens, sb_lengths = batch_encoder(samples[:2], xlmr)
    first_sample_tokens = xlmr.encode(samples[0])

    # Print the first five dimensions of the <s> embedding of the first sample,
    # extracted alone, in a batch of two, and in a batch of three.
    first_sample_last_layer = xlmr.extract_features(first_sample_tokens)
    print(first_sample_last_layer[:, 0, :][0][:5])

    small_batch_last_layer = xlmr.extract_features(tokens=small_batch_tokens)
    print(small_batch_last_layer[:, 0, :][0][:5])

    big_batch_last_layer = xlmr.extract_features(tokens=big_batch_tokens)
    print(big_batch_last_layer[:, 0, :][0][:5])

Expected behavior

The first five dimensions of the <s> embedding of the first sample should be identical across the three calls, but they start to diverge around the seventh decimal place:
tensor([ 0.0852593556,  0.1065494418,  0.0615975149, -0.0047241775, 0.0284897964])
tensor([ 0.0852593333,  0.1065494195,  0.0615975149, -0.0047241990, 0.0284897070])
tensor([ 0.0852593556,  0.1065494046,  0.0615975186, -0.0047241938, 0.0284897685])
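
For reference, the largest gap between the first two print-outs above can be quantified directly from the values shown (this snippet just re-enters the printed numbers; it is not part of the script):

import torch

a = torch.tensor([0.0852593556, 0.1065494418, 0.0615975149, -0.0047241775, 0.0284897964])
b = torch.tensor([0.0852593333, 0.1065494195, 0.0615975149, -0.0047241990, 0.0284897070])
print((a - b).abs().max())  # ≈ 9e-08, i.e. differences in the 7th-8th decimal place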

Additional context

If I average-pool over all the token embeddings, or max-pool them, these differences become even bigger.

Am I doing something wrong? Is this behaviour expected?
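
As a quick check (reusing xlmr, first_sample_tokens and big_batch_tokens from the snippet above; this is not part of the original script), the single-sample and batched features for the first sentence compare like this:

with torch.no_grad():
    single = xlmr.extract_features(first_sample_tokens)[:, 0, :]
    batched = xlmr.extract_features(tokens=big_batch_tokens)[0:1, 0, :]
    print(torch.equal(single, batched))                 # may be False: bit-level differences
    print(torch.allclose(single, batched, atol=1e-6))   # expected True: differences are ~1e-7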

@ngoyal2707
Contributor

This seems to be a floating-point math issue.
I get a similar range of differences when running on CPU, but on GPU the outputs are identical up to the 10th digit.
There is some discussion in a PyTorch thread: pytorch/pytorch#4914 (although that one is about floating-point issues on CUDA rather than on CPU).

On CUDA:

tensor([-0.0130963037,  0.0021208122,  0.0833869055,  0.0168007165,
        -0.0006483230], device='cuda:0')
tensor([-0.0130963037,  0.0021208122,  0.0833869055,  0.0168007165,
        -0.0006483230], device='cuda:0')
tensor([-0.0130963037,  0.0021208122,  0.0833869055,  0.0168007165,
        -0.0006483230], device='cuda:0')

On CPU:

tensor([-0.0130964424,  0.0021210182,  0.0833871067,  0.0168008748,
        -0.0006483837])
tensor([-0.0130964424,  0.0021210182,  0.0833871067,  0.0168008748,
        -0.0006483837])
tensor([-0.0130964629,  0.0021210436,  0.0833871216,  0.0168008823,
        -0.0006483909])
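
As a minimal, fairseq-independent illustration of the same effect (a toy nn.Linear standing in for the model, so none of the names below come from the issue), the same row computed alone and inside a larger batch can differ at the bit level on CPU while staying within float32 tolerance:

import torch

torch.manual_seed(0)
layer = torch.nn.Linear(256, 256)
x = torch.randn(8, 256)

with torch.no_grad():
    single = layer(x[:1])    # first row, computed as a batch of one
    batched = layer(x)[:1]   # same row, computed inside a batch of eight

print(torch.equal(single, batched))                # may be False on CPU (different GEMM paths)
print(torch.allclose(single, batched, atol=1e-6))  # True: any differences are ~1e-7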
