
[Benchmark] GPT2LMHeadModel (gpt2-medium) forward pass inference became 9% slower compared to 2.8.0 release #11310

Closed
LSinev opened this issue Apr 19, 2021 · 3 comments

LSinev (Contributor) commented Apr 19, 2021

🖥 Benchmarking GPT2LMHeadModel

Benchmark

Direct GPT2LMHeadModel model call (and model.generate() as well)

Set-up

GPU: GTX 1080
PyTorch 1.4.0
transformers releases 2.8.0, 3.5.1, 4.5.1, and the latest master branch
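
The exact environment can be dumped with a snippet along these lines (a sketch; all calls are standard platform/torch/transformers attributes):

import platform
import torch
import transformers

# Report the interpreter, framework, and GPU actually used for the runs.
print("python:", platform.python_version())
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("transformers:", transformers.__version__)
print("gpu:", torch.cuda.get_device_name(0))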

Code to reproduce

import timeit

import numpy as np
import torch
from transformers import __version__ as trans_version
from transformers import GPT2LMHeadModel

print("transformers:", trans_version)
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
print(model.__class__)
model.to("cuda")
model.eval()
rounding = 3

# Greedy generation: a random 1014-token prompt, generated out to 1024 tokens
# (exercises the past-key-values cache), 30 repeats of 1 call each.
timed_result = timeit.repeat(
    stmt="""model.generate(input_ids=inp_t,
               max_length=1024,
               min_length=1024,
               do_sample=False,
               early_stopping=False, pad_token_id=50256, eos_token_id=50256)""",
    setup="""inp = np.random.randint(low=1, high=50255, size=1014); inp_t = torch.LongTensor(inp).unsqueeze(0).to("cuda")""",
    repeat=30, number=1, globals=globals())

# Plain forward pass: a full 1024-token input, 30 repeats of 10 calls each.
timed_model_result = timeit.repeat(
    stmt="""with torch.no_grad():
    model(input_ids=inp_t)""",
    setup="""inp = np.random.randint(low=1, high=50255, size=1024); inp_t = torch.LongTensor(inp).unsqueeze(0).to("cuda")""",
    repeat=30, number=10, globals=globals())

print('GPT2LMmedium model.generate (using caching) 1014 input, generate to 1024 (mean ± 3std):',
      str(np.round(np.mean(timed_result), rounding)) + '±' + str(np.round(3 * np.std(timed_result), rounding)))
print('GPT2LMmedium model call, 1024 input 10 times (mean ± 3std):',
      str(np.round(np.mean(timed_model_result), rounding)) + '±' + str(np.round(3 * np.std(timed_model_result), rounding)))
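
One caveat worth noting: CUDA kernels launch asynchronously, so pure wall-clock timing can smear work across repeats. A stricter variant of the forward-pass measurement would look like this (a sketch; torch.cuda.synchronize() is standard PyTorch, the rest is unchanged from the code above):

# Same forward-pass measurement, but draining the GPU queue before the
# timer stops, so each repeat is charged only for its own kernels.
timed_model_result = timeit.repeat(
    stmt="""with torch.no_grad():
    model(input_ids=inp_t)
torch.cuda.synchronize()""",
    setup="""inp = np.random.randint(low=1, high=50255, size=1024); inp_t = torch.LongTensor(inp).unsqueeze(0).to("cuda"); torch.cuda.synchronize()""",
    repeat=30, number=10, globals=globals())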

Results

While the model.generate() code path has improved and now runs noticeably faster, the plain forward pass used in a direct model call became about 9% slower on master (4.6.0.dev0: 1.991 s vs. 1.821 s on 2.8.0 for 10 calls):

transformers: 2.8.0
<class 'transformers.modeling_gpt2.GPT2LMHeadModel'>
GPT2LMmedium model.generate (using caching) 1014 input, generate to 1024 (mean ± 3std): 0.557±0.037
GPT2LMmedium model call, 1024 input 10 times (mean ± 3std): 1.821±0.017

transformers: 3.5.1
<class 'transformers.modeling_gpt2.GPT2LMHeadModel'>
GPT2LMmedium model.generate (using caching) 1014 input, generate to 1024 (mean ± 3std): 0.37±0.003
GPT2LMmedium model call, 1024 input 10 times (mean ± 3std): 1.849±0.012

transformers: 4.5.1
<class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>
GPT2LMmedium model.generate (using caching) 1014 input, generate to 1024 (mean ± 3std): 0.36±0.003
GPT2LMmedium model call, 1024 input 10 times (mean ± 3std): 1.823±0.013

transformers: 4.6.0.dev0
<class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>
GPT2LMmedium model.generate (using caching) 1014 input, generate to 1024 (mean ± 3std): 0.367±0.004
GPT2LMmedium model call, 1024 input 10 times (mean ± 3std): 1.991±0.013
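
To localize where the extra time goes, one option (a sketch only, not yet run on this setup) is the PyTorch autograd profiler, which is already available in PyTorch 1.4; comparing its output between 4.5.1 and master should point at the regressed op:

# Profile a single no-grad forward pass and print the ops
# sorted by total CUDA time (model, np, torch as in the script above).
inp = np.random.randint(low=1, high=50255, size=1024)
inp_t = torch.LongTensor(inp).unsqueeze(0).to("cuda")
with torch.no_grad():
    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        model(input_ids=inp_t)
print(prof.key_averages().table(sort_by="cuda_time_total"))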

LSinev (Contributor, Author) commented Apr 19, 2021

@patil-suraj Could you please check whether this slowdown of the GPT2LMHeadModel model call was introduced by your PR #11225?

patil-suraj (Contributor) commented

Hi @LSinev

Thank you for posting the detailed issue. I will take a look.

patil-suraj self-assigned this Apr 19, 2021
github-actions commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
