
Beam search fails when using model parallelism #9200

Closed

TobiasNorlund opened this issue Dec 18, 2020 · 3 comments

Comments

@TobiasNorlund
Contributor

Environment info

  • transformers version: 4.1.1
  • Platform: Linux-4.4.0-194-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.7.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: Yes, two GTX 1080, on a single node
  • Using distributed or parallel set-up in script?: Using model parallelism through model.parallelize()

Who can help

@LysandreJik
@alexorona

Information

Model I am using (Bert, XLNet ...): GPT2

The problem arises when using:

  • the official example scripts:
  • my own modified scripts:

The tasks I am working on are:

  • an official GLUE/SQUaD task:
  • my own task or dataset:

To reproduce

The recent (and awesome!) model.parallelize() doesn't seem to work with beam search decoding at the moment. The behavior can be reproduced on the official huggingface/transformers-pytorch-gpu:4.1.1 Docker image by running the following (on a machine with multiple GPUs):

import transformers

tokenizer = transformers.GPT2Tokenizer.from_pretrained("gpt2")
model = transformers.GPT2LMHeadModel.from_pretrained("gpt2")
model.parallelize()  # spread the transformer blocks across the available GPUs

input_ids = tokenizer.encode("This is a test", return_tensors="pt").to("cuda:0")
model.generate(input_ids, num_beams=2)  # num_beams > 1 triggers beam search

This raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/generation_utils.py", line 612, in generate
    **model_kwargs,
  File "/usr/local/lib/python3.6/dist-packages/transformers/generation_utils.py", line 1088, in beam_search
    model_kwargs["past"] = self._reorder_cache(model_kwargs["past"], beam_idx)
  File "/usr/local/lib/python3.6/dist-packages/transformers/generation_utils.py", line 229, in _reorder_cache
    return tuple(layer_past.index_select(1, beam_idx) for layer_past in past)
  File "/usr/local/lib/python3.6/dist-packages/transformers/generation_utils.py", line 229, in <genexpr>
    return tuple(layer_past.index_select(1, beam_idx) for layer_past in past)
RuntimeError: Input, output and indices must be on the current device

Expected behavior

The expected behavior is that no error is raised and generate() correctly returns the beam search output.

@TobiasNorlund
Contributor Author

As the traceback suggests, the error seems to come from the _reorder_cache method in generation_utils.py. Since the model is parallelized across multiple devices, it fails because the devices of beam_idx and layer_past don't match for all layers.
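
For reference, the mismatch is easy to see after parallelizing (a minimal check; the actual placement depends on the default device_map, so the printed devices are only an example):

import transformers

model = transformers.GPT2LMHeadModel.from_pretrained("gpt2")
model.parallelize()

# The transformer blocks are now spread across the GPUs, while beam_idx inside
# generate() stays on the device of input_ids (cuda:0 in the snippet above).
print(next(model.transformer.h[0].parameters()).device)   # e.g. cuda:0
print(next(model.transformer.h[-1].parameters()).device)  # e.g. cuda:1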

I just tried to modify line 229 in generation_utils.py to:

return tuple(layer_past.index_select(1, beam_idx.to(layer_past.device)) for layer_past in past)

which seems to work.
I'm happy to file a PR with this change if you approve. Please let me know if there is anything I should be aware of, or pay extra attention to.
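
In context, the patched method would look roughly like this (a sketch of the change only; the signature is simplified and not copied verbatim from generation_utils.py):

def _reorder_cache(past, beam_idx):
    # Move beam_idx onto each layer's device so that layers placed on other
    # GPUs by model.parallelize() can be reordered as well.
    return tuple(
        layer_past.index_select(1, beam_idx.to(layer_past.device))
        for layer_past in past
    )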

@OyvindTafjord
Contributor

OyvindTafjord commented May 12, 2021

FWIW, this fix doesn't currently work for T5, as the fix to _reorder_cache is not reflected in the modeling_t5.py file. Following the above, changing this line to layer_past_state.index_select(0, beam_idx.to(layer_past_state.device)) appears to fix it.
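
Roughly, the reordering inside T5's _reorder_cache would then become something like the following (the variable names are paraphrased for illustration, not copied from modeling_t5.py):

reordered_layer_past_states = tuple(
    layer_past_state.index_select(0, beam_idx.to(layer_past_state.device))
    for layer_past_state in layer_past_states
)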

@patrickvonplaten

@patrickvonplaten
Contributor

@OyvindTafjord - would you mind opening a new PR for it? :-)
