Fix bug in run_*.py scripts: double wrap into DataParallel during eval #3842

and-kul · 2020-04-17T14:54:14Z

This bug is present in several scripts in examples:

examples/run_language_modeling.py
examples/run_multiple_choice.py
examples/run_xnli.py
examples/ner/run_ner.py
examples/mm-imdb/run_mmimdb.py
examples/hans/test_hans.py

The problem is exactly the same as it was in #1801 and in #1504:

During evaluation, we try to wrap the model in DataParallel a second time (we already did so during training). As a result we get:

"RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1" (ids of devices may differ)

The fix is straightforward:
Before:

# multi-gpu eval
if args.n_gpu > 1:
    model = torch.nn.DataParallel(model)

After:

# multi-gpu eval
if args.n_gpu > 1 and not isinstance(model, torch.nn.DataParallel):
    model = torch.nn.DataParallel(model)
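
For illustration, the same guard can be pulled into a small helper and called from both the training and the evaluation path. This is only a minimal sketch: `wrap_for_multi_gpu` is a hypothetical name that does not exist in the example scripts, and a plain `torch.nn.Linear` stands in for the real model.

```python
import torch


def wrap_for_multi_gpu(model: torch.nn.Module, n_gpu: int) -> torch.nn.Module:
    """Hypothetical helper: wrap `model` in DataParallel at most once."""
    # Same condition as the "After" snippet: skip if already wrapped.
    if n_gpu > 1 and not isinstance(model, torch.nn.DataParallel):
        model = torch.nn.DataParallel(model)
    return model


# Stand-in model; the real scripts load a pretrained transformer instead.
model = torch.nn.Linear(4, 2)

# First call, as it would happen in train(): the model gets wrapped.
model = wrap_for_multi_gpu(model, n_gpu=2)

# Second call, as it would happen in evaluate(): the guard makes it a no-op.
model = wrap_for_multi_gpu(model, n_gpu=2)

# The underlying module is the original model, not a nested DataParallel.
inner = model.module
```

With the guard in place, `model.module` is the original module after both calls, so evaluation never sees a DataParallel nested inside another DataParallel.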

LysandreJik

LGTM! Thanks for taking time to look at all the scripts affected @and-kul!

julien-c · 2020-04-20T23:37:35Z

Merging this, though it will be rendered obsolete (for a subset of the scripts initially) by #3800

Fix bug in examples: double wrap into DataParallel during eval

875a960

LysandreJik approved these changes Apr 17, 2020

LysandreJik requested a review from julien-c April 17, 2020 15:00

julien-c merged commit b1ff0b2 into huggingface:master Apr 20, 2020
