Fixed distributed GPU bug in ImageNet example #995

Quentin-Anthony · 2022-04-23T14:42:23Z

The current ImageNet example moves images to the GPU only in the single-GPU case, not in the distributed GPU case:

https://github.com/pytorch/examples/blob/main/imagenet/main.py#L293
https://github.com/pytorch/examples/blob/main/imagenet/main.py#L337

This leads to errors like the following:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

I think this can be fixed by relaxing the if statement.
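As a rough illustration of what relaxing the condition could look like, here is a minimal sketch. It is not the exact diff from this PR: `move_batch_to_device` is a hypothetical helper, and the assumption is that the transfer should depend on CUDA being available rather than on an explicit GPU index being passed (in the distributed case the index may be `None`, in which case `.cuda(None)` uses the current device).

```python
import torch


def move_batch_to_device(images, target, gpu=None):
    """Hypothetical helper: move a batch to the GPU whenever CUDA is available.

    `gpu` may be None (e.g. distributed runs without an explicit GPU index),
    in which case `.cuda(None)` targets the current CUDA device.
    """
    if torch.cuda.is_available():
        # Same guard for images and target, so both land on the same device.
        images = images.cuda(gpu, non_blocking=True)
        target = target.cuda(gpu, non_blocking=True)
    return images, target


# Small fake batch standing in for a DataLoader output (assumed shapes).
images = torch.randn(2, 3, 8, 8)
target = torch.tensor([0, 1])
images, target = move_batch_to_device(images, target, gpu=None)
```

Using one guard for both tensors keeps the inputs on the same device as each other, which is the mismatch behind the `RuntimeError` quoted above when the model weights are already on the GPU.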

imagenet/main.py

Fixed bug. Move dataset to GPU for distributed training

78186fd

facebook-github-bot added the cla signed label Apr 23, 2022

soumith reviewed Apr 25, 2022

imagenet/main.py (outdated, resolved)

Normalize train and validate GPU movement conditions

57997c4

Quentin-Anthony force-pushed the imagenet-distfix branch from a1b09a9 to 57997c4 Compare April 26, 2022 14:50

subramen reviewed Apr 27, 2022

imagenet/main.py (resolved)

Quentin-Anthony closed this Apr 28, 2022

Quentin-Anthony mentioned this pull request Apr 30, 2022

Add distributed usage to ImageNet example's README #1001

Open
