
MMDataParallel Implementation inconsistency between CPU / GPU #792

Closed
24hours opened this issue Jan 16, 2021 · 5 comments


24hours commented Jan 16, 2021

It seems that MMDataParallel supports a CPU-only mode.

However, when self.device_ids is falsy, the line at https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_parallel.py#L52 is executed and raises an error:

 return self.module.train_step(*inputs, **kwargs)

In GPU mode, the following code at https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_parallel.py#L67 is executed instead, with no error:

return self.module.train_step(*inputs[0], **kwargs[0])

Modifying line 52 to (*inputs[0], **kwargs[0]) seems to resolve the error with no other issues. Is there a reason for the difference between line 52 and line 67?
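To see why the indexing matters, here is a minimal sketch of the shape of scatter's output. The names fake_scatter and train_step are illustrative stand-ins, not mmcv's actual API: scatter-style helpers return one args tuple and one kwargs dict per target device, so even a single-device run wraps everything one level deep.

```python
def fake_scatter(args, kwargs, num_devices=1):
    # One copy of args/kwargs per device, mimicking the nesting of
    # scatter-style output: a list of per-device argument tuples.
    return [args] * num_devices, [kwargs] * num_devices

inputs, kwargs_list = fake_scatter((1, 2), {"flag": True})

def train_step(a, b, flag=False):
    return a + b if flag else 0

# The GPU branch indexes into the per-device wrapper before unpacking:
result = train_step(*inputs[0], **kwargs_list[0])
print(result)  # 3

# Calling train_step(*inputs, **kwargs_list) without the [0] index would
# pass the wrapped containers themselves and raise a TypeError, which is
# consistent with the error seen on line 52.
```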

In addition,

why does CPU mode unsqueeze an additional dimension? https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/_functions.py#L28
output = output.unsqueeze(0)

The comment mentions:

# unsquzee the first dimension thus the tensor's shape is the
# same as those scattered with GPU.

But it is not clear how the GPU path ends up with that additional dimension.
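For concreteness, here is a torch-free sketch of what tensor.unsqueeze(0) does to a shape (the shape helper below is purely illustrative, computing the dimensions of a nested list):

```python
def shape(nested):
    # Walk down the nesting levels, recording the length at each level,
    # to mimic reading off a tensor's shape.
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

batch = [[0.0] * 3 for _ in range(20)]  # stands in for a (20, 3) tensor
print(shape(batch))    # (20, 3)
print(shape([batch]))  # (1, 20, 3) -- the "unsqueezed" shape, with a
                       # leading dimension of size 1 added in front
```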


24hours commented Jan 22, 2021

I have found another inconsistency between CPU and GPU:

from mmcv.parallel import scatter_kwargs, DataContainer
import torch

inputs = (torch.zeros([20, 3, 128, 128]), )
output, _ = scatter_kwargs(inputs, {}, [-1], 0)
print('CPU', output[0][0].size())
output, _ = scatter_kwargs(inputs, {}, [0], 0)
print('GPU', output[0][0].size())

print('------------')
inputs = (DataContainer([torch.zeros([20, 3, 128, 128])]),)
output, _ = scatter_kwargs(inputs, {}, [-1], 0)
print('CPU', output[0][0].size())
output, _ = scatter_kwargs(inputs, {}, [0], 0)
print('GPU', output[0][0].size())

Output:

CPU torch.Size([20, 3, 128, 128])
GPU torch.Size([20, 3, 128, 128])
------------
CPU torch.Size([1, 20, 3, 128, 128])
GPU torch.Size([20, 3, 128, 128])

You can see that the CPU path returns the wrong size compared to the GPU implementation.

@ycxioooong (Contributor)

Thanks for reporting this issue, we'll carefully check the inconsistency.


24hours commented Jan 25, 2021

Hi,

I found that updating
https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/scatter_gather.py#L24

to

if obj.cpu_only or target_gpus == [-1]:

fixes the issue; however, I have not run the unit tests on the change yet.
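The effect of the proposed condition can be sketched as follows. FakeDataContainer and route are illustrative stand-ins, not mmcv's classes; the point is only that with the extra `target_gpus == [-1]` check, CPU-targeted scatters take the same branch as cpu_only data and so avoid the extra unsqueeze:

```python
class FakeDataContainer:
    # Minimal stand-in exposing only the cpu_only flag that the
    # routing condition inspects.
    def __init__(self, data, cpu_only=False):
        self.data = data
        self.cpu_only = cpu_only

def route(obj, target_gpus):
    # The proposed condition: also route device -1 (CPU) scatters
    # through the cpu_only branch.
    if obj.cpu_only or target_gpus == [-1]:
        return "cpu_branch"
    return "gpu_branch"

print(route(FakeDataContainer([0]), [-1]))  # cpu_branch
print(route(FakeDataContainer([0]), [0]))   # gpu_branch
```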

@ycxioooong (Contributor)

Thanks for the information. We'll update the code accordingly.


24hours commented Feb 16, 2021

I have investigated the issue further and found that mmcv does not intend to support CPU training.
I tried modifying some code to allow CPU training, but it would require extensive changes to the MMCV code base, and it is probably not wise to do that without discussing it with the MMCV team.

I will close this issue.
