DynUnet get_feature_maps() issue when using distributed training #1564

Closed
yiheng-wang-nv opened this issue Feb 8, 2021 · 10 comments · Fixed by #1596
Labels: bug (Something isn't working)
yiheng-wang-nv (Contributor) commented Feb 8, 2021

Describe the bug
If we use DistributedDataParallel to wrap the network, calling the get_feature_maps() function raises the following error:

[Screenshot: traceback of the error raised when calling get_feature_maps() on the DDP-wrapped network]
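For reference, a minimal sketch of why the call fails (a stand-in module, not the actual DynUNet code; it assumes a process group and a CUDA device are already initialised): DistributedDataParallel only proxies forward(), so custom methods have to be reached through .module.

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class TinyNet(nn.Module):
    # stand-in for DynUNet: caches intermediate maps and exposes them via a method
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 2, 3, padding=1)
        self.maps = []

    def forward(self, x):
        out = self.conv(x)
        self.maps = [out]  # cache feature maps during forward
        return out

    def get_feature_maps(self):
        return self.maps

model = DDP(TinyNet().cuda(), device_ids=[0])
model(torch.rand(1, 1, 8, 8, 8, device="cuda"))
# model.get_feature_maps()       # raises AttributeError: DDP does not expose the wrapped module's methods
model.module.get_feature_maps()  # common workaround: reach into the wrapped module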

Returning multiple variables from a network's forward function is common, not only in this case but also in other situations such as computing a triplet loss in metric-learning tasks. Therefore, I think we should reconsider what the forward function of DynUnet returns.

Let me think about it and submit a PR, then we can discuss it further.
@Nic-Ma @wyli @rijobro

Nic-Ma (Contributor) commented Feb 8, 2021

I recommend the following change to fix this issue:
Remove the get_feature_maps() method and change the logic of forward():

  1. if training is True and deep_supervision is True, return a list of Tensors.
  2. otherwise, return the output Tensor.

This is slightly different from the original code, which always returns a list of Tensors.
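A rough sketch of that logic, reusing the output_block / heads / deep_supr_num attribute names that appear elsewhere in this thread (hypothetical, not the final implementation):

def forward(self, x):
    out = self.skip_layers(x)
    out = self.output_block(out)
    if self.training and self.deep_supervision:
        # training with deep supervision: main output plus the extra supervision heads
        return [out] + self.heads[1 : self.deep_supr_num + 1]
    # validation / inference: a single output Tensor
    return out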

@wyli @rijobro @ericspod , what do you guys think?

Thanks.

Nic-Ma added the bug (Something isn't working) label on Feb 8, 2021
rijobro (Contributor) commented Feb 8, 2021

This will revert the behaviour of #1393. I suppose this is fine as long as we make the documentation clear so that future users aren't confused.

Nic-Ma (Contributor) commented Feb 9, 2021

Hi @rijobro ,

I think most of the previous problems were due to the list output during validation or inference.
So I suggest returning a list of data during training, and returning only the output tensor instead of [out] during validation or inference.

Thanks.

yiheng-wang-nv (Contributor, Author) commented Feb 10, 2021

> (quoting @Nic-Ma's comment above)

Hi @Nic-Ma @rijobro , actually at the beginning, in val or infer modes, it did return only the output tensor. However, returning different types is not permitted by TorchScript, so starting from this version list-based results are returned in all modes.
If we always return a list, then our default inferrer is not compatible, since it relies on tensor-based input.
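For context, a minimal standalone sketch of the TorchScript constraint (unrelated to DynUNet itself): torch.jit.script rejects a forward() whose branches return different types.

import torch
import torch.nn as nn

class Net(nn.Module):
    def forward(self, x: torch.Tensor):
        if self.training:
            return [x, x * 2]  # List[Tensor] in one branch ...
        return x               # ... but Tensor in the other

torch.jit.script(Net())  # fails: TorchScript requires a single, consistent return type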

Nic-Ma (Contributor) commented Feb 18, 2021

@ericspod @wyli @rijobro ,

Do you guys have any better ideas for solving this issue? It seems it isn't easy to find a perfect solution...
Thanks in advance.

rijobro (Contributor) commented Feb 18, 2021

So it seems that we need to be consistent with what we return, regardless of train mode, etc., is that correct?

If so, what if we return:

if self.deep_supervision:
    return {"data": self.output_block(out), "feature_map": self.heads[1 : self.deep_supr_num + 1]}
else:
    return self.output_block(out)

In this fashion the output is always consistent (since deep_supervision is set at construction time and will not change over the lifetime of the object).

By returning a dictionary, hopefully it will be clearer to users what we are returning (since users were confused by the list previously returned).

wyli (Contributor) commented Feb 18, 2021

> (quoting @rijobro's dict-return suggestion above)

Thanks, this looks good to me as well. We could make the default value self.deep_supervision=False, so that the user wouldn't hit this error by default:

seg_prob = predictor(window_data, *args, **kwargs).to(device) # batched patch segmentation
AttributeError: 'list' object has no attribute 'to'

Nic-Ma (Contributor) commented Feb 18, 2021

@yiheng-wang-nv ,

Please try the dict return first; I am afraid it can't fix the TorchScript issue...
Thanks.

rijobro (Contributor) commented Feb 18, 2021

If the default inferrer requires a tensor, then presumably you could wrap that function:

def default_dict_inferrer(input, key, *args, **kwargs):
    return default_inferrer(input[key], *args, **kwargs)
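Along the same lines, a hedged sketch of adapting MONAI's sliding_window_inference to the hypothetical dict output (net, image and the "data" key are illustrative placeholders, not existing API):

from monai.inferers import sliding_window_inference

# select the main prediction out of the dict before the sliding-window stitching,
# which expects the predictor to return a plain Tensor
seg_prob = sliding_window_inference(
    inputs=image,
    roi_size=(96, 96, 96),
    sw_batch_size=4,
    predictor=lambda x: net(x)["data"],
)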

yiheng-wang-nv (Contributor, Author) commented Feb 19, 2021

Hi @rijobro , thanks for the advice, but returning this kind of dict doesn't fix the TorchScript issue, as @Nic-Ma mentioned. I suggest we still return a list here and add the corresponding docstrings. To help users work with this network, I will also add/update the tutorials. Let me submit a PR first for you to review.
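For what it's worth, a hedged sketch of how a training loop might consume a list-based output for deep supervision (model, images, labels and loss_fn are placeholders; the list ordering and the 0.5 weighting are purely illustrative, not documented behaviour):

import torch.nn.functional as F

preds = model(images)             # list of Tensors: [main output, auxiliary heads ...]
loss = loss_fn(preds[0], labels)
for aux in preds[1:]:
    aux = F.interpolate(aux, size=labels.shape[2:])  # match the label resolution
    loss = loss + 0.5 * loss_fn(aux, labels)
loss.backward()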
