
Remove extra dim in cpu mode at scatter #1282

Closed · wants to merge 1 commit

Conversation

@innerlee (Contributor) commented Aug 18, 2021

Motivation

Currently the CPU version of scatter adds an extra dim to tensors, which causes problems such as the one in open-mmlab/mmocr#327
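The extra-dim behavior described above can be modelled in a few lines of plain Python. This is a sketch, not the real mmcv implementation: `shape`, `cpu_scatter_old`, and `cpu_scatter_fixed` are hypothetical helpers, and nested lists stand in for tensors.

```python
def shape(x):
    """Nested-list analogue of Tensor.shape."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def cpu_scatter_old(batch):
    # Hypothetical model of the old CPU path: the already-batched data is
    # wrapped in one more level of nesting, i.e. an extra leading dim.
    return [batch]

def cpu_scatter_fixed(batch):
    # Behavior this PR aims for: pass the batch through unchanged.
    return batch

batch = [[[0.0] * 4 for _ in range(3)]]  # batch of one 3x4 "image": shape (1, 3, 4)
assert shape(batch) == (1, 3, 4)
assert shape(cpu_scatter_old(batch)) == (1, 1, 3, 4)   # extra dim
assert shape(cpu_scatter_fixed(batch)) == (1, 3, 4)
```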

Modification

Remove the extra dim

BC-breaking (Optional)

YES

Downstream repos might be affected. Please check mmdet, mmaction2, and mmpose before merging, including their tests and their demos in CPU mode.

Checklist

  1. Pre-commit or other linting tools are used to fix potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, e.g. docstrings or example tutorials.

Signed-off-by: lizz <lizz@sensetime.com>
@innerlee changed the title from "Fix extra dim in cpu mode at scatter" to "Remove extra dim in cpu mode at scatter" on Aug 18, 2021
codecov bot commented Aug 18, 2021

Codecov Report

Merging #1282 (244e7f6) into master (9679ce8) will decrease coverage by 0.00%.
The diff coverage is 45.45%.

❗ Current head 244e7f6 differs from pull request most recent head 91fca88. Consider uploading reports for the commit 91fca88 to get more accurate results

@@            Coverage Diff             @@
##           master    #1282      +/-   ##
==========================================
- Coverage   68.42%   68.42%   -0.01%     
==========================================
  Files         160      160              
  Lines       10610    10616       +6     
  Branches     1938     1940       +2     
==========================================
+ Hits         7260     7264       +4     
  Misses       2975     2975              
- Partials      375      377       +2     
Flag Coverage Δ
unittests 68.42% <45.45%> (-0.01%) ⬇️


Impacted Files Coverage Δ
mmcv/parallel/_functions.py 14.00% <ø> (+0.27%) ⬆️
mmcv/parallel/data_parallel.py 36.58% <45.45%> (+4.23%) ⬆️


@hhaAndroid (Contributor)

@innerlee I tested mmdet and it reports an error, for the following reason:

```python
# Abridged from mmcv/parallel/_functions.py
class Scatter:

    @staticmethod
    def forward(target_gpus, input):
        input_device = get_input_device(input)
        streams = None
        if input_device == -1 and target_gpus != [-1]:
            # Perform CPU to GPU copies in a background stream
            streams = [_get_stream(device) for device in target_gpus]
        # input shape = (1, 3, 100, 100)
        # outputs shape = (1, 3, 100, 100)
        outputs = scatter(input, target_gpus, streams)
        # Synchronize with the copy stream
        if streams is not None:
            synchronize_stream(outputs, target_gpus, streams)

        return tuple(outputs)  # elements now have shape (3, 100, 100)
```

Because the batch dimension is missing, subsequent inference will report an error.
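To illustrate why the missing batch dimension breaks inference, here is a hypothetical shape check of the kind a conv-style forward pass performs; the `forward` helper below is made up for illustration and takes a shape tuple rather than a real tensor.

```python
def forward(x_shape):
    # A conv-style forward expects 4-D NCHW input. A tensor whose batch
    # dim was stripped by tuple(outputs) arrives as 3-D and is rejected.
    if len(x_shape) != 4:
        raise ValueError(f"expected 4-D NCHW input, got {len(x_shape)}-D")
    return x_shape

forward((1, 3, 100, 100))   # OK: batch dim present

try:
    forward((3, 100, 100))  # batch dim lost after scatter
except ValueError as err:
    print(err)
```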

@innerlee (Contributor, Author)

@hhaAndroid do you know where the batch dim was added in gpu code?

@hhaAndroid (Contributor)

> @hhaAndroid do you know where the batch dim was added in gpu code?

The dataloader's collate function output already includes the batch dimension.

@innerlee (Contributor, Author)

Okay so we should make the behavior of collate consistent across cpu/gpu, right?

@hhaAndroid (Contributor)

> Okay so we should make the behavior of collate consistent across cpu/gpu, right?

No, your understanding is wrong: collate does not depend on CPU or GPU; its output always includes the batch dimension.
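The device-agnostic role of collate can be sketched in a few lines of plain Python mimicking the idea of torch's default_collate, with nested lists standing in for tensors; the `collate` and `shape` helpers are hypothetical, not the real torch implementation.

```python
def shape(x):
    """Nested-list analogue of Tensor.shape."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def collate(samples):
    # Stacking N samples adds one leading dim of size N; the target
    # device plays no role at this stage.
    return list(samples)

sample = [[0.0] * 4 for _ in range(3)]   # one sample: shape (3, 4)
batch = collate([sample, sample])        # shape (2, 3, 4): batch dim added
assert shape(sample) == (3, 4)
assert shape(batch) == (2, 3, 4)
```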

@innerlee (Contributor, Author)

> The dataloader's collate function output already includes the batch dimension.

Where is the extra dim added for gpu tensors?

@hhaAndroid (Contributor)

> The dataloader's collate function output already includes the batch dimension.
>
> Where is the extra dim added for gpu tensors?

Sorry for the late reply; I will provide you with a detailed description document.

@innerlee (Contributor, Author)

Cool. If you have extra time, please take a look at this bug: open-mmlab/mmocr#327. Check out that branch and run this line: https://github.com/open-mmlab/mmocr/pull/327/files#diff-2f44c053cce5ce1b42ec4cd7d50aca219b338cefe5aa8831813d6f4020fca740R14

@zhouzaida (Collaborator)

Hi @innerlee @hhaAndroid, is there any progress?

@innerlee (Contributor, Author)

Not really. Currently I do not have the bandwidth to handle this issue. You may check if anyone on the mmdet team wants to take a look.

@zhouzaida (Collaborator)

> Not really. Currently I do not have the bandwidth to handle this issue. You may check if anyone on the mmdet team wants to take a look.

got it

@HAOCHENYE (Collaborator)

Solved by #1621

@HAOCHENYE HAOCHENYE closed this Feb 23, 2022