Replace numpy transpose with torch permute to speed-up #9533
Conversation
vincentwu1 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you signed the CLA already but the status is still pending? Let us recheck it.
Hi @Min-Sheng,
I have updated both the docstring and the code.
Using one ordering of the operations:
Output: 7.58 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
which is faster than the other ordering:
Output: 14.5 ms ± 669 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So, I check the numpy C_CONTIGUOUS flag for array contiguity to switch the order of the transpose and to_tensor operations.
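The contiguity check described above could be sketched as follows. This is a hypothetical helper, not the actual PR code; the function name and the exact branch bodies are assumptions based on the comment:

```python
import numpy as np
import torch


def image_to_tensor(img: np.ndarray) -> torch.Tensor:
    """Hypothetical sketch: convert an HWC image to a CHW tensor,
    choosing the operation order based on memory contiguity."""
    if img.flags["C_CONTIGUOUS"]:
        # Contiguous input: wrap the array first, then permute on the
        # tensor (a cheap stride change) and make it contiguous once.
        return torch.from_numpy(img).permute(2, 0, 1).contiguous()
    # Non-contiguous input: copy into contiguous memory with numpy
    # first, the ordering the comment above found faster in this case.
    return torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1)))


img = np.random.rand(480, 640, 3).astype(np.float32)
t = image_to_tensor(img)
print(tuple(t.shape))  # (3, 480, 640)
```

Either branch produces the same CHW tensor; only the point at which the memory copy happens differs.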
Hi @Min-Sheng, thanks for your kind PR. It seems that the CLA is not signed. Could you sign the CLA so that we can eventually merge this PR after review? You can check the contents and follow the instructions in the communication box shown below.
Force-pushed from 25aeb24 to 6b8a1b0
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>
Everything is ready for merging.
Codecov Report — Base: 64.15% // Head: 64.14% // Decreases project coverage by 0.02%.

```
@@           Coverage Diff            @@
##             dev    #9533     +/-   ##
==========================================
- Coverage   64.15%   64.14%   -0.02%
==========================================
  Files         361      361
  Lines       29583    29586       +3
  Branches     5033     5034       +1
==========================================
- Hits        18980    18978       -2
- Misses       9599     9601       +2
- Partials     1004     1007       +3
```
… speed-up (#2604) ## Motivation The original motivation came from [MMDetection PR #9533](open-mmlab/mmdetection#9533). Through several experiments I found that if an ndarray is contiguous, numpy.transpose + torch.contiguous performs better, while if it is not, numpy.ascontiguousarray + numpy.transpose is faster. ## Modification Replace numpy.ascontiguousarray with torch.contiguous in [PackSegInputs](https://github.com/open-mmlab/mmsegmentation/blob/1.x/mmseg/datasets/transforms/formatting.py). Co-authored-by: MeowZheng <meowzheng@outlook.com>
Hi @Min-Sheng! First of all, we want to express our gratitude for your significant PR in the MMDet project. Your contribution is highly appreciated, and we are grateful for your efforts in helping improve this open-source project in your personal time. We believe that many developers will benefit from your PR. We would also like to invite you to join our Special Interest Group (SIG) private channel on Discord, where you can share your experiences and ideas and build connections with like-minded peers. To join the SIG channel, simply message the moderator, OpenMMLab, on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. We look forward to seeing you there! Join us: https://discord.gg/UjgXkPWNqA If you have a WeChat account, you are welcome to join our community on WeChat. You can add our assistant: openmmlabwx. Please add "mmsig + GitHub ID" as a remark when adding friends :)
Motivation
numpy.transpose() is much slower than torch.permute(), according to my benchmarks in a Jupyter notebook:
Output: 7.69 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Output: 1.65 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Output: 327 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Output: 93.8 ms ± 4.77 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
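The benchmarked snippets themselves were not preserved in this scrape, but the comparison can be reproduced along these lines. The image shape and repeat count here are assumptions, and this uses crude wall-clock timing rather than the `%timeit` magic the numbers above came from:

```python
import time

import numpy as np
import torch

# Assumed input: a typical detection-sized HWC float image.
img = np.random.rand(1333, 800, 3).astype(np.float32)


def bench_ms(fn, repeats=100):
    # Average wall-clock time per call, in milliseconds.
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats * 1e3


# Old path: transpose and copy in numpy, then convert to a tensor.
numpy_ms = bench_ms(lambda: torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1))))
# New path: convert first, then permute and compact on the tensor side.
torch_ms = bench_ms(lambda: torch.from_numpy(img).permute(2, 0, 1).contiguous())

print(f"numpy transpose: {numpy_ms:.2f} ms, torch permute: {torch_ms:.2f} ms")
```

The absolute numbers will differ by machine; the point of the PR is the relative gap between the two paths.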
Modification
Replace the transpose operation numpy.transpose(2, 0, 1) with torch.permute(2, 0, 1) in ImageToTensor and DefaultFormatBundle to speed up the process.
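The change can be illustrated with a minimal before/after sketch. This is not the actual mmdet transform code, just the core substitution the PR description names, applied to a standalone HWC-to-CHW conversion:

```python
import numpy as np
import torch

img = np.random.rand(480, 640, 3).astype(np.float32)  # HWC image

# Before: transpose in numpy; the transposed view is non-contiguous,
# so a numpy-side copy is made before the tensor conversion.
before = torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1)))

# After: convert first, then permute on the tensor; permute is a
# cheap stride change, and .contiguous() copies at most once.
after = torch.from_numpy(img).permute(2, 0, 1).contiguous()

assert torch.equal(before, after)  # same CHW result either way
```

Both expressions yield identical tensors, so the swap is a pure performance change with no effect on downstream results.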