[prototype] Gaussian Blur clean up #6888

Merged 5 commits into pytorch:main on Nov 2, 2022

Conversation

datumbox
Contributor

@datumbox datumbox commented Nov 2, 2022

Related to #6818

This PR:

  • Cleans up the assertions on the gaussian_blur kernel
  • Simplifies the reshaping logic
  • Adds in-place ops where possible
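A minimal sketch of what a kernel along these lines could look like (hypothetical helper names; this is not the actual torchvision implementation, just an illustration of flattening the leading dims, using a separable convolution, and applying in-place ops where possible):

```python
import torch
import torch.nn.functional as F

def _gaussian_kernel_1d(kernel_size: int, sigma: float, dtype, device):
    # Sample a 1D Gaussian and normalize it in-place so it sums to 1.
    x = torch.linspace(-(kernel_size - 1) / 2, (kernel_size - 1) / 2,
                       kernel_size, dtype=dtype, device=device)
    pdf = torch.exp(x.pow_(2).div_(-2 * sigma ** 2))  # in-place ops on the temp
    return pdf.div_(pdf.sum())

def gaussian_blur_sketch(image: torch.Tensor, kernel_size: int, sigma: float) -> torch.Tensor:
    # Flatten all leading dims into one batch dim: (..., H, W) -> (N, 1, H, W),
    # so a single code path handles both batched and unbatched inputs.
    shape = image.shape
    img = image.reshape(-1, 1, shape[-2], shape[-1])
    needs_cast = not img.is_floating_point()
    if needs_cast:
        img = img.to(torch.float32)
    k1d = _gaussian_kernel_1d(kernel_size, sigma, img.dtype, img.device)
    # Separable blur: two 1D convolutions instead of one 2D convolution.
    pad = kernel_size // 2
    img = F.pad(img, [pad, pad, pad, pad], mode="reflect")
    img = F.conv2d(img, k1d.view(1, 1, 1, -1))  # horizontal pass
    img = F.conv2d(img, k1d.view(1, 1, -1, 1))  # vertical pass
    if needs_cast:
        img = img.round_().to(image.dtype)      # round in-place, cast back
    return img.reshape(shape)
```

Since the Gaussian kernel sums to 1, each output pixel is a convex combination of inputs, so the round-and-cast back to uint8 cannot overflow.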

No speed regression, and a small ~5% improvement on CUDA:

[------------- gaussian_blur cpu torch.float32 -------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |    4 (+-  0) ms  |    4 (+-  0) ms
      (16, 3, 400, 500)  |  306 (+-  3) ms  |  306 (+-  2) ms
6 threads: --------------------------------------------------
      (3, 400, 500)      |    6 (+-  0) ms  |    6 (+-  0) ms
      (16, 3, 400, 500)  |  334 (+-  1) ms  |  334 (+-  4) ms

Times are in milliseconds (ms).

[------------- gaussian_blur cuda torch.float32 ------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |  119 (+-  1) us  |  112 (+-  1) us
      (16, 3, 400, 500)  |  266 (+-  0) us  |  266 (+-  0) us
6 threads: --------------------------------------------------
      (3, 400, 500)      |  119 (+-  2) us  |  113 (+-  2) us
      (16, 3, 400, 500)  |  266 (+-  1) us  |  266 (+-  0) us

Times are in microseconds (us).

[-------------- gaussian_blur cpu torch.uint8 --------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |    5 (+-  0) ms  |    5 (+-  0) ms
      (16, 3, 400, 500)  |  355 (+-  1) ms  |  331 (+-  1) ms
6 threads: --------------------------------------------------
      (3, 400, 500)      |    7 (+-  0) ms  |    7 (+-  0) ms
      (16, 3, 400, 500)  |  383 (+-  2) ms  |  359 (+-  5) ms

Times are in milliseconds (ms).

[-------------- gaussian_blur cuda torch.uint8 -------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |  150 (+-  1) us  |  142 (+-  1) us
      (16, 3, 400, 500)  |  423 (+-  0) us  |  423 (+-  0) us
6 threads: --------------------------------------------------
      (3, 400, 500)      |  150 (+-  3) us  |  142 (+-  3) us
      (16, 3, 400, 500)  |  423 (+-  0) us  |  423 (+-  0) us

Times are in microseconds (us).

cc @vfdev-5 @bjuncek @pmeier

Collaborator

@vfdev-5 vfdev-5 left a comment

OK by me. I'm not sure why we can't use _cast_squeeze_in / _cast_squeeze_out anymore, but OK.

@datumbox
Contributor Author

datumbox commented Nov 2, 2022

@vfdev-5 We could use them, but that would mean having multiple pieces of code handling the reshaping (or needs_unsquash and need_squeeze in the previous code). In addition, the casting mechanism in _cast_squeeze_in and _cast_squeeze_out makes assumptions about the order of preference of the provided dtypes and requires unnecessarily complex checks for rounding. Casting can be simplified: the only thing we need to check is whether the input was float, and if it wasn't, just round and cast back.

@datumbox datumbox merged commit 1921613 into pytorch:main Nov 2, 2022
@datumbox datumbox deleted the prototype/gaussian_blur branch November 2, 2022 13:32
@datumbox
Contributor Author

datumbox commented Nov 2, 2022

It seems that this closed the gap between V1 and V2 for the GaussianBlur transform. My new benchmarks between V1+pure tensor and V2+feature report:

[------------- gaussian_blur cpu torch.float32 -------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |   13 (+-  1) ms  |    4 (+-  0) ms
      (16, 3, 400, 500)  |  306 (+-  1) ms  |  306 (+-  0) ms

Times are in milliseconds (ms).

[------------- gaussian_blur cuda torch.float32 ------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |  247 (+- 37) us  |  156 (+-  1) us
      (16, 3, 400, 500)  |  356 (+-  2) us  |  269 (+-  0) us

Times are in microseconds (us).

[-------------- gaussian_blur cpu torch.uint8 --------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |    5 (+-  0) ms  |    5 (+-  0) ms
      (16, 3, 400, 500)  |  351 (+-  1) ms  |  330 (+-  1) ms

Times are in milliseconds (ms).

[-------------- gaussian_blur cuda torch.uint8 -------------]
                         |        old       |        new     
1 threads: --------------------------------------------------
      (3, 400, 500)      |  261 (+-  5) us  |  189 (+-  1) us
      (16, 3, 400, 500)  |  515 (+-  4) us  |  426 (+-  0) us

Times are in microseconds (us).

@vfdev-5 Might be worth rerunning the benchmarks later on your side to confirm.
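For reference, tables like the ones above can be produced with torch.utils.benchmark; a minimal sketch (the shapes and labels mirror the report, but the exact harness used for the PR is not shown here):

```python
import torch
from torch.utils.benchmark import Timer

def bench(fn, shape, dtype, num_threads=1):
    # Time a transform on a random input of the given shape/dtype.
    x = torch.randint(0, 256, shape).to(dtype)
    t = Timer(
        stmt="fn(x)",
        globals={"fn": fn, "x": x},
        num_threads=num_threads,
        label=f"gaussian_blur cpu {dtype}",
        sub_label=str(tuple(shape)),
    )
    return t.blocked_autorange(min_run_time=0.2)

# Placeholder op standing in for the gaussian_blur kernel under test.
m = bench(lambda x: x.float().mean(), (3, 40, 50), torch.uint8)
print(m)
```

Measurements from several runs can be collected into a torch.utils.benchmark.Compare object to print side-by-side old/new tables like the ones in this thread.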

@datumbox datumbox added the Perf (for performance improvements) label Nov 2, 2022
facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2022
Summary:
* Refactor gaussian_blur

* Add conditional reshape

* Further refactoring

* Remove unused import.

Reviewed By: datumbox

Differential Revision: D41020542

fbshipit-source-id: 72694024272d91818c4154f7b5f7097e6d21154f