@datumbox datumbox commented Nov 8, 2022

Related to #6818

Performance optimization for the adjust sharpness kernel:

[----------- adjust_sharpness_image_tensor cpu torch.float32 -----------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 274                 |    230  
      (3, 400, 400)      |                   4                 |      4  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 300                 |    260  
      (3, 400, 400)      |                   5                 |      5  

Times are in milliseconds (ms).

[----------- adjust_sharpness_image_tensor cuda torch.float32 ----------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 440                 |    382  
      (3, 400, 400)      |                 150                 |    100  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 400                 |    400  
      (3, 400, 400)      |                 150                 |    100  

Times are in microseconds (us).

[------------ adjust_sharpness_image_tensor cpu torch.uint8 ------------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 280                 |    240  
      (3, 400, 400)      |                   5                 |      4  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 300                 |    260  
      (3, 400, 400)      |                   7                 |      6  

Times are in milliseconds (ms).

[------------ adjust_sharpness_image_tensor cuda torch.uint8 -----------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 520                 |    459  
      (3, 400, 400)      |                 190                 |    110  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 500                 |    460  
      (3, 400, 400)      |                 180                 |    110  

Times are in microseconds (us).

cc @vfdev-5 @bjuncek @pmeier

@datumbox datumbox requested a review from pmeier November 8, 2022 15:09
@datumbox datumbox added the "module: transforms", "Perf" (for performance improvements), and "prototype" labels and removed the "cla signed" label Nov 8, 2022
@datumbox datumbox requested a review from vfdev-5 November 8, 2022 15:10

@pmeier pmeier left a comment

One question; otherwise LGTM if CI is green.

needs_unsquash = False

output = _blend(image, _FT._blurred_degenerate_image(image), sharpness_factor)
kernel_dtype = image.dtype if fp else torch.float32

To ease review, here is the old implementation:

def _blurred_degenerate_image(img: Tensor) -> Tensor:

Comment on lines +140 to +142
# We speed up blending by minimizing flops and doing in-place. The 2 blend options are mathematically equivalent:
# x+(1-r)*(y-x) = x + (1-r)*y - (1-r)*x = x*r + y*(1-r)
view.add_(blurred_degenerate.sub_(view), alpha=(1.0 - sharpness_factor))
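
The equivalence stated in the comment above can be checked numerically. The following is a minimal sketch, not the torchvision implementation: `blend_reference` and `blend_inplace` are hypothetical helper names, `x` stands in for `view`, `y` for `blurred_degenerate`, and `r` for `sharpness_factor`.

```python
import torch


def blend_reference(x: torch.Tensor, y: torch.Tensor, r: float) -> torch.Tensor:
    # Out-of-place form: x*r + y*(1-r). Allocates temporaries for each product.
    return x * r + y * (1.0 - r)


def blend_inplace(x: torch.Tensor, y: torch.Tensor, r: float) -> torch.Tensor:
    # In-place form: x + (1-r)*(y-x).
    # y is overwritten with (y - x), then x is updated with the scaled difference.
    return x.add_(y.sub_(x), alpha=(1.0 - r))


torch.manual_seed(0)
x = torch.rand(3, 8, 8)
y = torch.rand(3, 8, 8)
r = 1.5  # hypothetical sharpness factor

expected = blend_reference(x, y, r)
# Clone the inputs because blend_inplace mutates both arguments.
actual = blend_inplace(x.clone(), y.clone(), r)

print(torch.allclose(expected, actual, atol=1e-6))
```

Note that the in-place form destroys both inputs, so it is only safe when neither tensor is needed afterwards.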

Can we push a change like this to _blend, or is this a special case here?

Contributor Author


It's a special case :( We can do this only because we are allowed to subtract image1 from image2 in place. In all other cases where _blend() is used, we rely on broadcasting, so that's not possible.
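
To illustrate the broadcasting limitation, here is a rough sketch under assumed shapes: the second operand is a hypothetical per-image mean of shape `(1, 1, 1)`, which is an assumption for illustration and not taken from the actual `_blend()` call sites. An in-place subtraction cannot write a broadcast result into the smaller tensor:

```python
import torch

torch.manual_seed(0)
image = torch.rand(3, 4, 4)
# Hypothetical second operand that relies on broadcasting: shape (1, 1, 1).
mean = image.mean(dim=(-3, -2, -1), keepdim=True)

try:
    # The result of (mean - image) has shape (3, 4, 4) and does not fit
    # into mean's (1, 1, 1) storage, so PyTorch raises a RuntimeError.
    mean.sub_(image)
    inplace_ok = True
except RuntimeError:
    inplace_ok = False

# The out-of-place subtraction broadcasts fine, at the cost of a new allocation.
diff = mean - image

print(inplace_ok, tuple(diff.shape))
```

When both operands already have the same shape, as in the sharpness kernel, the in-place subtraction is valid.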

@datumbox datumbox changed the title from "Speed up adjust_sharpness_image_tensor" to "[prototype] Speed up adjust_sharpness_image_tensor" Nov 8, 2022
@datumbox datumbox merged commit 7a7ab7e into pytorch:main Nov 8, 2022
@datumbox datumbox deleted the perf/adjust_brightness branch November 8, 2022 15:41
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
* Speed up `adjust_sharpness_image_tensor`

* Add a comment

Reviewed By: NicolasHug

Differential Revision: D41265190

fbshipit-source-id: 4ebbd1d7a4d763a77f4af84b2da710f7a981a843