Impl SmoothL1Loss #3348

Open · wants to merge 80 commits into base: develop

Conversation

long10024070 (Collaborator) commented:

  • Added SmoothL1Loss forward and backward (a minimal reference sketch of the element-wise math is included after the results tables below).
  • Added a driver test and gtests for both directions of SmoothL1Loss.
  • The new API is guarded by the MIOPEN_BETA_API macro.
  • Comparison against ROCm PyTorch:
float16

| op_name | dtype | size | contiguous | reduction | model | beta | direction | ROCm PyTorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|
| SmoothL1Loss | float16 | [7 4] | true | sum | ssd/ssdlite | 1 | fwd | 14520 | 8173 | 1.78 |
| SmoothL1Loss | float16 | [27 4] | true | sum | ssd/ssdlite | 1 | fwd | 14145 | 8263 | 1.71 |
| SmoothL1Loss | float16 | [41 4] | true | sum | ssd/ssdlite | 1 | fwd | 14088 | 8497 | 1.66 |
| SmoothL1Loss | float16 | [62 4] | true | sum | ssd/ssdlite | 1 | fwd | 14189 | 8565 | 1.66 |
| SmoothL1Loss | float16 | [3 4] | true | sum | ssd/ssdlite | 1 | fwd | 13449 | 8140 | 1.65 |
| SmoothL1Loss | float16 | [20 4] | false | sum | ssd/ssdlite | 1 | fwd | 15286 | 8276 | 1.85 |
| SmoothL1Loss | float16 | [3 4] | false | sum | ssd/ssdlite | 1 | fwd | 13516 | 8134 | 1.66 |
| SmoothL1Loss | float16 | [34 4] | false | sum | ssd/ssdlite | 1 | fwd | 13654 | 8541 | 1.60 |
| SmoothL1Loss | float16 | [18 4] | false | sum | ssd/ssdlite | 1 | fwd | 13049 | 8199 | 1.59 |
| SmoothL1Loss | float16 | [22 4] | false | sum | ssd/ssdlite | 1 | fwd | 13062 | 8218 | 1.59 |
| SmoothL1Loss | float16 | [155 4] | false | sum | ssdlite | 1 | bwd | 12649 | 7866 | 1.61 |
| SmoothL1Loss | float16 | [163 4] | false | sum | ssd/ssdlite | 1 | bwd | 11184 | 8011 | 1.40 |
| SmoothL1Loss | float16 | [129 4] | false | sum | ssd/ssdlite | 1 | bwd | 10881 | 7839 | 1.39 |
| SmoothL1Loss | float16 | [98 4] | false | sum | ssdlite | 1 | bwd | 10078 | 7762 | 1.30 |
| SmoothL1Loss | float16 | [108 4] | false | sum | ssd/ssdlite | 1 | bwd | 10073 | 7789 | 1.29 |
float32

| op_name | dtype | size | contiguous | reduction | model | beta | direction | ROCm PyTorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|
| SmoothL1Loss | float32 | [20 4] | true | sum | ssd/ssdlite | 1 | fwd | 17193 | 8389 | 2.05 |
| SmoothL1Loss | float32 | [7 4] | true | sum | ssd/ssdlite | 1 | fwd | 15565 | 8129 | 1.91 |
| SmoothL1Loss | float32 | [3 4] | true | sum | ssd/ssdlite | 1 | fwd | 13710 | 8102 | 1.69 |
| SmoothL1Loss | float32 | [47 4] | true | sum | ssd/ssdlite | 1 | fwd | 14861 | 8785 | 1.69 |
| SmoothL1Loss | float32 | [34 4] | true | sum | ssd/ssdlite | 1 | fwd | 14504 | 8668 | 1.67 |
| SmoothL1Loss | float32 | [3 4] | false | sum | ssd/ssdlite | 1 | fwd | 13745 | 8154 | 1.69 |
| SmoothL1Loss | float32 | [34 4] | false | sum | ssd/ssdlite | 1 | fwd | 13998 | 8670 | 1.61 |
| SmoothL1Loss | float32 | [22 4] | false | sum | ssd/ssdlite | 1 | fwd | 13561 | 8424 | 1.61 |
| SmoothL1Loss | float32 | [30 4] | false | sum | ssd/ssdlite | 1 | fwd | 13558 | 8423 | 1.61 |
| SmoothL1Loss | float32 | [20 4] | false | sum | ssd/ssdlite | 1 | fwd | 13561 | 8435 | 1.61 |
| SmoothL1Loss | float32 | [104 4] | false | sum | ssd/ssdlite | 1 | bwd | 12889 | 8029 | 1.61 |
| SmoothL1Loss | float32 | [129 4] | false | sum | ssd/ssdlite | 1 | bwd | 12197 | 8120 | 1.50 |
| SmoothL1Loss | float32 | [131 4] | false | sum | ssd/ssdlite | 1 | bwd | 11667 | 8111 | 1.44 |
| SmoothL1Loss | float32 | [137 4] | false | sum | ssdlite | 1 | bwd | 11638 | 8132 | 1.43 |
| SmoothL1Loss | float32 | [155 4] | false | sum | ssdlite | 1 | bwd | 12251 | 8569 | 1.43 |

The ROCm PyTorch SmoothL1Loss operator does not support the bfloat16 datatype, so no comparison is available for bfloat16.

  • Average improvement over all benchmarked cases:

| type | average |
|---|---|
| float16 | 1.48 |
| float32 | 1.63 |
| bfloat16 | - |
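
For reference, SmoothL1Loss is the element-wise Huber-style loss whose quadratic/linear switch is controlled by beta (beta = 1 and sum reduction in the tables above). The sketch below is only a minimal host-side reference of that math, included to make the fwd/bwd directions concrete; it is not the MIOpen HIP kernel added by this PR, and the function names and std::vector interface are assumptions made for illustration.

```cpp
// Minimal host-side reference of the SmoothL1Loss math (PyTorch definition),
// shown for illustration only; it is NOT the MIOpen implementation in this PR.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Forward pass with 'sum' reduction: accumulate the Huber-like loss per element.
double SmoothL1LossForwardSum(const std::vector<double>& input,
                              const std::vector<double>& target,
                              double beta)
{
    double loss = 0.0;
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        const double diff = input[i] - target[i];
        if(std::abs(diff) < beta)
            loss += 0.5 * diff * diff / beta; // quadratic region near zero
        else
            loss += std::abs(diff) - 0.5 * beta; // linear region for large errors
    }
    return loss;
}

// Backward pass w.r.t. the input for 'sum' reduction, scaled by the incoming
// gradient dLoss of the scalar loss.
std::vector<double> SmoothL1LossBackwardSum(const std::vector<double>& input,
                                            const std::vector<double>& target,
                                            double beta,
                                            double dLoss)
{
    std::vector<double> grad(input.size());
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        const double diff = input[i] - target[i];
        if(std::abs(diff) < beta)
            grad[i] = dLoss * diff / beta; // derivative of the quadratic part
        else
            grad[i] = dLoss * (diff > 0.0 ? 1.0 : -1.0); // derivative of the linear part
    }
    return grad;
}

int main()
{
    const std::vector<double> x = {0.5, -2.0, 3.0, 0.0};
    const std::vector<double> y = {0.0, 0.0, 2.5, 0.1};
    const double beta           = 1.0;

    std::printf("forward (sum): %f\n", SmoothL1LossForwardSum(x, y, beta));
    for(const double g : SmoothL1LossBackwardSum(x, y, beta, 1.0))
        std::printf("grad: %f\n", g);
    return 0;
}
```

The "beta" column in the tables corresponds to this threshold, and the "direction" column selects whether the forward or the backward kernel is timed.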

long10024070 and others added 30 commits on April 10, 2024.
long10024070 requested a review from a team as a code owner on November 5, 2024.
iq136boy (Contributor) left a comment:

The following clang-format issue needs to be fixed:
```
[2024-11-19T07:52:05.153Z] 453,454c453
[2024-11-19T07:52:05.153Z] < if(out_dev->FromGPU(GetStream(), out.data()) != 0)
[2024-11-19T07:52:05.153Z] < {
[2024-11-19T07:52:05.153Z] ---
[2024-11-19T07:52:05.153Z] > if(out_dev->FromGPU(GetStream(), out.data()) != 0) {
```
