
[Operator] optimize normalize op with vectorized load, dynamic shape and more #316

Merged 3 commits into hidet-org:main on Jul 16, 2023

Conversation

xinli-git
Collaborator

@xinli-git xinli-git commented Jul 16, 2023

This change introduces several enhancements to the current norm operator:

  • vectorized loads for fp16 types
  • allow epilogue fusion
  • dynamic shape on the normalized dimension
  • a tuning option that chooses between two warp shuffle routines or just a single one
  • cleaner code and implementation

As a result, norm_fp16.py can be safely deleted.
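For context, the normalization this op computes can be sketched in plain NumPy (a minimal sketch of the math only; the actual operator generates GPU code with vectorized loads and warp shuffle reductions):

```python
import numpy as np

def normalize(x: np.ndarray, norm_dims: tuple, eps: float = 1e-5) -> np.ndarray:
    # Subtract the mean and divide by the standard deviation computed
    # over the normalized dims (e.g. dims [60, 16, 16] of a
    # [2, 32, 60, 16, 16] tensor).
    mean = x.mean(axis=norm_dims, keepdims=True)
    var = x.var(axis=norm_dims, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Stable-diffusion shape from the benchmark below.
x = np.random.rand(2, 32, 60, 16, 16).astype(np.float32)
y = normalize(x, norm_dims=(2, 3, 4))
```

Each normalized slice then has mean approximately 0 and standard deviation approximately 1.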

For the stable diffusion shape [2, 32, 60, 16, 16], norm dims [60, 16, 16], fp32:

  • torch : 0.023 ms
  • main: 0.036 ms
  • this change: 0.027 ms

For the BERT-base shape [1, 128, 768], norm dims [768], fp16:

  • torch: 0.006 ms
  • main: 0.008 ms
  • this change: 0.007 ms

We are still not faster than torch, but very close.
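Comparisons like the ones above can be made with a simple wall-clock harness along these lines (a sketch only; the actual benchmark script is not shown in the PR, and absolute numbers depend on the hardware):

```python
import time
import numpy as np

def bench(fn, warmup: int = 10, iters: int = 100) -> float:
    """Return the average runtime of fn in milliseconds."""
    for _ in range(warmup):          # warm up caches / lazy compilation
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3

# BERT-base shape from the numbers above, normalized over the last dim.
x = np.random.rand(1, 128, 768).astype(np.float16)
ms = bench(lambda: (x - x.mean(axis=-1, keepdims=True))
                   / np.sqrt(x.var(axis=-1, keepdims=True) + 1e-5))
```

For GPU kernels, the real harness must also synchronize the device before reading the clock, otherwise only launch overhead is measured.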

@xinli-git xinli-git merged commit 15426c8 into hidet-org:main Jul 16, 2023
@xinli-git
Collaborator Author

The tests are passing. This change is isolated to the norm op, so I will merge without a review.

@@ -104,39 +108,69 @@ def allow_prologue(self) -> bool:
        return False

    def allow_epilogue(self) -> bool:


A beginner question: how are "prologue" and "epilogue" here usually translated into Chinese, and what does each of them do? Thanks.

Member


Please refer to sections 4.2 and 5.2 of our paper [1] to learn more about prologue and epilogue fusion (in the paper, it is called post-scheduling fusion). There seems to be no established Chinese translation; perhaps "前驱算子" and "后继算子".

[1] https://dl.acm.org/doi/pdf/10.1145/3575693.3575702
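To illustrate the idea (hypothetical names, not Hidet's actual API): prologue fusion folds a producer op into the anchor operator's input reads, and epilogue fusion folds a consumer op into its output writes, so the intermediate tensors are never materialized in global memory. A NumPy sketch of the dataflow:

```python
import numpy as np

def prologue(x):                  # producer op, fused into the input reads
    return x * 2.0

def epilogue(y, scale, bias):     # consumer op, fused into the output writes
    return y * scale + bias

def fused_norm(x, scale, bias, eps=1e-5):
    # Conceptually a single kernel: the prologue runs while loading
    # inputs and the epilogue runs while storing outputs.
    a = prologue(x)
    mean = a.mean(axis=-1, keepdims=True)
    var = a.var(axis=-1, keepdims=True)
    return epilogue((a - mean) / np.sqrt(var + eps), scale, bias)

x = np.random.rand(4, 8).astype(np.float32)
y = fused_norm(x, scale=1.5, bias=0.25)
```

In NumPy the intermediates are of course still materialized; the point is only the shape of the fused computation the compiler generates.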

@xinli-git xinli-git deleted the norm_optimize branch August 21, 2023 02:48
vadiklyutiy pushed a commit that referenced this pull request Jul 22, 2024
Previously I added hidet.ones_like without handling dtype and device. Now it properly works with all these arguments.

---------

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail.com>