
[AMP] Turn off accumulation data types for mixed precision pass #8341

Merged 9 commits into apache:main on Jun 29, 2021

Conversation

@AndrewZhaoLuo (Contributor) commented on Jun 25, 2021

CUDA codegen does not seem to handle half types very well, and mixing half and full-precision floating point exposes additional issues. Furthermore, some schedules that are supposed to support heterogeneous output dtypes do not.

This seems to be a problem with codegen rather than with the mixed precision pass, so for now I am turning off accumulating into FP32 in the mixed precision pass. With this change we can tune BERT and YOLOv2; results are here: https://docs.google.com/spreadsheets/d/12lgyfuHaRS-X4uG-1iQOV8oAuPpuVAbspcmkOSPRFHQ/edit#gid=0

I will leave the codegen issues for #8294, and the issues with schedules not supporting output dtypes for #8340.
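
For context, here is a minimal sketch of what "turning off accumulation data types" means in practice, assuming TVM's AMP API at the time (`relay.transform.ToMixedPrecision` and `tvm.relay.op.register_mixed_precision_conversion`). The `nn.dense` override and the `to_fp16` helper below are illustrative, not part of this PR's diff: a conversion function returns `[category, accumulation_dtype, output_dtype]`, and making the accumulation dtype equal the output dtype is the behavior this PR switches to by default.

```python
# Hedged sketch: not this PR's diff. Assumes TVM's mixed-precision API
# (ToMixedPrecision pass + register_mixed_precision_conversion) as of mid-2021.
import tvm
from tvm import relay
from tvm.relay.op import register_mixed_precision_conversion
from tvm.relay.transform.mixed_precision import MIXED_PRECISION_ALWAYS


# Illustrative override for nn.dense: accumulate in the mixed-precision dtype
# (float16) rather than float32. A higher `level` than the built-in rule
# lets this registration take precedence.
@register_mixed_precision_conversion("nn.dense", level=11)
def dense_fp16_accumulation(call_node, mixed_precision_type):
    # Returns [conversion category, accumulation dtype, output dtype].
    return [MIXED_PRECISION_ALWAYS, mixed_precision_type, mixed_precision_type]


def to_fp16(mod):
    """Run the AMP pass, converting eligible float32 ops to float16."""
    seq = tvm.transform.Sequential(
        [
            relay.transform.InferType(),
            relay.transform.ToMixedPrecision(mixed_precision_type="float16"),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)


# Tiny example graph: a single dense layer, originally in float32.
x = relay.var("x", shape=(8, 16), dtype="float32")
w = relay.var("w", shape=(32, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, w))
print(to_fp16(mod))  # dense now computes, accumulates, and outputs in float16
```

Registering at a higher `level` than the default rules is the usual way to override a built-in conversion without patching TVM itself.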

@comaniac (Contributor) left a comment

LGTM. btw I added a tag to the PR title so that it can be easily found in the future.

@comaniac changed the title from "Turn off accumulation data types for mixed precision pass" to "[AMP] Turn off accumulation data types for mixed precision pass" on Jun 25, 2021
@comaniac (Contributor) commented:

CI failed due to #8344

@masahi merged commit 282c532 into apache:main on Jun 29, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021

* don't use mixed precision accumulators

* turn off fp32 accumulators for now, adjust passing test cases

* Add TODO on cuda codegen for failures. Make test case pass on cuda for now

test to mixed precision

more tests

add internal func call

broadcast failures

moreee

add comment and change lstm unit test to pass on cuda

* remove debug statements

* to mixed precision

* rebase main

* rtol and atol adjustments

* bump up tolerance again

* jostle CI
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Mar 4, 2022