[Adreno] Change compute/schedule for ToMixedPrecision pass #12537
Conversation
Just a few scheduling-related questions. I loved seeing how the mixed precision pass has helped reduce the schedules.
@@ -437,7 +437,7 @@ def test_conv2d_vgg16_winograd_4d():
     stat_file = temp.relpath("stat.log")
     with open(stat_file, "w") as f:
         f.write(
-            '{"input": ["opencl -keys=adreno,opencl,gpu -device=adreno -max_num_threads=256", "conv2d_nchw_winograd_acc32.image2d", [["TENSOR", [1, 512, 28, 28], "float16"], ["TENSOR", [512, 512, 3, 3], "float16"], [1, 1], [1, 1, 1, 1], [1, 1], "float16"], {}], "config": {"index": 1591, "code_hash": null, "entity": [["auto_unroll_max_step", "ot", 4], ["tile_y", "sp", [-1, 1, 32]], ["tile_x", "sp", [-1, 4, 2]], ["tile_rc", "sp", [-1, 8]]]}, "result": [[0.0037244], 0, 7.06374192237854, 1653898629.7427933], "version": 0.2, "tvm_version": "0.8.dev0"}\n'
+            '{"input": ["opencl -keys=adreno,opencl,gpu -device=adreno -max_num_threads=256", "conv2d_nchw_winograd.image2d", [["TENSOR", [1, 512, 28, 28], "float16"], ["TENSOR", [512, 512, 3, 3], "float16"], [1, 1], [1, 1, 1, 1], [1, 1], "float16"], {}], "config": {"index": 1591, "code_hash": null, "entity": [["auto_unroll_max_step", "ot", 4], ["tile_y", "sp", [-1, 1, 32]], ["tile_x", "sp", [-1, 4, 2]], ["tile_rc", "sp", [-1, 8]]]}, "result": [[0.0037244], 0, 7.06374192237854, 1653898629.7427933], "version": 0.2, "tvm_version": "0.8.dev0"}\n'
🎉
Sorry, I will take a look tomorrow
Looks fine to me, but I don't know this part of the codebase well. I will defer to others.
Force-pushed from 81075fd to 02381d0.
The current support for mixed precision in the Adreno schedules was implemented as standalone schedules carrying an "acc32" (fp32 accumulation) suffix. Such kernels can be selected during compilation for two reasons.
The tuning flow, in its turn, was not able to distinguish between the two and target only fp16 or only fp16_acc32: both schedules are tuned, and during compilation the schedule with the best time is selected. In effect, without an artificial workaround we cannot tune and compile pure fp16 or pure fp16_acc32; only manual selection of entries in the tuning statistics currently forces one of these modes.
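To illustrate how a recorded entry such as the one in the diff above ends up selected, here is a minimal sketch of compiling with a tuning statistics file. The module `mod`, the `params` dict, and the exact target string are assumptions; `autotvm.apply_history_best` and `relay.build` are the standard TVM APIs.

import tvm
from tvm import autotvm, relay

# `mod` and `params` are assumed to come from a frontend import,
# e.g. relay.frontend.from_onnx(...).
# Whichever record in "stat.log" has the best measured time for a task
# (e.g. the fp16 vs. the fp16_acc32 variant) is the one that gets compiled.
with autotvm.apply_history_best("stat.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(
            mod,
            target="opencl -device=adreno",
            params=params,
        )

This is why editing the log by hand is currently the only way to pin one accumulation mode: the selection is driven purely by the recorded timings.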
In addition to this, the conversion to fp16 was done by a custom function in the user's script that is not available to the general TVM user.
To address the above issues, we propose to use the ToMixedPrecision() pass. It supports mixed precision (fp16 compute with fp32 accumulation) as well; a minimal usage sketch is given below.
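The following is a minimal sketch of applying the pass to a Relay module, not this PR's exact code. The helper name `to_mixed_precision` is an assumption; `relay.transform.ToMixedPrecision` and its `mixed_precision_type` argument are the public TVM API.

import tvm
from tvm import relay

def to_mixed_precision(mod):
    # Type inference must run before the conversion pass.
    mod = relay.transform.InferType()(mod)
    # Rewrite eligible ops to compute in fp16; the accumulation dtype
    # (fp32 for mixed precision) comes from per-op conversion-type
    # registrations rather than from standalone "_acc32" schedules.
    return relay.transform.ToMixedPrecision(mixed_precision_type="float16")(mod)

A module converted this way can then go through the same tuning and compilation flow shown earlier.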
Current PR changes