[Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule #13352
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
        return tvm_build(mod, target=target)
    else:
        mod = RemoveWeightLayoutRewriteBlock(skip_ndarray_rewrite=True)(mod)
        return tvm_build(mod, target=target)
Use meta_schedule.builder.default_build here.
✅ Changed this to use the old strategy if pass context is not present.
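For reference, a minimal sketch of that branch, with some assumptions: the helper name builder_func is hypothetical, the check uses the relay.backend.use_meta_schedule PassContext flag, and RemoveWeightLayoutRewriteBlock is imported from tvm.tir.transform; this is not the exact PR code.

import tvm
from tvm.tir.transform import RemoveWeightLayoutRewriteBlock

tvm_build = tvm.build  # alias matching the snippet above


def builder_func(mod, target):
    # Assumed check: treat a missing MetaSchedule flag as "pass context not present".
    ctx = tvm.transform.PassContext.current()
    if "relay.backend.use_meta_schedule" not in ctx.config:
        # Old strategy: build the module unchanged.
        return tvm_build(mod, target=target)
    # MetaSchedule context present: strip the weight-layout rewrite blocks first.
    mod = RemoveWeightLayoutRewriteBlock(skip_ndarray_rewrite=True)(mod)
    return tvm_build(mod, target=target)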
def dot_product_32x4_u8i8i32_vtcm_desc(
    A: T.Buffer((4,), "uint8", offset_factor=1, scope="global.vtcm"),
    B: T.Buffer((32, 4), "int8", offset_factor=1, scope="global.vtcm"),
    C: T.Buffer((32,), "int32", offset_factor=1, scope="global.vtcm"),
Is it possible to parametrize scope to remove duplication?
Yes! ✅
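A minimal sketch of what that parametrization can look like; the generator name is illustrative and the reduction body follows the standard 32x4 u8i8i32 dot-product description rather than the exact code in this PR.

from tvm.script import tir as T


def generate_dot_product_32x4_u8i8i32_desc(scope):
    # One definition serves both the "global" and "global.vtcm" variants.
    @T.prim_func
    def desc(
        A: T.Buffer((4,), "uint8", offset_factor=1, scope=scope),
        B: T.Buffer((32, 4), "int8", offset_factor=1, scope=scope),
        C: T.Buffer((32,), "int32", offset_factor=1, scope=scope),
    ) -> None:
        with T.block("root"):
            T.reads(C[0:32], A[0:4], B[0:32, 0:4])
            T.writes(C[0:32])
            for i in T.serial(0, 32):
                for k in T.serial(0, 4):
                    with T.block("update"):
                        vi, vk = T.axis.remap("SR", [i, k])
                        C[vi] = C[vi] + T.cast(A[vk], dtype="int32") * T.cast(B[vi, vk], dtype="int32")

    return desc


vtcm_desc = generate_dot_product_32x4_u8i8i32_desc("global.vtcm")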
tests/python/contrib/test_hexagon/metaschedule_e2e/test_resnet50_int8.py
@tvm.testing.requires_hexagon
def test_async_dma_resnet50(hexagon_launcher):
I think it should be possible to remove the duplication with test_packed_8x8x32_resnet50.
Was able to unify a good amount of the code ✅
debug_ex = session.get_graph_debug_executor(
    hexagon_lowered.get_graph_json(), hexagon_lowered.lib
)
print(debug_ex.profile(input_name=inp.copy()))
Do we want benchmark at L589 and profiling here, given that this test is not really tuning anything?
I wanted to run these to see the overall performance of the model after applying all of these schedules. Is there a better way to get that?
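For reference, a sketch of how both numbers can be read in one place, assuming the session / hexagon_lowered / inp objects from the test above, that the Hexagon Session exposes get_executor_from_factory, and that the graph input is literally named "input_name" as in the profile call. benchmark on the regular graph executor gives an end-to-end time, while the debug executor's profile gives the per-operator breakdown.

graph_mod = session.get_executor_from_factory(hexagon_lowered)
graph_mod.set_input("input_name", inp.copy())
# End-to-end latency over several runs.
print(graph_mod.benchmark(session.device, number=10, repeat=3))

debug_ex = session.get_graph_debug_executor(
    hexagon_lowered.get_graph_json(), hexagon_lowered.lib
)
# Per-operator timing breakdown.
print(debug_ex.profile(input_name=inp.copy()))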
    sch.unroll(new_loops[-4])
    # TODO(nverke): Add compute optimizations here.
else:
    # Handle case where kernel is 1x1
I think if you set preserve_unit_loops=True in compute_at, you can get rid of this branch.
This worked! ✅
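A small self-contained illustration of the effect (a toy two-stage pipeline, not the resnet50 schedule): with preserve_unit_loops=True the unit loop is kept after compute_at, so the producer's loop nest has the same shape in the degenerate case and no separate branch is needed.

import tvm
from tvm.script import tir as T


@T.prim_func
def two_stage(A: T.Buffer((4, 1), "float32"), C: T.Buffer((4, 1), "float32")):
    B = T.alloc_buffer((4, 1), "float32")
    for i, j in T.grid(4, 1):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] + T.float32(1)
    for i, j in T.grid(4, 1):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] * T.float32(2)


sch = tvm.tir.Schedule(two_stage)
i, _ = sch.get_loops(sch.get_block("C"))
# Keeping unit loops means the producer gets the same nest depth as in the
# non-degenerate case, which is what removes the 1x1-kernel special case above.
sch.compute_at(sch.get_block("B"), i, preserve_unit_loops=True)
print(sch.mod.script())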
Force-pushed from 476948e to 7980e3a
The changes to TIR broke the tensorization, so those issues need to be fixed. For the time being, I rebased on top of another commit and made the requested updates.
Actually there was also a
sch.parallel(new_loops[4])
sch.unroll(new_loops[5])
# TODO(nverke): Add compute optimizations here.
sch.blockize(loop=oc_i)
Probably doesn't need this blockize (tensorize does blockize anyway).
For some reason tensorization breaks when removing this...
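For reference, a sketch of the pattern under discussion; sch and oc_i are the schedule and loop from the function above, and the intrinsic name is a hypothetical registered name for the VTCM variant, not necessarily what this PR registers. tensorize can blockize a loop on its own, but here the explicit blockize is kept because removing it broke tensorization.

# Explicitly wrap the vrmpy-shaped inner loops in their own block...
outer_block = sch.blockize(loop=oc_i)
# ...then map that block onto the VTCM vrmpy intrinsic (hypothetical name).
sch.tensorize(outer_block, "dot_product_32x4_u8i8i32_vtcm")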
Force-pushed from 10ace0b to c38809e
Was able to pull in the fix for vrmpy tensorization and rebase this onto mainline, so it should be OK to push now.
[Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule
Force-pushed from 42da1b3 to 183cbc8
[Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule (apache#13352)
* [Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule
* lint
[Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule
This test uses a schedule function to run each conv2d in resnet50 as a pipeline that overlaps async DMA copies into VTCM with parallel HVX compute.
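A brief sketch of what that pipelining looks like at the schedule level; the loop variable and the stage/order values are illustrative stand-ins, not the exact annotations in the test. The copy-to-VTCM blocks are placed in an earlier pipeline stage than the compute block, and that stage is marked async so lowering emits DMA copies with the corresponding waits.

# `sch` is a tir.Schedule and `pipeline_loop` the loop holding the cache_read
# (copy-to-VTCM) blocks and the compute block; the values below are illustrative.
sch.annotate(pipeline_loop, "software_pipeline_stage", [0, 0, 1])
sch.annotate(pipeline_loop, "software_pipeline_order", [0, 1, 2])
# Stage 0 (the VTCM copies) runs as asynchronous DMA, overlapping with HVX compute.
sch.annotate(pipeline_loop, "software_pipeline_async_stages", [0])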