
[Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule #13352


Merged
merged 2 commits into apache:main from the async_dma_resnet50 branch on Nov 18, 2022

Conversation

nverke
Contributor

@nverke nverke commented Nov 10, 2022

This test uses a schedule function to run each conv2d in resnet50 in a pipeline of async DMA copies to VTCM and parallel HVX compute.
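
For readers unfamiliar with the technique, here is a minimal sketch of the idea (not the exact schedule function from this PR, and the block name is an assumption): stage the conv2d inputs into VTCM with cache_read, then annotate the outer loop as a two-stage software pipeline whose copy stage is asynchronous, so the DMA copies for iteration i+1 overlap with the HVX compute for iteration i.

```python
from tvm import tir

def pipeline_conv2d(sch: tir.Schedule, conv_block_name: str = "conv2d_NCHWc_int8"):
    # Block name is illustrative; the real schedule targets whatever conv2d
    # blocks the resnet50 workload actually contains.
    block = sch.get_block(conv_block_name)
    outer, *_ = sch.get_loops(block)

    # Stage activations (read buffer 0) and weights (read buffer 1) into VTCM.
    act_vtcm = sch.cache_read(block, 0, "global.vtcm")
    wgt_vtcm = sch.cache_read(block, 1, "global.vtcm")
    sch.compute_at(act_vtcm, outer, preserve_unit_loops=True)
    sch.compute_at(wgt_vtcm, outer, preserve_unit_loops=True)

    # Three blocks now sit under `outer`: two VTCM copies and the compute.
    # Put the copies in pipeline stage 0 and the compute in stage 1, and mark
    # stage 0 async so the copies can be lowered to async DMA.
    sch.annotate(outer, "software_pipeline_stage", [0, 0, 1])
    sch.annotate(outer, "software_pipeline_order", [0, 1, 2])
    sch.annotate(outer, "software_pipeline_async_stages", [0])
    return sch
```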

@tvm-bot
Collaborator

tvm-bot commented Nov 10, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

    return tvm_build(mod, target=target)
else:
    mod = RemoveWeightLayoutRewriteBlock(skip_ndarray_rewrite=True)(mod)
    return tvm_build(mod, target=target)
Member

Use meta_schedule.builder.default_build here.

Contributor Author

✅ Changed this to use the old strategy if pass context is not present.
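
As a rough sketch of what that resolution could look like (the config key checked below is an assumption; the PR's actual condition may differ), the build helper falls back to stripping the weight-layout-rewrite blocks itself only when the current PassContext does not carry the meta_schedule configuration:

```python
import tvm
from tvm.tir.transform import RemoveWeightLayoutRewriteBlock

def build_module(mod, target):
    ctx = tvm.transform.PassContext.current()
    if "relay.backend.use_meta_schedule" in ctx.config:
        # The pass context is set up for meta_schedule, so the normal build
        # path already knows how to handle layout-rewrite blocks.
        return tvm.build(mod, target=target)
    # Old strategy: strip the weight-layout-rewrite blocks explicitly first.
    mod = RemoveWeightLayoutRewriteBlock(skip_ndarray_rewrite=True)(mod)
    return tvm.build(mod, target=target)
```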

def dot_product_32x4_u8i8i32_vtcm_desc(
    A: T.Buffer((4,), "uint8", offset_factor=1, scope="global.vtcm"),
    B: T.Buffer((32, 4), "int8", offset_factor=1, scope="global.vtcm"),
    C: T.Buffer((32,), "int32", offset_factor=1, scope="global.vtcm"),
Member

Is it possible to parametrize scope to remove duplication?

Contributor Author

Yes! ✅
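
For reference, parametrizing the scope can look roughly like this (a sketch; tvm.tir.tensor_intrin.hexagon uses a similar factory pattern, but the names here are illustrative): a single factory emits the desc for whatever memory scope is requested, so the "global" and "global.vtcm" variants no longer need to be written out twice.

```python
from tvm.script import tir as T

def generate_dot_product_32x4_u8i8i32(mem_scope="global"):
    @T.prim_func
    def desc(
        A: T.Buffer((4,), "uint8", offset_factor=1, scope=mem_scope),
        B: T.Buffer((32, 4), "int8", offset_factor=1, scope=mem_scope),
        C: T.Buffer((32,), "int32", offset_factor=1, scope=mem_scope),
    ) -> None:
        with T.block("root"):
            T.reads(C[0:32], A[0:4], B[0:32, 0:4])
            T.writes(C[0:32])
            for i, k in T.grid(32, 4):
                with T.block("update"):
                    vi, vk = T.axis.remap("SR", [i, k])
                    C[vi] = C[vi] + T.cast(A[vk], "int32") * T.cast(B[vi, vk], "int32")

    return desc
```

Each scope's (desc, impl) pair can then be registered under its own name via tir.TensorIntrin.register.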



@tvm.testing.requires_hexagon
def test_async_dma_resnet50(hexagon_launcher):
Member

I think it should be possible to remove duplication with test_packed_8x8x32_resnet50.

Contributor Author

Was able to unify a good amount of the code ✅
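
One way the unification can be structured (purely illustrative; the helper and schedule-function names below are hypothetical) is to move the shared build/run/compare logic into a helper and have each test pass in only its schedule function:

```python
import tvm.testing

def _build_and_run_resnet50(hexagon_launcher, schedule_fn):
    # Shared logic: apply schedule_fn to the resnet50 workload, build for
    # Hexagon, run through hexagon_launcher, and compare against the reference
    # output. Details omitted in this sketch.
    ...

@tvm.testing.requires_hexagon
def test_packed_8x8x32_resnet50(hexagon_launcher):
    _build_and_run_resnet50(hexagon_launcher, packed_8x8x32_schedule)  # hypothetical name

@tvm.testing.requires_hexagon
def test_async_dma_resnet50(hexagon_launcher):
    _build_and_run_resnet50(hexagon_launcher, async_dma_schedule)  # hypothetical name
```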

debug_ex = session.get_graph_debug_executor(
    hexagon_lowered.get_graph_json(), hexagon_lowered.lib
)
print(debug_ex.profile(input_name=inp.copy()))
Member

Do we want the benchmark at L589 and the profiling here, given that this test is not really tuning anything?

Contributor Author

I wanted to run these to see what the overall performance of the model is after applying all of these schedules. Is there a better way to get that?
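
One alternative for an end-to-end number (a sketch only; it assumes session.device and GraphModule.benchmark behave as usual over the Hexagon RPC session, and reuses the input name from the snippet above) is to benchmark the whole graph run instead of per-op profiling:

```python
graph_mod = session.get_graph_executor(
    hexagon_lowered.get_graph_json(), hexagon_lowered.lib
)
graph_mod.set_input(input_name, inp.copy())  # same input name/array as above (assumed)
# benchmark() wraps time_evaluator over the executor's "run" function and
# reports min/mean/median latency for the full model.
print(graph_mod.benchmark(session.device, number=1, repeat=10))
```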

    sch.unroll(new_loops[-4])
    # TODO(nverke): Add compute optimizations here.
else:
    # Handle case where kernel is 1x1
Member

I think if you set preserve_unit_loops = True in compute_at, you can get rid of this branch.

Contributor Author

This worked! ✅
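
For reference, the change boils down to one argument (block and loop names below are placeholders): keeping the unit loops when moving the VTCM cache_read means a 1x1 kernel keeps the same loop structure as the general case, so the separate branch is no longer needed.

```python
cache_block = sch.cache_read(conv_block, 1, "global.vtcm")         # placeholder block
sch.compute_at(cache_block, outer_loop, preserve_unit_loops=True)  # placeholder loop
```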

@nverke nverke force-pushed the async_dma_resnet50 branch from 476948e to 7980e3a on November 15, 2022 at 21:39
@nverke
Contributor Author

nverke commented Nov 15, 2022

The changes to TIR broke the tensorization here, so those issues need to be fixed. For the time being, I rebased on top of another commit and made the requested updates.

@masahi
Member

masahi commented Nov 16, 2022

Actually, there was also a vrmpy tensorization bug in the current main, fixed by #13404. I had forgotten that we don't need an explicit initialization block in a tensorize description.

sch.parallel(new_loops[4])
sch.unroll(new_loops[5])
# TODO(nverke): Add compute optimizations here.
sch.blockize(loop=oc_i)
Member

This probably doesn't need the blockize (tensorize does blockize anyway).

Contributor Author

For some reason, tensorization breaks when this is removed...
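
For context, the pattern under discussion is roughly the following (a sketch; the registered intrinsic name is a placeholder): blockize wraps the innermost tile in its own block, and tensorize then replaces that tile with the vrmpy intrinsic. tensorize can usually perform the blockize step implicitly, which is why the explicit call looked redundant.

```python
sch.blockize(loop=oc_i)                 # wrap the innermost tile in its own block
sch.tensorize(oc_i, VTCM_VRMPY_INTRIN)  # placeholder name of the registered VTCM vrmpy intrin
```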

@nverke nverke force-pushed the async_dma_resnet50 branch from 10ace0b to c38809e on November 17, 2022 at 22:40
@nverke
Contributor Author

nverke commented Nov 17, 2022

Was able to pull in the fix for vrmpy tensorization and rebase this onto mainline, so it should be OK to push now.

@nverke nverke force-pushed the async_dma_resnet50 branch from 42da1b3 to 183cbc8 on November 17, 2022 at 22:57
@masahi masahi merged commit b29ab5c into apache:main Nov 18, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
[Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule (apache#13352)

* [Hexagon] Add test to show scheduling of resnet50 with async dma pipelines using metaschedule

* lint
@nverke nverke deleted the async_dma_resnet50 branch December 2, 2022 16:39