Add pass to annotate ops with on_device for non-BYOC heterogeneous #7428

rkimball · 2021-02-09T17:46:55Z

Added pass AnnotateDevicePlacement which allows for per-Call op placement using on_device. This allows for heterogeneous execution of non-external compilers such as CPU and GPU.
Added unit test for AnnotateDevicePlacement.
Moved the on_device definition from an anonymous lambda to a c++ function so that it can be called directly from c++ instead of forcing the use of PackedFunction calls.

execution. Add unit test for pass.

comaniac · 2021-02-09T18:18:00Z

Thanks for the PR. Regard to the current AnnotateTarget pass for BYOC, I have some questions/comments:

From the use case, it looks like very similar to the compiler_begin and compiler_end, but limits to per node. Could you ellaborate the reason why not using compiler_begin and compiler_end directly?
This PR doesn't include a pass to annotate nodes but relies on manual annotation. Could you ellaborate how would you expect users to use them?
If we use compiler_begin and compiler_end, it seems possible to extend existing AnnotateTarget to annotate non-external compilers. The current AnnotateTarget annotates rest nodes to CPU. We could refactor that pass and make CPU and GPU as two "external" compilers as well so that AnnotateTarget could also take care of them. I feel that we should only maintain one annotation mechanism at the end, so whatever which approach we finally agree on, we could also refactor BYOC AnnotateTarget pass in that way.

manupak · 2021-02-12T07:36:34Z

I also have the same concern as @comaniac. Maybe we should slowly stop using the word "BYOC", it makes it sound "external".

I think we can have a unified infra to dispatch IRModule --> runtime.Module compilation flows. I don't see an immediate concern why we cant use the current "BYOC" partitioning infra to generate the cpu, gpu targets as well.

Love to hear thoughts on this and see the roadblocks of unifying these approaches. I guess that helps to have a unfied TargetRegistry as well.

cc : @tqchen

tmoreau89 · 2021-02-18T21:50:56Z

Thanks for the comments and input everyone, @zhiics any feedback on the annotation pass? Thanks!

rkimball · 2021-02-19T17:26:26Z

@comaniac and @manupa-arm to your questions
This PR is to provide a simple demo of heterogeneous execution on CPU and Vulkan which are both tvm internal compilers.

I started with compiler_begin and compiler_end and was able to annotate my simple example model. These attributes worked great until up until calling the compilers. In my use case I need to compile code to use the CPU and Vulkan backends, both of which are built into tvm. With the compiler_begin/end annotation it really looks like it only works with external compilers, with no way to use tvm's built in compilers for the second device. For a quick demo I used the older on_device annotation which does directly work with both backends built into tvm.
There is a pass to annotate nodes, AnnotateDevicePlacement in src/relay/transforms/annotate_device_placement.cc. The pass simply call a callback for each CallNode and then annotates the nodes with the returned on_device. This pass is sufficient for the demo I had put together. What the user specifically wants to do is still under investigation.
As a more long-term goal I agree with you that consolidating the two heterogeneous approaches into a single approach is desired and hope to do that. AnnotateTarget looks to be complete. I though about wrapping CPU and GPU as "external" compilers but that seems like a more roundabout solution to the problem, better would be to unify the "external" and "internal" compilers so that either could be used with the compile_begin/end annotation.

comaniac · 2021-02-19T17:57:05Z

@rkimball Exactly. We should eliminate "external" and "internal" but just treat every target evenly. Meanwhile, the corresponding changes in compile engine have to be made to deal with built in targets, as you pointed out.

While I still feel we should not have two mechanisms doing similar things, I'd suggest having an RFC to improve the BYOC flow so that it could be even more general.

cc @zhiics

jroesch · 2022-01-19T20:10:33Z

This PR appears to be out of date, please feel free to reopen it if this is not the case.

As part of the new year we are attempting to triage the project's open pull requests to ensure that code which
is ready for review and/or merging receives adequate attention.

Thanks again for your contribution, and feel free to reach out to discuss these changes.

Add pass to annotate ops with on_device for non-BYOC heterogeneous

870faa5

execution. Add unit test for pass.

manupak mentioned this pull request Feb 15, 2021

[CRT] Create C-runtime-style metadata module for llvm builds #7398

Merged

Fix warning

14cbc81

tqchen assigned comaniac Mar 3, 2021

tqchen added the status: need RFC need RFC discussion label Mar 3, 2021

manupak mentioned this pull request Oct 1, 2021

[Lowering] Add TargetDevice composite data structure. #8892

Closed

jroesch closed this Jan 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add pass to annotate ops with on_device for non-BYOC heterogeneous #7428

Add pass to annotate ops with on_device for non-BYOC heterogeneous #7428

rkimball commented Feb 9, 2021

comaniac commented Feb 9, 2021

manupak commented Feb 12, 2021

tmoreau89 commented Feb 18, 2021

rkimball commented Feb 19, 2021

comaniac commented Feb 19, 2021

jroesch commented Jan 19, 2022

Add pass to annotate ops with on_device for non-BYOC heterogeneous #7428

Add pass to annotate ops with on_device for non-BYOC heterogeneous #7428

Conversation

rkimball commented Feb 9, 2021

comaniac commented Feb 9, 2021

manupak commented Feb 12, 2021

tmoreau89 commented Feb 18, 2021

rkimball commented Feb 19, 2021

comaniac commented Feb 19, 2021

jroesch commented Jan 19, 2022