Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pass to annotate ops with on_device for non-BYOC heterogeneous #7428

Closed
wants to merge 2 commits into from
Closed

Add pass to annotate ops with on_device for non-BYOC heterogeneous #7428

wants to merge 2 commits into from

Conversation

rkimball
Copy link
Contributor

@rkimball rkimball commented Feb 9, 2021

  • Added pass AnnotateDevicePlacement which allows for per-Call op placement using on_device. This allows for heterogeneous execution of non-external compilers such as CPU and GPU.
  • Added unit test for AnnotateDevicePlacement.
  • Moved the on_device definition from an anonymous lambda to a c++ function so that it can be called directly from c++ instead of forcing the use of PackedFunction calls.

@comaniac
Copy link
Contributor

comaniac commented Feb 9, 2021

Thanks for the PR. Regard to the current AnnotateTarget pass for BYOC, I have some questions/comments:

  • From the use case, it looks like very similar to the compiler_begin and compiler_end, but limits to per node. Could you ellaborate the reason why not using compiler_begin and compiler_end directly?

  • This PR doesn't include a pass to annotate nodes but relies on manual annotation. Could you ellaborate how would you expect users to use them?

  • If we use compiler_begin and compiler_end, it seems possible to extend existing AnnotateTarget to annotate non-external compilers. The current AnnotateTarget annotates rest nodes to CPU. We could refactor that pass and make CPU and GPU as two "external" compilers as well so that AnnotateTarget could also take care of them. I feel that we should only maintain one annotation mechanism at the end, so whatever which approach we finally agree on, we could also refactor BYOC AnnotateTarget pass in that way.

Also cc @zhiics

@manupak
Copy link
Contributor

manupak commented Feb 12, 2021

I also have the same concern as @comaniac. Maybe we should slowly stop using the word "BYOC", it makes it sound "external".

I think we can have a unified infra to dispatch IRModule --> runtime.Module compilation flows. I don't see an immediate concern why we cant use the current "BYOC" partitioning infra to generate the cpu, gpu targets as well.

Love to hear thoughts on this and see the roadblocks of unifying these approaches. I guess that helps to have a unfied TargetRegistry as well.

cc : @tqchen

@tmoreau89
Copy link
Contributor

Thanks for the comments and input everyone, @zhiics any feedback on the annotation pass? Thanks!

@rkimball
Copy link
Contributor Author

@comaniac and @manupa-arm to your questions
This PR is to provide a simple demo of heterogeneous execution on CPU and Vulkan which are both tvm internal compilers.

  • I started with compiler_begin and compiler_end and was able to annotate my simple example model. These attributes worked great until up until calling the compilers. In my use case I need to compile code to use the CPU and Vulkan backends, both of which are built into tvm. With the compiler_begin/end annotation it really looks like it only works with external compilers, with no way to use tvm's built in compilers for the second device. For a quick demo I used the older on_device annotation which does directly work with both backends built into tvm.
  • There is a pass to annotate nodes, AnnotateDevicePlacement in src/relay/transforms/annotate_device_placement.cc. The pass simply call a callback for each CallNode and then annotates the nodes with the returned on_device. This pass is sufficient for the demo I had put together. What the user specifically wants to do is still under investigation.
  • As a more long-term goal I agree with you that consolidating the two heterogeneous approaches into a single approach is desired and hope to do that. AnnotateTarget looks to be complete. I though about wrapping CPU and GPU as "external" compilers but that seems like a more roundabout solution to the problem, better would be to unify the "external" and "internal" compilers so that either could be used with the compile_begin/end annotation.

@comaniac
Copy link
Contributor

@rkimball Exactly. We should eliminate "external" and "internal" but just treat every target evenly. Meanwhile, the corresponding changes in compile engine have to be made to deal with built in targets, as you pointed out.

While I still feel we should not have two mechanisms doing similar things, I'd suggest having an RFC to improve the BYOC flow so that it could be even more general.

cc @zhiics

@jroesch
Copy link
Member

jroesch commented Jan 19, 2022

This PR appears to be out of date, please feel free to reopen it if this is not the case.

As part of the new year we are attempting to triage the project's open pull requests to ensure that code which
is ready for review and/or merging receives adequate attention.

Thanks again for your contribution, and feel free to reach out to discuss these changes.

@jroesch jroesch closed this Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: need RFC need RFC discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants