[RFC][VM] Heterogeneous execution in Relay VM #4178
Comments
I think if we look at my recent PR, we probably need to track the device context when we allocate storage. The storage's context will prevent merging different pieces of storage.
@jroesch thanks. I have added references to the PR in the RFC.
I'm interested in this. @wweic I'll reach out to you for advice.
Ah, thanks for the reminder. This is closed by #6337.
Heterogeneous execution in Relay VM
Goal
The Relay graph runtime supports executing different parts of a graph on different devices, i.e. heterogeneous execution. We'd like to port this feature to the Relay VM.
Non-goals
The device annotation pass has a limitation: it assumes all the computation happens inside a single function, so it cannot compute device assignments across multiple Relay functions. This could be a problem if we allocate a GPU tensor in the main function but then call out to a tensor array concatenate operation that lives in another Relay function; it might crash or copy to CPU memory (I haven't experimented yet). The proper fix is to implement an interprocedural analysis for the device annotation pass.
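To make the concern concrete, here is a rough Python sketch (names such as `relay.annotation.on_device`, `tvm.IRModule`, and `tvm.gpu` have moved around between TVM releases, so treat this as an illustration rather than working reference code) of an annotated main function calling a separate Relay function:

```python
import tvm
from tvm import relay

# A separate Relay function that concatenates its input with itself.
x = relay.var("x", shape=(3, 4))
concat_fn = relay.Function([x], relay.concatenate([x, x], axis=0))

mod = tvm.IRModule()
concat_gv = relay.GlobalVar("concat2")
mod[concat_gv] = concat_fn

# In main, the addition is annotated to run on the GPU. The device
# annotation pass analyzes one function at a time, so the callee
# "concat2" gets no device assignment derived from this annotation.
y = relay.var("y", shape=(3, 4))
on_gpu = relay.annotation.on_device(relay.add(y, y), tvm.gpu(0))
mod["main"] = relay.Function([y], concat_gv(on_gpu))
```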
Current Design in Relay Graph Runtime
Compilation
Reference: #2361
Summary: If users want to specify a device for an operator to run on, they can wrap an expression with an annotation operator named `on_device(expr, dev_id)`. In the `RunDeviceAnnotationPass` step during `relay.build`, we replace each `on_device` node with a `device_copy` node. In the `GraphPlanMemory` pass, we compute the device assignment (`device_type`, see next section) of each memory block. This is possible because the graph runtime only supports static graphs, so all of the information can be captured statically. Then, during native code generation, the `device_copy` node is mapped to a special packed function named `__copy`.
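For illustration, a minimal end-to-end sketch of this flow, assuming the Python API of roughly the TVM version this RFC targets (`relay.annotation.on_device`, `relay.build_config(fallback_device=...)`, and a per-device target dict; later releases renamed or relocated some of these entry points):

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(3, 4))
y = relay.var("y", shape=(3, 4))

# Ask for the addition to run on the GPU; everything else stays on the
# fallback device. on_device is later rewritten into device_copy nodes.
add = relay.annotation.on_device(relay.add(x, y), tvm.gpu(0))
out = relay.multiply(add, relay.const(2.0))
func = relay.Function([x, y], out)

# Heterogeneous build: one target per device type plus a fallback device.
with relay.build_config(opt_level=3, fallback_device=tvm.cpu(0)):
    graph_json, lib, params = relay.build(
        tvm.IRModule.from_expr(func),
        target={"cpu": "llvm", "cuda": "cuda"},
    )
```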
Runtime
Reference: #1695
Summary: The graph JSON file gains a new field named `device_type` that specifies which device each static memory node should be scheduled to, and the runtime allocates the memory on that device accordingly. When the graph runtime sees the special operator named `__copy`, it calls `TVMArrayCopyFromTo` to move memory across devices correctly.
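The copy itself is the same primitive the NDArray API exposes; a small sketch of what `__copy` boils down to (assuming a CUDA-enabled build and the `tvm.gpu`/`copyto` names of this era):

```python
import numpy as np
import tvm

# __copy moves a tensor's data between devices, which is what
# TVMArrayCopyFromTo does in the C runtime API.
cpu_arr = tvm.nd.array(np.ones((3, 4), dtype="float32"), tvm.cpu(0))
gpu_arr = tvm.nd.empty((3, 4), "float32", tvm.gpu(0))
cpu_arr.copyto(gpu_arr)  # cross-device copy, CPU -> GPU
```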
Proposal for Relay VM
Compilation
References: the `AllocStorage` opcode, which allocates physical memory ([Relay][Memory][VM] #3560).

We should be able to reuse the entire workflow up until `RunDeviceAnnotationPass`. The VM compiler, which translates Relay expressions into VM opcodes, needs to map each `device_copy` node to an opcode named `DeviceCopy(src_register, dst_register)`. The tensor object in each register should carry its device context so the VM knows how to copy the data. We also need to change `AllocTensor` (later `AllocStorage`): the device context must be attached to the instruction so we know where to allocate the memory; right now we just use the default context. A hypothetical sketch of these instruction layouts is shown below.
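The sketch uses plain Python dataclasses rather than the VM's actual C++ instruction structs; every field name here is illustrative, not TVM's API:

```python
from dataclasses import dataclass

@dataclass
class DeviceCopy:
    """Proposed opcode: copy a tensor between devices."""
    src_register: int  # register holding the source tensor
    dst_register: int  # register holding the (pre-allocated) destination tensor
    # The source and destination contexts are read off the tensor objects
    # themselves, as proposed above.

@dataclass
class AllocStorage:
    """AllocStorage extended with an explicit device context."""
    dst_register: int    # register that receives the allocated storage
    size_register: int   # register holding the number of bytes to allocate
    alignment: int
    dtype_hint: str
    device_type: int     # proposed: which device type to allocate on
    device_id: int       # proposed: which instance of that device
```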
VM Runtime
The VM needs to implement the changes to `AllocTensor` and `DeviceCopy`, as sketched below.
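A minimal sketch, again hypothetical and in Python rather than the VM's C++ interpreter, of how the two instructions might be serviced (assuming registers hold `tvm.nd.NDArray` objects and that `tvm.context` resolves an integer device type, as it did at the time):

```python
import tvm

def exec_alloc_storage(instr, registers):
    # Honor the device recorded on the instruction instead of always
    # allocating on the default context.
    ctx = tvm.context(instr.device_type, instr.device_id)
    nbytes = int(registers[instr.size_register])
    registers[instr.dst_register] = tvm.nd.empty((nbytes,), "uint8", ctx)

def exec_device_copy(instr, registers):
    # Both registers already hold tensors; their device contexts tell the
    # VM where the data lives and where it has to go (cf. __copy above).
    src = registers[instr.src_register]
    dst = registers[instr.dst_register]
    src.copyto(dst)  # cross-device copy into the pre-allocated destination
```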
Tasks
- Map `device_copy` nodes to the `DeviceCopy` opcode in the VM compiler.
- Attach the device context to `AllocTensor`/`AllocStorage` in the VM compiler.
- Allocate memory on the annotated device for `AllocTensor`/`AllocStorage` in the VM runtime.
- Implement the `DeviceCopy` opcode in the VM runtime.

cc @icemelon9 @zhiics @zxy844288792 @jroesch @tqchen @yzhliu