Heterogeneous Runtime #1695
Conversation
python/tvm/contrib/graph_runtime.py
Outdated
# CPU is always used as the host processor. Its device type is 1 as
# defined in TVMContext and dlpack.h. The libmod_ctx is sorted according
# to the device type field in TVMContext. It is used to guarantee that the
# first lib and context in the array belong to CPU.
Not good at first glance. Relying on the device type number is a little bit tricky, and is it really necessary to make the host device the head of the list, given the fact that you pass host_ctx as an arg?
@yzhliu Thanks. Yeah, I am also aware of this. The host_ctx here is actually a bad name. It is actually the local context where the module is deployed, as used in graph_runtime.create(); it is not guaranteed to be the context of the host processor. I will change the name to local_ctx.
Another option was to search the dictionary for the CPU context. Then we can put the CPU-related lib/device_type/device_id as the first element in each list and append the fields for the other devices, like:
for lib, ctx in libmod_ctx.items():
    if ctx == tvm.cpu(ctx.device_id):
        libs.insert(0, ...)   # likewise for device_types and device_ids
    else:
        libs.append(...)      # likewise for device_types and device_ids
if device_types[0] != STR2MASK["cpu"]:
    raise ...
I went with the sorting approach because I think it is transparent to users and well documented. I can probably also add a check on the first element of device_types to make sure it is CPU.
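For reference, a minimal sketch of the sorting approach itself, assuming libmod_ctx maps each compiled Module to its TVMContext as in the quoted comment (names are illustrative, not the final code):

# Sketch only: sort modules/contexts so that CPU (device type 1 in
# dlpack/TVMContext) comes first, then verify the first entry really is CPU.
sorted_pairs = sorted(libmod_ctx.items(), key=lambda kv: kv[1].device_type)
libs = [lib for lib, _ in sorted_pairs]
device_types = [c.device_type for _, c in sorted_pairs]
device_ids = [c.device_id for _, c in sorted_pairs]
if device_types[0] != tvm.cpu(0).device_type:
    raise RuntimeError("CPU must be the first (host/local) context.")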
src/runtime/c_runtime_api.cc
Outdated
@@ -73,7 +52,7 @@ class DeviceAPIManager {
  if (api_[type] != nullptr) return api_[type];
  std::lock_guard<std::mutex> lock(mutex_);
  if (api_[type] != nullptr) return api_[type];
  api_[type] = GetAPI(DeviceName(type), allow_missing);
  api_[type] = GetAPI(tvm::runtime::DeviceName(type), allow_missing);
no need to change
src/runtime/graph/graph_runtime.cc
Outdated
StorageDeviceMap sid_dev_map;
for (uint32_t nid = 0; nid < this->num_nodes(); ++nid) {
  const auto &inode = nodes_[nid];
  for (const auto &e : inode.inputs) {
Minor issue: use auto& var instead.
src/runtime/graph/graph_runtime.cc
Outdated
for (const auto &e : inode.inputs) {
  uint32_t eid = this->entry_id(e);
  uint32_t sid = attrs_.storage_id[eid];
  sid_dev_map[sid] = nodes_[e.node_id].device;
CHECK(sid_dev_map.count(sid) == 0)?
@yzhliu I think you probably meant:
CHECK(sid_dev_map.count(sid) == 0 || sid_dev_map[sid] == nodes_[e.node_id].device)
    << "Cannot assign the storage id to multiple devices";
right?
yes
src/runtime/graph/graph_runtime.cc
Outdated
size_t size = 1;
for (int64_t sz : attrs_.shape[i]) {
  size *= static_cast<size_t>(sz);
}
CHECK_GE(storage_id, 0) << "Do not support runtime shape op";
CHECK_GE(sid, 0) << "Do not support runtime shape op";
confusing, cast to uint and check >= 0?
src/runtime/graph/graph_runtime.cc
Outdated
}
pool_entry_bytes[sid] = std::max(pool_entry_bytes[sid], bytes);
DLDeviceType dev_type = sid_dev_map[sid];
device_pool_entry_bytes[dev_type][sid] =
Looks like there's no need to make dev_type a key; it can be obtained from sid_dev_map, right? It just feels better to keep the data structures simple.
@yzhliu Good point. Thanks.
src/runtime/graph/graph_runtime.cc
Outdated
DLTensor *tensor;
TVM_CCALL(TVMArrayAlloc(shape, 1, kDLFloat, 32, 1, ctx.device_type,
                        ctx.device_id, &tensor));
device_storage_pool_[it.first][pit.first] = tensor;
same as above.
src/runtime/graph/graph_runtime.cc
Outdated
@@ -482,27 +136,28 @@ void GraphRuntime::SetupStorage() {

void GraphRuntime::SetupOpExecs() {
  op_execs_.resize(this->num_nodes());
  std::vector<DLTensor> ids;
remove
src/runtime/graph/graph_runtime.cc
Outdated
    runtime_host_module_.GetFunction(param.func_name, false);
if (pf == nullptr) {
  for (const auto& it : runtime_device_mod_ctx_map_) {
    pf = it.first->GetFunction(param.func_name, false);
What if two modules have functions with the same name?
Thanks for pointing it out. This can be solved by using device information.
include/tvm/runtime/device_api.h
Outdated
@@ -36,6 +36,39 @@ constexpr int kTempAllocaAlignment = 64;
/*! \brief Maximum size that can be allocated on stack */
constexpr int kMaxStackAlloca = 1024;

/*! \brief The default device allocated to an operator */
constexpr DLDeviceType kDLDefaultDevice = kDLCPU;
By this default, do you mean the fallback device?
@jackwish Yes.
This should not appear here; instead, we should pass the fallback device as an argument or put it in a context.
@tqchen +1
@srkreddy1238 @nishi-t @eqy please take a bit of time to review this.
python/tvm/contrib/graph_runtime.py
Outdated
if ctx.device_type >= rpc_base.RPC_SESS_MASK:
    raise RuntimeError(
        "rpc is not supported for heterogeneous execution yet.")
if ctx == tvm.cpu(ctx.device_id):
In this code segment, are we assuming that, considering the subgraph of the whole graph, each target device will only be used by one subgraph at most?
@jackwish Not really. The "subgraph" is actually just a fused node here. For example, we annotate the graph and provide the context information for each node. During fusion, nodes with different context will not be grouped together. Instead, different ops marked with the same device/context will be compiled into the same binary.
Remarkable work!
btw, is type casting used a bit too frequently?
Some high-level comments:
@tqchen Sorry. Could you please elaborate more on how we can pack them into a single module? It seems that tvm.build only takes one target. Or do you mean we need to pack the modules after they are generated for different devices? Neither is clear to me.
You might find some insight by checking out the implementation of the code here: https://github.com/dmlc/tvm/blob/master/python/tvm/build_module.py#L503. build can take in a list of LoweredFunc and build a module, either a host one or a device one (with a host module). What we really have to do is to delay the generation of the host module and return it as a list of LoweredFunc. The device modules can simply be built already for each device. Then we can build a single host module and import the device modules from there. If you are building a CPU/GPU mixed runtime, things can be even simpler: just put everything into a list of LoweredFunc and build with GPU as the target. The CPU code will be compiled as the host module.
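A rough sketch of that flow as it ends up in this PR, using the postpone_host_codegen flag and combine_modules discussed further below (the schedules s_gpu/s_cpu and their argument lists are placeholders; the exact API may differ):

import tvm
from tvm.build_module import combine_modules  # helper added in this PR

fhost_all = []      # lowered host functions collected across targets
device_mods = []    # device modules built per target
for sch, args, target in [(s_gpu, args_gpu, "cuda"), (s_cpu, args_cpu, "llvm")]:
    # With host codegen postponed, build returns the lowered host funcs
    # together with the (possibly None) device module.
    fhost, mdev = tvm.build(sch, args, target, target_host="llvm",
                            postpone_host_codegen=True)
    fhost_all += fhost
    if mdev is not None:
        device_mods.append(mdev)

# Generate a single host module at the end and import every device module into it.
mhost = combine_modules(fhost_all, device_mods, target_host="llvm")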
Thanks for @tqchen's suggestion. I looked into tvm.build. I think there are probably two ways to generate just one module.
I quickly tested both methods locally and it seemed that both worked. I think both solutions are simple, but the second one needs fewer changes to the current code base. There might be some other solutions. @tqchen Am I missing something, or do you have any comments/advice? Thanks.
One possible approach
@tqchen Thanks for your suggestion. I think that postponing codegen should be able to solve the hierarchical module problem. I would like to go with the approach of returning (fhost, device_module) because I don't want to have LoweredFunc as a member of the module class. I think it would be good to keep runtime and compilation separate. On the other hand, returning the (fhost, device_module) tuple also keeps the code clean and simple. I will update the PR soon to 1) move the definition of
@tqchen @yzhliu @srkreddy1238 Now we only have one graph_runtime interface for both C++ and Python. Currently we can pass either 4 or 5 arguments to graph_runtime.create in the C++ backend to keep support for Java and js, because heterogeneous execution for them has not been implemented yet. This is also documented in the code.
python/tvm/build_module.py
Outdated
# collected.
mdev = codegen.build_module(fdevice, str(target_device)) if fdevice else None
if postpone_host_codegen:
    return mdev, fhost
return fhost, mdev: host first, as the device code can be None.
python/tvm/build_module.py
Outdated
    mhost.import_module(mdev)
    return mhost


def combine_modules(host_funcs, device_modules, target_host=None):
I feel we can directly call codegen.build_module and do not need this additional level of abstraction for now; move the comment to the place that actually calls combine_modules.
@tqchen Yes, we can do that. The reason I have this function here is that we will need it anyway in the compiler PR. I called this function in the unit test. I can remove it for now and add it back later.
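For reference, a hedged sketch of the direct call, based on the build_module.py snippets quoted in this review (fhost_all, device_mods, and target_host are illustrative names, as in the earlier sketch):

from tvm import codegen

# Build the host module directly from the collected lowered host functions,
# then import each device module, instead of going through combine_modules.
mhost = codegen.build_module(fhost_all, str(target_host))
for mdev in device_mods:
    mhost.import_module(mdev)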
python/tvm/contrib/graph_runtime.py
Outdated
"CPU should be the host processor for heterogenous execution, but" | ||
" not found in ctx.") | ||
|
||
device_type_arr = (ctypes.c_int * num_devices)(*device_types) |
We should not pass raw integer pointers into a function; it will not be RPC compatible. Instead, we can pass things in positionally, simply by:
fcreate(json, libmod, ndevices, device_type0, device_id0, device_type1, device_id1); ...
@tqchen I am aware of this; we can pass fcreate(json, libmod, ctx[0], ctx[1]). But it seems to me that we need to check the number of contexts, although we usually have at most 2 or 3:
if len(ctx) == 1:
    fcreate(json, libmod, ctx[0])
elif len(ctx) == 2:
    fcreate(json, libmod, ctx[0], ctx[1])
Am I missing something?
in python you can just do fcreate(json, libmod, *ctx)
@tqchen btw, passing a TVMContext seems to only work for Python, not Java and js, right? If so, we probably still need to pass (json, mod, num_dev, dev_types, dev_ids).
src/runtime/graph/graph_runtime.cc
Outdated
@@ -277,10 +297,16 @@ class GraphRuntime : public ModuleNode {
    this->LoadAttrs(reader, &param);
  } else if (key == "control_deps") {
    reader->Read(&control_deps);
  } else if (key == "device_type") {
device_index? As we are doing a virtual device_index to device type mapping.
src/runtime/graph/graph_runtime.cc
Outdated
namespace tvm {
namespace runtime {
using StorageDeviceMap = std::unordered_map<uint32_t, DLDeviceType>;
using DeviceStoragePoolMap = std::unordered_map<size_t, NDArray>;
using ModuleContextMap = std::unordered_map<tvm::runtime::Module*, TVMContext>;
remove module context map
src/runtime/graph/graph_runtime.cc
Outdated
// to make sure homogeneous execution works correctly. It will be removed
// when we add the compiler pass as we will read the serialized value from
// json for all execution.
DLDeviceType device_type{kDLCPU};
Let us not put device_type here; instead, introduce a device_index column attribute, just like storage_id, to indicate device assignment.
If the column does not exist, fall back to the default (the primary context passed in at index 0).
@tqchen So we use ctxs_[0] as the default, right?
I think that is a reasonable choice
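To make the suggestion concrete, here is a hypothetical fragment of the graph JSON "attrs" section with such a per-node column, written as a Python dict (key names and values are purely illustrative; this PR currently serializes a device_type column instead):

graph_attrs = {
    "dltype":       ["list_str", ["float32", "float32", "float32", "float32"]],
    "storage_id":   ["list_int", [0, 1, 2, 3]],
    "shape":        ["list_shape", [[4], [4], [4], [4]]],
    # One entry per node: an index into the context list handed to the runtime.
    # If the column is absent, every node falls back to ctxs_[0].
    "device_index": ["list_int", [0, 1, 1, 0]],
}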
src/runtime/graph/graph_runtime.cc
Outdated
/*! \brief Execution context of all devices including the host. */
std::vector<TVMContext> ctxs_;
/*! \brief Common storage pool for each device. */
DeviceStoragePoolMap device_storage_pool_;
It is likely we can still just use a vector for storage_pool, as long as the storage index sharing algorithm is device aware.
@tqchen I am not sure. I thought the storage index sharing algorithm was not device aware because we don't have multiple devices.
That could be true, but we can still introduce post-processing to make them device aware. The general principle is that we want the runtime to be as dumb as possible and let the compiler do most of the job.
@tqchen Thanks. I agree we should keep the runtime minimal. We can remove all the maps if a storage id is guaranteed to be assigned to only one device. I can also use a vector to represent the sid to device_type mapping and use sid for indexing. I think we need this mapping to make memory allocation on the correct device more convenient, right?
src/runtime/graph/graph_runtime.cc
Outdated
for (size_t i = 0; i < pool_entry_bytes.size(); ++i) {

// Allocate the space on each device.
for (const auto& pit : device_pool_entry_bytes) {
Likely we do not need to do it per device. We can still iterate over each entry in the pool, but when we allocate the memory for an entry, we look up which device that entry belongs to.
@tqchen But the number of iterations would still be the same, right? Going by device seems more intuitive to me.
I agree going by device is more intuitive, but going by pool likely removes the additional data structures we need (we only need a vector pool and a vector of arrays). As per the last comment, one goal of the runtime is to keep it as simple as possible.
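A Python-flavoured sketch of that per-entry scheme (the real code is C++ in graph_runtime.cc; pool_entry_bytes, sid_device_type, and ctxs are illustrative names for the pool sizes, the sid-to-device mapping, and the context list):

import numpy as np
import tvm

def alloc_storage_pool(pool_entry_bytes, sid_device_type, ctxs):
    """Allocate one flat float32 buffer per storage entry on its own device."""
    storage_pool = []
    for sid, nbytes in enumerate(pool_entry_bytes):
        # Look up the device of this entry; fall back to the primary context.
        dev_type = sid_device_type.get(sid, ctxs[0].device_type)
        ctx = next((c for c in ctxs if c.device_type == dev_type), ctxs[0])
        nelem = int(np.ceil(nbytes / 4.0))  # pool entries are float32 words
        storage_pool.append(tvm.nd.empty((nelem,), "float32", ctx))
    return storage_pool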
python/tvm/contrib/graph_runtime.py
Outdated
fcreate = get_global_func("tvm.graph_runtime.create")
return GraphModule(fcreate(graph_json_str, libmod, device_type, device_id), ctx)
return GraphModule(fcreate(graph_json_str, libmod, num_devices,
just pass in ctx[0]
@tqchen So we actually pass fcreate(json, libmod, device_type_id[0], device_type_id[1], *device_type_id_others), right?
yah, that can be a solution
python/tvm/contrib/graph_runtime.py
Outdated
# Assume CPU is the host processor when there are multiple devices on
# a hardware platform.
if (num_devices > 1) and (cpu_ctx_index < 0):
Avoid doing the context check for now and just use ctx[0] as the primary context.
src/runtime/graph/graph_runtime.cc
Outdated
// This for loop is very fast since there are usually only a couple of
// devices available on the same hardware.
for (const auto& cit : ctxs_) {
  if (pool_entry[i].device_type == static_cast<int>(cit.device_type)) {
we can use std::find_if
src/runtime/graph/graph_runtime.cc
Outdated
@@ -508,8 +554,9 @@ void GraphRuntime::SetupOpExecs() {
    uint32_t eid = this->entry_id(nid, index);
    args.push_back(*(data_entry_[eid].operator->()));
  }
  CHECK_EQ(inode.op_type, "tvm_op")
      << "Can only take tvm_op as op";
  CHECK(inode.op_type == "tvm_op" || inode.op_type == "device_copy_op")
since we are already using __copy for cross-device copy, we just need to make sure op_type is "tvm_op"
src/runtime/graph/graph_runtime.cc
Outdated
std::vector<TVMContext> ret(1);
if (args.num_args == 4) {
  int dev_type = args[2];
  ret[0].device_type = static_cast<DLDeviceType>(dev_type);
Just do push_back so there is no special logic involved.
src/runtime/graph/graph_runtime.cc
Outdated
int dev_type = args[3 + i];
ctx.device_type = static_cast<DLDeviceType>(dev_type);
ctx.device_id = args[3 + i + 1];
if (ctx.device_type == static_cast<int>(kDLCPU)) {
Do not do magic like this; let's just push things and build up the ctx.
@zhiics some final followup comments. @jackwish @srkreddy1238 @yzhliu @tmoreau89 please take a round of review and see https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly
python/tvm/build_module.py
Outdated
"""Build a function with arguments as signiture. | ||
binds=None, | ||
postpone_host_codegen=False): | ||
"""Build a function with arguments as signiture. Code will be generated |
signiture -> signature
def get_simplex_graph(host_dev_type, device_dev_type):
    r""" Return the hand-crafted json object where only one copy node is
    inserted. Tis node copies data from the target device to cpu.
Tis?
(I assume it's probably a typo on "the")
@tmoreau89 Thanks. It was a typo. I was trying to say "This".
Overall, excellent work! Thank you for providing well written examples. This will open a lot of interesting work on heterogeneous execution on CPU+GPU or CPU+FPGA systems.
@zhiics please address the review comments
some followup comments
python/tvm/contrib/graph_runtime.py
Outdated
raise ValueError("ctx has to be the type of TVMContext or a list " | ||
"of TVMContext") | ||
if cur_ctx.device_type >= rpc_base.RPC_SESS_MASK: | ||
ctx[0], ctx[i] = ctx[i], ctx[0] |
what is the purpose of this swapping? can we just remove it?
@tqchen Sorry. I think I misunderstood RPC here. I thought we just need one of them to be remote. So I put it as the first one. I updated it. Please take another look and see if it makes sense. Thanks.
python/tvm/contrib/graph_runtime.py
Outdated
    device_type = device_type % rpc_base.RPC_SESS_MASK
    return GraphModule(fcreate(graph_json_str, hmod, device_type, device_id), ctx)
fcreate = ctx[0]._rpc_sess.get_function("tvm.graph_runtime.remote_create")
device_type = ctx[0].device_type % rpc_base.RPC_SESS_MASK
We need to do the RPC session stripping for all the contexts.
Need to assert that all the contexts are remote and belong to the same session.
python/tvm/contrib/graph_runtime.py
Outdated
# ctx[0] is used as the primary/fallback context. All other ones are used
# as device context for heterogeneous execution.
device_type_id = [x for c in ctx[1:] for x in [c.device_type, c.device_id]]
Consider using a for loop to populate this, since we need to strip the RPC sess mask from all of them; maybe some of the logic needs to be put into loops.
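A hedged sketch of such a loop (cur_ctx and device_type_id follow the names in the quoted diffs; the same-session assertion mirrors the comments above, and the attributes used, e.g. _rpc_sess, appear in the existing code):

device_type_id = []
for cur_ctx in ctx:
    device_type = cur_ctx.device_type
    if device_type >= rpc_base.RPC_SESS_MASK:
        # All remote contexts must come from the same RPC session.
        assert cur_ctx._rpc_sess == ctx[0]._rpc_sess, \
            "All remote contexts must belong to the same RPC session."
        device_type = device_type % rpc_base.RPC_SESS_MASK
    device_type_id.append(device_type)
    device_type_id.append(cur_ctx.device_id)

The flattened list can then be passed positionally, e.g. fcreate(graph_json_str, libmod, *device_type_id).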
python/tvm/contrib/graph_runtime.py
Outdated
fcreate = get_global_func("tvm.graph_runtime.create")
return GraphModule(fcreate(graph_json_str, libmod, ctx[0].device_type,
                           ctx[0].device_id, *device_type_id))
return GraphModule(fcreate(graph_json_str, libmod, device_type_id[0],
can just do *device_type_id
combine modules for heterogeneous execution
Thanks @zhiics @yzhliu @jackwish @tmoreau89, this is merged
This is the first part of the PR #1688.
This PR only focuses on making the runtime able to take heterogeneous graphs. Changes are mainly made to the graph runtime C++ and Python interfaces. Meanwhile, to test the execution, I manually created two simple graphs containing only addition, subtraction, and copy nodes. One test, test_simplex_data_transferring, tests data transfer from GPU to CPU at runtime. The other one, test_duplex_data_transferring, tests duplex data transfer back and forth between GPU and CPU.
A new column, device_type, is added to the json file, which indicates which device a node should be scheduled to. In this PR this column is manually created as part of a graph json file. This field will also be used to annotate the graph nodes in the compiler PR. The serialization of this column is similar to that of the dtype column in the current json file. The loading/saving json APIs need to be modified slightly to support this field. I tested the functionality on a MacBook with an Intel CPU and Intel Graphics GPU, both using the generated module in memory and exporting/importing it from disk.
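For illustration, a hedged sketch of how the new interface is exercised, roughly mirroring the tests (graph_json_str is a hand-crafted, device-annotated graph and mhost a module combined for both devices; both names are placeholders, and the exact argument handling may differ):

import numpy as np
import tvm
from tvm.contrib import graph_runtime

# ctx[0] acts as the primary/fallback context; the GPU runs the annotated nodes.
ctxs = [tvm.cpu(0), tvm.gpu(0)]
mod = graph_runtime.create(graph_json_str, mhost, ctxs)

shape = (4,)
mod.set_input("A", np.random.uniform(size=shape).astype("float32"))
mod.set_input("B", np.random.uniform(size=shape).astype("float32"))
mod.run()
out = mod.get_output(0, tvm.nd.empty(shape, "float32", ctxs[0]))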
The next PR will focus on the compiler part to generate heterogeneous binaries to feed into the runtime. Major changes will be needed for the compiler.build interface. Another issue is the removal of with target statements in the high-level build interface.