
[BYOC] JSON Runtime with DNNL End-to-End Flow #5919

Merged 30 commits on Jul 10, 2020

Conversation

@comaniac (Contributor) commented Jun 24, 2020

RFC discussion: https://discuss.tvm.ai/t/byoc-runtime-json-runtime-for-byoc/6579

Currently, BYOC allows developers to choose either the C source module or their own customized module as the runtime for their accelerators. While we have provided an end-to-end execution flow of DNNL (i.e., MKL-DNN, OneDNN) using the C source module, we found that many developers prefer to use a customized module to better integrate with their own runtime engine, such as TensorRT. As a result, this PR (a collaboration with @zhiics) provides an end-to-end flow of DNNL using a JSON runtime. Some highlights:

  • We provide JSON codegen and JSON runtime base classes. The JSON codegen serializes a Relay subgraph to JSON, while the JSON runtime base provides deserialization methods to interpret subgraphs in JSON format. Developers can derive from the JSON codegen to easily customize their codegen, or even use it directly if their runtime engine accepts the standard TVM graph runtime JSON.

  • We present a case study of leveraging the JSON runtime with DNNL. The DNNL JSON runtime now supports conv2d, dense, relu, batch_norm, and add, so it is able to run MobileNet. Note that the DNNL JSON runtime creates only one DNNL execution engine per subgraph, so it is much more efficient than the C source module version, which creates a DNNL engine for each operator in a subgraph.

  • DNNL JSON runtime handles constant tensors following the new mechanism in [RUNTIME] Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770.

  • The DNNL codegen with the C source module will be preserved for illustration purposes, and we use cmake to control which DNNL codegen is used. Specifically, both USE_DNNL_CODEGEN ON and USE_DNNL_CODEGEN JSON enable the JSON runtime (this is the default runtime for DNNL). When following the tutorial, which we will update after this PR, users may use USE_DNNL_CODEGEN C_SRC to enable the C source module so that they can learn how it works.
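To make the serialize/interpret round trip concrete, here is a minimal, self-contained sketch in Python. This is illustrative only: the node layout shown is an assumption, not TVM's actual graph JSON schema (the real format includes fields such as nodes, arg_nodes, heads, and attrs), and the naive interpreter stands in for a derived runtime like the DNNL one in this PR, which would instead dispatch each node to its own engine.

```python
import json

# Hypothetical, simplified serialized subgraph: two inputs feeding an add.
subgraph = {
    "nodes": [
        {"op": "input", "name": "x"},
        {"op": "input", "name": "y"},
        {"op": "add", "name": "out", "inputs": ["x", "y"]},
    ],
}

serialized = json.dumps(subgraph)  # what a JSON codegen would emit

def run(graph_json, inputs):
    """Naive interpreter: walk nodes in order and evaluate each one."""
    env = dict(inputs)
    for node in json.loads(graph_json)["nodes"]:
        if node["op"] == "input":
            continue  # already bound in env
        if node["op"] == "add":
            a, b = (env[i] for i in node["inputs"])
            env[node["name"]] = a + b
    return env["out"]

print(run(serialized, {"x": 3, "y": 4}))  # prints 7
```

A customized runtime derives from the base and overrides only the per-node execution, while the serialization and deserialization machinery stays shared.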

Evaluation

This PR doesn't aim to push the inference performance of the DNNL codegen/runtime; we leave that as future work, but here we list some performance issues we have observed.

  1. write_to_dnnl_memory performs a memory copy from the DLTensor (NDArray) to DNNL memory. This can be avoided by specifying the NDArray pointer when creating a DNNL memory (~5 ms overhead on MobileNet V2).

  2. We should set OMP_NUM_THREADS wisely. For example, MobileNet V2 with batch_norm simplified runs in 1400 ms on c5.2xlarge, but with OMP_NUM_THREADS=16 the latency drops to 65 ms.
  • Baseline: 1400 ms.
  • With OMP_NUM_THREADS=16: 65 ms.

  3. We should not assign the output layout to conv2d. The default output layout of DNNL conv2d is NHWC; if we assign the output layout to be NCHW, DNNL will perform a layout transform after the computation. Instead, we should let DNNL use the layout it prefers and only assign the layout to the last operator.
  • Baseline: 1400 ms.
  • Unassigned layout: 131 ms.
  • Unassigned layout with OMP_NUM_THREADS=16: 16 ms.
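The OMP_NUM_THREADS setting can be pinned in the shell before launching inference; the benchmark script name below is a hypothetical placeholder.

```shell
# Pin the OpenMP thread pool that DNNL uses, e.g. to the vCPU count of
# the instance. The benchmark invocation is a hypothetical example.
export OMP_NUM_THREADS=16
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# python3 run_mobilenet_v2.py   # hypothetical benchmark script
```

Exporting the variable (rather than setting it only for one command) keeps it in effect for repeated runs in the same shell session.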

In summary, if we resolve all of the issues mentioned above, the inference performance of MobileNet V2 on c5.2xlarge should be 16 - 5 = 11 ms.
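On the write_to_dnnl_memory point, the saving comes from wrapping the existing buffer instead of allocating and copying. Below is a language-neutral illustration using NumPy views as a stand-in for the buffer APIs; the actual fix would pass the NDArray's data pointer when constructing the DNNL memory object, which this sketch does not attempt to show.

```python
import numpy as np

src = np.arange(8, dtype=np.float32)  # stands in for the DLTensor buffer

# Copying path (analogous to what write_to_dnnl_memory does today):
copied = src.copy()
copied[0] = 99.0   # a separate allocation; src is unaffected

# Zero-copy path: a view shares the same underlying buffer.
view = src.view()
view[0] = 42.0     # writes through to src, no copy was made

print(src[0])      # prints 42.0
print(copied[0])   # prints 99.0
```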

cc @masahi @mbaret @tqchen

@masahi (Member) commented Jun 25, 2020

maybe we should enable dnnl on CI?

@zhiics (Member) commented Jun 25, 2020

yeah, we should. And we should remove the json_runtime_example.

Resolved review threads on:
  • src/runtime/metadata_module.cc
  • src/relay/backend/contrib/codegen_json/codegen_json.h
  • tests/python/relay/test_json_runtime.py
  • src/runtime/contrib/json/json_runtime.h (multiple threads)
  • src/relay/backend/graph_runtime_codegen.cc
@lhutton1 (Contributor) commented Jun 29, 2020

Possibly out of scope for this PR but is there a plan to support multiple functions/sub-graphs? Currently it looks like there is only support for a single dnnl sub-graph after the graph is partitioned?

@comaniac (Contributor, Author) replied:

> Possibly out of scope for this PR but is there a plan to support multiple functions/sub-graphs? Currently it looks like there is only support for a single dnnl sub-graph after the graph is partitioned?

We now have only one subgraph per module, but we can have many modules to support multiple subgraphs. Please see @mbaret's comments on this PR and the discussions for details.

@lhutton1 (Contributor) replied:

> Possibly out of scope for this PR but is there a plan to support multiple functions/sub-graphs? Currently it looks like there is only support for a single dnnl sub-graph after the graph is partitioned?
>
> We now have only one subgraph per module, but we can have many modules to support multiple subgraphs. Please see @mbaret's comments on this PR and the discussions for details.

Apologies, missed that, thanks

@mbaret (Contributor) left a review:

Looks good to me, but please wait on @lhutton1's approval to confirm this is usable with ACL.

@lhutton1 (Contributor) left a review:

It's working well so far, thanks! I think the API to add additional attributes and retrieve them from a JSON node is a bit convoluted, but this could always be improved at a later date.

@masahi (Member) commented Jul 1, 2020

Do we want to wait until dnnl is up on the CI? And what about @zhiics's comment below.

> And we should remove the json_runtime_example

@zhiics (Member) commented Jul 1, 2020

@masahi Thanks, my comment should be resolved with a followup PR.

@comaniac (Contributor, Author) commented Jul 1, 2020

We could wait for CI. It should already have been updated to include the DNNL library (#5936).

@comaniac (Contributor, Author) commented Jul 2, 2020

Wait for #5985.

@comaniac force-pushed the json_runtime branch 2 times, most recently from c0f7d43 to b012904, on July 10, 2020 06:36
@masahi merged commit 8a0249c into apache:master on Jul 10, 2020
@masahi (Member) commented Jul 10, 2020

Thanks @comaniac @zhiics @mbaret @lhutton1

@zhiics deleted the json_runtime branch on July 10, 2020 15:48
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jul 14, 2020
* json runtime

* json dnnl WIP

* fix ArrayNode usages

* Support composite functions

* DNNL json runtime: conv2d/add/relu/dense/bn

* add a more complex example

* fix bias memory issue

* rebase to upstream

* merge to metadata module, remove the unused driver

* handle constant

* support composite functions

* support DNNL constant

* clean up

* Simplify dnnl user code

* GetDataSize

* fix dense bug

* improve cmake

* zero copy

* add unit test

* move json to contrib/json

* fix cmake

* lint

* max_digits10 for fp serialization

* only keep base getfunction

* fix lint

* zero copy for all data entries

* address comments

* enable ci

* address comment; fix bug

* address comment

Co-authored-by: Zhi Chen <chzhi@amazon.com>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jul 14, 2020

6 participants