Commit 9ccc3f1 (1 parent: 0432407): 9 changed files with 173 additions and 253 deletions.

```diff
@@ -1,55 +1,18 @@
-# NNVM: Build deep learning system by parts
+# NNVM: Graph IR Stack for Deep Learning Systems
 
 [![Build Status](https://travis-ci.org/dmlc/nnvm.svg?branch=master)](https://travis-ci.org/dmlc/nnvm)
 [![GitHub license](http://dmlc.github.io/img/apache2.svg)](./LICENSE)
 
-NNVM is not a deep learning library. It is a modular,
-decentralized and lightweight part to help build deep learning libraries.
+NNVM is a reusable computational graph optimization and compilation stack for deep learning systems.
+NNVM provides modules to:
 
-## What is it
+- Represent deep learning workloads from front-end frameworks via a graph IR.
+- Optimize computation graphs to improve performance.
+- Compile into executable modules and deploy to different hardware backends with minimum dependency.
 
-While most deep learning systems offer end to end solutions,
-it is interesting to assemble a deep learning system by parts.
-The goal is to enable user to customize optimizations, target platforms and set of operators they care about.
-We believe that the decentralized modular system is an interesting direction.
-
-The hope is that effective parts can be assembled together just like you assemble your own desktops.
-So the customized deep learning solution can be minimax, minimum in terms of dependencies,
-while maximizing the users' need.
-
-NNVM offers one such part, it provides a generic way to do
-computation graph optimization such as memory reduction, device allocation and more
-while being agnostic to the operator interface definition and how operators are executed.
-NNVM is inspired by LLVM, aiming to be a high level intermediate representation library
-for neural nets and computation graphs generation and optimizations.
-
-See [Overview](docs/overview.md) for an introduction on what it provides.
-
-## Example
-See [TinyFlow](https://github.com/tqchen/tinyflow) on how you can build a TensorFlow API with NNVM and Torch.
-
-## Why build learning system by parts
-
-This is essentially ***Unix philosophy*** applied to machine learning system.
-
-- Essential parts can be assembled in minimum way for embedding systems.
-- Developers can hack the parts they need and compose with other well defined parts.
-- Decentralized modules enable new extensions creators to own their project
-without creating a monolithic version.
-
-Deep learning system itself is not necessary one part, for example
-here are some relative independent parts that can be isolated
-
-- Computation graph definition, manipulation.
-- Computation graph intermediate optimization.
-- Computation graph execution.
-- Operator kernel libraries.
-- Imperative task scheduling and parallel task coordination.
-
-We hope that there will be more modular parts in the future,
-so system building can be fun and rewarding.
+NNVM is designed to add new frontend, operators and graph optimizations in a decentralized fashion without changing the core interface. NNVM is part of [TVM stack](https://github.com/dmlc/tvm), which provides an end to end IR compilation stack for deploying deep learning workloads into different hardware backends
 
+## Links
+- [TinyFlow](https://github.com/tqchen/tinyflow) on how you can use NNVM to build a TensorFlow like API.
+- [Apache MXNet](http://mxnet.io/) uses NNVM as a backend.
-
-[MXNet](https://github.com/dmlc/mxnet) is moving to NNVM as its intermediate
-representation layer for symbolic graphs.
```

@@ -0,0 +1,9 @@
NNVM Design Note
================

In this part of the documentation, we share the rationale for the specific choices made when designing NNVM.

.. toctree::
   :maxdepth: 2

   overview

@@ -0,0 +1,126 @@

# NNVM Design Overview

NNVM is a reusable graph IR stack for deep learning systems. It provides useful APIs to construct, represent and transform computation graphs, in order to obtain the high-level optimizations needed in deep learning.
As a part of the TVM stack for deep learning, NNVM also provides a shared compiler for deep learning frameworks to optimize, compile and deploy onto different hardware backends via [TVM](https://github.com/dmlc/tvm).

## Key Requirements and Design Choices

- Have minimum dependency in the deployment module.
- Be able to add new operators to the IR in a decentralized fashion.
- Be able to add new optimization passes to the IR and apply them to existing graphs.

Items 2 and 3 are particularly interesting if we compare them to a typical compiler IR. A compiler IR usually contains a fixed set of primitives (instructions) and uses them as a contract between optimization pass designers. This design makes it easy to add new optimization passes, but not new operators (instructions), because every time we add a new instruction, we need to modify the passes to accommodate the change.

Deep learning frameworks usually have a fixed operator interface (schema). These interfaces can contain properties such as a shape inference function or whether in-place computation can happen. The operator interface is again a contract that makes it easy to add new operators. But it is hard to add new passes in a decentralized fashion: a new optimization pass usually requires additional information, and this results in frequent changes to the centralized operator interface when we are exploring new optimizations. It is also a drawback for modularization; for example, a graph compiler for FPGA devices may not need the GPU-specific attributes.

During our explorations in graph optimization and compilation, we found that it is important to be able to quickly add both operators and passes to the framework without changing the core library.

Here is a list of the key elements in NNVM's design:

- An operator registry system to register and add new operators.
- An operator attribute system that provides operator properties in a decentralized fashion.
- A reusable IR data structure for optimization passes.

The above list is the generic, language-like part of NNVM. Besides that, we also provide a collection of core operator primitives and graph optimization passes. The core tensor operator primitives and optimizations already cover common deep learning workloads. This design allows the NNVM compiler to be used directly as the optimization and compilation stack for frameworks, and the extensible nature of NNVM makes new adjustments easy without constraining the backend providers.

## Minimum Registration for a Symbolic Front-End
To use NNVM to build a language front-end, a developer only needs to register minimal information about each operator.

```c++
NNVM_REGISTER_OP(add)
.describe("add two data together")
.set_num_inputs(2);

NNVM_REGISTER_OP(conv2d)
.describe("take 2d convolution of input")
.set_num_inputs(2);

NNVM_REGISTER_OP(assign)
.describe("assign second input argument to the first one")
.set_num_inputs(2);
```
After compiling this code with the NNVM library, users can use the following interface to compose the computation graph in Python:
```python
import nnvm.symbol as nn
# symbolic variables
x = nn.Variable('x')
y = nn.Variable('y')
w = nn.Variable('w')
z = nn.conv2d(nn.elemwise_add(x, y), w, kernel_size=(2,2), name='conv1')
```

The graph structure is interchangeable between the front-end and the back-end. A Python interface is supported currently; more language support can easily be added in the future.

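The sketch below is not part of the original design note; it shows one way this interchange can be exercised from C++, assuming NNVM's built-in ```SaveJSON```/```LoadJSON``` passes and the ```ApplyPass``` helper. Treat the exact calls as illustrative rather than canonical.

```c++
#include <memory>
#include <string>
#include <utility>

#include <nnvm/graph.h>
#include <nnvm/pass.h>

// Serialize a graph composed by a front-end, then rebuild it on the back-end side.
std::string RoundTripJSON(nnvm::Graph g) {
  // The SaveJSON pass stores the serialized structure in the "json" graph attribute.
  g = nnvm::ApplyPass(std::move(g), "SaveJSON");
  std::string json = g.GetAttr<std::string>("json");

  // A back-end can reconstruct the same graph from the JSON string.
  nnvm::Graph parsed;
  parsed.attrs["json"] = std::make_shared<nnvm::any>(json);
  parsed = nnvm::ApplyPass(std::move(parsed), "LoadJSON");
  return json;
}
```
Serialization like this is one way the same graph can move between a Python front-end and a C++ back-end.
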
## Operator Attribute for More Extensions

The minimum information provided by the operator is enough to get a front-end. However, we need more knowledge about each operator to do transformations and execute the graph.
A typical difference between a neural net's computation graph and a traditional compiler IR is that there are a lot more high-level operators, so we cannot fix the set of operators in the IR.

NNVM allows developers to register attributes for each operator. The attributes can include the shape inference function, whether the operator can perform in-place calculation, and so on.

This design of having an operator attribute registry is not uncommon in deep learning systems.
For example, MXNet has an ```OpProperty``` class, TensorFlow has an ```OpDef```, and Caffe2 has an ```OperatorSchema``` class.
However, the operator attribute interfaces in these frameworks only support a fixed number of defined attributes of interest to the system. If we want to extend the framework to add a new attribute to each operator, we need to change the operator registry.
Eventually, the operator interface grows to be very big and has to evolve in the centralized repo.

In NNVM, we decided to change this design and support arbitrary types of operator attributes, without changing the registry interface. The minimal interface also makes it easier to share across multiple projects.

Users can register a new attribute, such as an in-place property checking function, as follows.
```c++
using FInplaceOption = std::function<
  std::vector<std::pair<int, int> > (const NodeAttrs& attrs)>;

// we can register attributes from multiple places.
NNVM_REGISTER_OP(elemwise_add)
.set_num_inputs(2);

// register to tell that the first input can be calculated in place with the first output
NNVM_REGISTER_OP(add)
.set_attr<FInplaceOption>("FInplaceOption", [](const NodeAttrs& attrs) {
  return std::vector<std::pair<int, int> >{{0, 0}};
});

NNVM_REGISTER_OP(exp)
.set_num_inputs(1)
.set_attr<FInplaceOption>("FInplaceOption", [](const NodeAttrs& attrs) {
  return std::vector<std::pair<int, int> >{{0, 0}};
});
```
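As another illustration, not from the original document, here is a sketch of registering a shape inference function for the same ```add``` operator, assuming NNVM's ```FInferShape``` attribute type and ```TShape``` shape tuple.

```c++
#include <vector>

#include <nnvm/op.h>
#include <nnvm/op_attr_types.h>
#include <nnvm/tuple.h>

// Shape inference for elementwise add: both inputs and the output share one shape.
NNVM_REGISTER_OP(add)
.set_attr<nnvm::FInferShape>("FInferShape",
  [](const nnvm::NodeAttrs&,
     std::vector<nnvm::TShape>* in_shapes,
     std::vector<nnvm::TShape>* out_shapes) {
    // give up until the first input shape is known
    if ((*in_shapes)[0].ndim() == 0) return false;
    // propagate the known shape to the second input and to the output
    (*in_shapes)[1] = (*in_shapes)[0];
    (*out_shapes)[0] = (*in_shapes)[0];
    return true;
  });
```
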
We can query these attributes from arbitrary parts of the code, as in the following example. Under the hood, each attribute is stored in a columnar store that can easily be retrieved as a table for quick lookups.
```c++
void MyFunction() {
  const Op* add = Op::Get("add");
  // if we need quick queries, we can use a static variable;
  // the attribute map contains the attribute of all operators.
  static auto& finplace_option_map = Op::GetAttr<FInplaceOption>("FInplaceOption");
  // quick lookup of the attribute of add, O(1) time, vector index lookup internally.
  auto add_inplace = finplace_option_map[add];
}
```
Besides keeping the code minimal, this attribute store enables decentralization of projects.
Before, all the attributes of an operator had to sit in a centralized interface class.
Now, everyone can register their own attributes and take the attributes they need from another project, without changing the operator interface and the core library.

## Graph and Pass

We can use the additional information in the attribute registry to do optimizations and get more information about the graph. The Graph is the unit we manipulate in these steps. A Graph in NNVM contains
two parts:
- The computation graph structure.
- An attribute map from string to any type: ```map<string, shared_ptr<any> >```.

The second part, the attribute map, is quite important, as we may need different kinds
of information about the graph during the transformation process, be it the
shapes of each tensor, the types of each tensor, or the storage allocation plans.

A ```Pass``` can take a graph with existing attribute information
and transform it into the same graph structure with more graph attributes, or into another graph.

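As a concrete illustration, not part of the original document, here is a minimal sketch of how a pass can be registered and applied, assuming NNVM's ```NNVM_REGISTER_PASS```, ```ApplyPass``` and ```DFSVisit``` interfaces; the ```node_count``` attribute name is made up for this example.

```c++
#include <memory>
#include <utility>

#include <nnvm/graph.h>
#include <nnvm/pass.h>

// A pass that walks the graph and records the node count as a graph attribute.
nnvm::Graph CountNodes(nnvm::Graph src) {
  int count = 0;
  nnvm::DFSVisit(src.outputs, [&count](const nnvm::NodePtr&) { ++count; });
  // attach the result; the graph structure itself is unchanged
  src.attrs["node_count"] = std::make_shared<nnvm::any>(count);
  return src;
}

NNVM_REGISTER_PASS(CountNodes)
.describe("Count the nodes in the graph and store the result as the node_count attribute")
.set_body(CountNodes)
.set_change_graph(false);

void Example(nnvm::Graph g) {
  // applying the pass returns the same graph structure with one more attribute
  g = nnvm::ApplyPass(std::move(g), "CountNodes");
  int num_nodes = g.GetAttr<int>("node_count");
  (void)num_nodes;  // e.g. use it for logging or scheduling decisions
}
```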