-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Doc] Introduction to module serialization
- Loading branch information
1 parent
06ce76b
commit 2cbc001
Showing
2 changed files
with
228 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,227 @@ | ||
.. Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
.. http://www.apache.org/licenses/LICENSE-2.0 | ||
.. Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
Introduction to Module Serialization | ||
==================================== | ||
|
||
When to deploy TVM runtime module, no matter whether it is CPU or GPU, TVM only needs one single DLL. | ||
The key is our unified module serialization mechanism. This document will introduce TVM module | ||
serialization format standard and implementation details. | ||
|
||
********************* | ||
Module Export Example | ||
********************* | ||
|
||
Firstly, Let us build one ResNet-18 workload for GPU as our example firstly. | ||
|
||
.. code:: python | ||
from tvm import relay | ||
from tvm.relay import testing | ||
from tvm.contrib import util | ||
import tvm | ||
# Resnet18 workload | ||
resnet18_mod, resnet18_params = relay.testing.resnet.get_workload(num_layers=18) | ||
# build | ||
with relay.build_config(opt_level=3): | ||
_, resnet18_lib, _ = relay.build_module.build(resnet18_mod, "cuda", params=resnet18_params) | ||
# create one tempory directory | ||
temp = util.tempdir() | ||
# path lib | ||
file_name = "deploy.so" | ||
path_lib = temp.relpath(file_name) | ||
# export library | ||
resnet18_lib.export_library(path_lib) | ||
# load it back | ||
loaded_lib = tvm.module.load(path_lib) | ||
assert loaded_lib.type_key == "library" | ||
assert loaded_lib.imported_modules[0].type_key == "cuda" | ||
************* | ||
Serialization | ||
************* | ||
|
||
The entrance API is ``export_library`` of ``tvm.module.Module``. | ||
Inside this function, we will do the following steps: | ||
|
||
1. Collect all DSO modules (LLVM modules and C modules) | ||
|
||
2. Once we have DSO modules, we will call ``save`` function to save them into files. | ||
|
||
3. Next, we will check whether we have imported modules, such as CUDA, | ||
OpenCL or anything else. We don't restrict the module type here. | ||
Once we have imported modules, we will create one file named as ``dev.cc`` | ||
(so that we could embed the binary blob data of import modules into one dynamic shared library), | ||
then call function ``_PackImportsToLLVM`` or ``_PackImportsToC`` to do module serialization. | ||
|
||
4. Finally, we use ``fcompile`` to call ``_cc.create_shared`` to get | ||
dynamic shared library. | ||
|
||
.. note:: | ||
1. For C source modules, we will compile them and link them together with the DSO module. | ||
|
||
2. Use ``_PackImportsToLLVM`` or ``_PackImportsToC`` depends on whether we enable LLVM in TVM. | ||
They achieve the same goal in fact. | ||
|
||
*************************************************** | ||
Under the Hood of Serialization and Format Standard | ||
*************************************************** | ||
|
||
As said before, we will do the serialization work in the ``_PackImportsToLLVM`` or ``_PackImportsToC``. | ||
They will call one same function ``SerializeModule`` to do module serialization. In ``SerializeModule`` | ||
function, we firstly construct one helper class ``ModuleSerializer``. It will take ``module`` to do some | ||
initialization work, like marking module index. Then we could use its ``SerializeModule`` to serialize module. | ||
|
||
For better understanding, let us dig the implementation of this class a little deeper. | ||
|
||
The following code is used to construct ``ModuleSerializer``: | ||
|
||
.. code:: c++ | ||
|
||
explicit ModuleSerializer(runtime::Module mod) : mod_(mod) { | ||
Init(); | ||
} | ||
private: | ||
void Init() { | ||
CreateModuleIndex(); | ||
CreateImportTree(); | ||
} | ||
|
||
In ``CreateModuleIndex()``, We will inspect module import relationship | ||
using DFS and create index for them. Note the root module is fixed at | ||
location 0. In our example, we have module relationship like this: | ||
|
||
.. code:: c++ | ||
|
||
llvm_mod:imported_modules | ||
- cuda_mod | ||
|
||
So LLVM module will have index 0, CUDA module will have index 1. | ||
|
||
After constructing module index, we will try to construct import tree (``CreateImportTree()``), | ||
which will be used to restore module import relationship when we load | ||
the exported library back. In our design, we use CSR format to store | ||
import tree, each row is parent index, the child indices correspond to its children | ||
index. In code, we use ``import_tree_row_ptr_`` and | ||
``import_tree_child_indices_`` to represent them. | ||
|
||
After initialization, we could serialize module using ``SerializeModule`` function. | ||
In its function logic, we will assume the serialization format like this: | ||
|
||
.. code:: c++ | ||
|
||
binary_blob_size | ||
binary_blob_type_key | ||
binary_blob_logic | ||
binary_blob_type_key | ||
binary_blob_logic | ||
... | ||
_import_tree | ||
_import_tree_logic | ||
|
||
``binary_blob_size`` is how many blobs we will have in this | ||
serialization step. In our example, the number will equal to 3. One for | ||
LLVM module, one for CUDA module, one for ``_import_tree``. | ||
|
||
Then we will write the ``binary_blob_type_key``, for LLVM module / C | ||
module, the blob type key is ``_lib``. For CUDA module, it is | ||
``cuda``, which could be got by ``module->type_key()``. | ||
|
||
Next we will do the ``binary_blob_logic``. Normally, we will call | ||
``SaveToBinary`` function to serialize blob into binary. | ||
|
||
.. note:: | ||
When to implement ``SaveToBinary``, it depends on circumstances. | ||
If the module has information we need when we load the dynamic shared library back, | ||
we should do. Like CUDA module, we need its binary data passing to GPU driver when | ||
we load the dynamic shared library, so we should implement ``SaveToBinary`` to serialize | ||
its binary data. But for host module (like DSO), we don't need other information when we | ||
load the dynamic shared library, so we don't need to implement ``SaveToBinary``. However, | ||
if in the future, we want to record some meta information of DSO module, we could implement | ||
``SaveToBinary`` for DSO module too. | ||
|
||
Finally, we will write one key ``_import_tree``. Unless our module only | ||
has one DSO module and it is in the root, otherwise we will always | ||
have this key. It is used to reconstruct the module import relationship | ||
when we load the exported library back as said before. The | ||
``import_tree_logic`` is just to write ``import_tree_row_ptr_`` and | ||
``import_tree_child_indices_`` into stream. | ||
|
||
After this step, we will pack it into a symbol | ||
``runtime::symbol::tvm_dev_mblob`` that can be recovered in the dynamic | ||
libary. | ||
|
||
Now, we complete the serialization part. As you have seen, we could | ||
support arbitrary modules to import ideally. | ||
|
||
**************** | ||
Deserialization | ||
**************** | ||
|
||
The entrance API is ``tvm.module.load``. This function | ||
is to call ``_LoadFromFile`` in fact. If we dig it a little deeper, this is | ||
``Module::LoadFromFile``. In our example, the file ``deploy.so``, | ||
according to the function logic, we will call ``module.loadfile_so`` in | ||
``dso_library.cc``. The key is here: | ||
|
||
.. code:: c++ | ||
|
||
// Load the imported modules | ||
const char* dev_mblob = reinterpret_cast<const char*>(lib->GetSymbol(runtime::symbol::tvm_dev_mblob)); | ||
Module root_mod; | ||
if (dev_mblob != nullptr) { | ||
root_mod = ProcessModuleBlob(dev_mblob, lib); | ||
} else { | ||
// Only have one single DSO Module | ||
root_mod = Module(n); | ||
} | ||
|
||
As said before, we will pack the blob into the symbol | ||
``runtime::symbol::tvm_dev_mblob``. During deserialization part, we will | ||
inspect it. If we have ``runtime::symbol::tvm_dev_mblob``, we will call ``ProcessModuleBlob``, | ||
whose logic like this: | ||
|
||
.. code:: c++ | ||
|
||
READ(blob_size) | ||
READ(blob_type_key) | ||
for (size_t i = 0; i < blob_size; i++) { | ||
if (blob_type_key == "_lib") { | ||
// construct dso module using lib | ||
} else if (blob_type_key == "_import_tree") { | ||
// READ(_import_tree_row_ptr) | ||
// READ(_import_tree_child_indices) | ||
} else { | ||
// call module.loadbinary_blob_type_key, such as module.loadbinary_cuda | ||
// to restore. | ||
} | ||
} | ||
// Using _import_tree_row_ptr and _import_tree_child_indices to | ||
// restore module import relationship. The first module is the | ||
// root module according to our invariance as said before. | ||
return root_module; | ||
|
||
After this, we will set the ``ctx_address`` to be the ``root_module`` so | ||
that allow lookup of symbol from root (so all symbols are visible). | ||
|
||
Finally, we complete the deserialization part. |