[Doc] Nano benchmark and tutorial (#71)

* add cls benchmark
* add nano zh-cn benchmark and en tutorial
* add device row
* add doc path to index.rst
* fix typo
open-mmlab · Jan 24, 2022 · eeddd8a
1 parent c74c41b
commit eeddd8a

Showing 5 changed files with 288 additions and 62 deletions.
diff --git a/docs/en/backends/tensorrt.md b/docs/en/backends/tensorrt.md
@@ -131,3 +131,7 @@ If the calibration dataset is not given, the data will be calibrated with the da
   TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. However, you may need CUDA-10.2 Patch 1 (Released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don't want to upgrade.
 
   Read [this](https://forums.developer.nvidia.com/t/matrixmultiply-failed-on-tensorrt-7-2-1/158187/4) for detail.
+
+- Install mmdeploy on Jetsons
+
+  We provide a tutorial to get started on Jetsons [here](../tutorials/how_to_install_mmdeploy_on_jetsons.md).
diff --git a/docs/en/benchmark.md b/docs/en/benchmark.md
@@ -32,24 +32,33 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 <thead>
   <tr>
     <th align="center" colspan="3">MMCls</th>
-    <th align="center" colspan="6">TensorRT</th>
+    <th align="center" colspan="10">TensorRT</th>
     <th align="center" colspan="2">PPLNN</th>
     <th align="center" colspan="4">NCNN</th>
-    <th align="center"></th>
+    <th></th>
   </tr>
 </thead>
 <tbody>
   <tr>
-    <td align="center" rowspan="2">Model</td>
-    <td align="center" rowspan="2">Dataset</td>
-    <td align="center" rowspan="2">Input</td>
+    <td align="center" rowspan="3">Model</td>
+    <td align="center" rowspan="3">Dataset</td>
+    <td align="center" rowspan="3">Input</td>
+    <td align="center" colspan="6">T4</td>
+    <td align="center" colspan="4">JetsonNano2GB</td>
+    <td align="center" colspan="2">T4</td>
+    <td align="center" colspan="2">SnapDragon888</td>
+    <td align="center" colspan="2">Adreno660</td>
+    <td rowspan="3">model config file</td>
+  </tr>
+  <tr>
     <td align="center" colspan="2">fp32</td>
     <td align="center" colspan="2">fp16</td>
     <td align="center" colspan="2">int8</td>
+    <td align="center" colspan="2">fp32</td>
+    <td align="center" colspan="2">fp16</td>
     <td align="center" colspan="2">fp16</td>
-    <td align="center" colspan="2">SnapDragon888-fp32</td>
-    <td align="center" colspan="2">Adreno660-fp32</td>
-    <td rowspan="2">model config file</td>
+    <td align="center" colspan="2">fp32</td>
+    <td align="center" colspan="2">fp32</td>
   </tr>
   <tr>
     <td align="center">latency (ms)</td>
@@ -64,6 +73,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
     <td align="center">FPS</td>
     <td align="center">latency (ms)</td>
     <td align="center">FPS</td>
+    <td align="center">latency (ms)</td>
+    <td align="center">FPS</td>
+    <td align="center">latency (ms)</td>
+    <td align="center">FPS</td>
   </tr>
   <tr>
     <td align="center">ResNet</td>
@@ -75,6 +88,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
     <td align="center">791.89</td>
     <td align="center">1.21</td>
     <td align="center">829.66</td>
+    <td align="center">59.32</td>
+    <td align="center">16.86</td>
+    <td align="center">30.54</td>
+    <td align="center">32.75</td>
     <td align="center">1.30</td>
     <td align="center">768.28</td>
     <td align="center">33.91</td>
@@ -93,6 +110,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
     <td align="center">703.42</td>
     <td align="center">1.37</td>
     <td align="center">727.42</td>
+    <td align="center">88.10</td>
+    <td align="center">11.35</td>
+    <td align="center">49.18</td>
+    <td align="center">20.13</td>
     <td align="center">1.36</td>
     <td align="center">737.67</td>
     <td align="center">133.44</td>
@@ -111,6 +132,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
     <td align="center">600.73</td>
     <td align="center">1.51</td>
     <td align="center">662.90</td>
+    <td align="center">74.59</td>
+    <td align="center">13.41</td>
+    <td align="center">48.78</td>
+    <td align="center">20.50</td>
     <td align="center">1.91</td>
     <td align="center">524.07</td>
     <td align="center">107.84</td>
@@ -129,6 +154,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
     <td align="center">841.36</td>
     <td align="center">1.13</td>
     <td align="center">883.47</td>
+    <td align="center">15.26</td>
+    <td align="center">65.54</td>
+    <td align="center">10.23</td>
+    <td align="center">97.77</td>
     <td align="center">4.69</td>
     <td align="center">213.33</td>
     <td align="center">9.55</td>
@@ -157,14 +186,18 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </thead>
 <tbody>
   <tr>
-    <td align="center" rowspan="2">Model</td>
-    <td align="center" rowspan="2">Dataset</td>
-    <td align="center" rowspan="2">Input</td>
+    <td align="center" rowspan="3">Model</td>
+    <td align="center" rowspan="3">Dataset</td>
+    <td align="center" rowspan="3">Input</td>
+    <td align="center" colspan="6">T4</td>
+    <td align="center" colspan="2">T4</td>
+    <td rowspan="3">model config file</td>
+  </tr>
+  <tr>
     <td align="center" colspan="2">fp32</td>
     <td align="center" colspan="2">fp16</td>
     <td align="center" colspan="2">int8</td>
     <td align="center" colspan="2">fp16</td>
-    <td rowspan="2">model config file</td>
   </tr>
   <tr>
     <td align="center">latency (ms)</td>
@@ -286,12 +319,16 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </thead>
 <tbody>
   <tr>
-    <td align="center" rowspan="2">Model</td>
-    <td align="center" rowspan="2">Dataset</td>
-    <td align="center" rowspan="2">Input</td>
-    <td align="center" colspan="2">SnapDragon888-fp32</td>
-    <td align="center" colspan="2">Adreno660-fp32</td>
-    <td rowspan="2">model config file</td>
+    <td align="center" rowspan="3">Model</td>
+    <td align="center" rowspan="3">Dataset</td>
+    <td align="center" rowspan="3">Input</td>
+    <td align="center" colspan="2">SnapDragon888</td>
+    <td align="center" colspan="2">Adreno660</td>
+    <td rowspan="3">model config file</td>
+  </tr>
+  <tr>
+    <td align="center" colspan="2">fp32</td>
+    <td align="center" colspan="2">fp32</td>
   </tr>
   <tr>
     <td align="center">latency (ms)</td>
@@ -348,13 +385,17 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </thead>
 <tbody>
   <tr>
-    <td align="center" rowspan="2">Model</td>
-    <td align="center" rowspan="2">Input</td>
+    <td align="center" rowspan="3">Model</td>
+    <td align="center" rowspan="3">Input</td>
+    <td align="center" colspan="6">T4</td>
+    <td align="center" colspan="2">T4</td>
+    <td rowspan="3">model config file</td>
+  </tr>
+  <tr>
     <td align="center" colspan="2">fp32</td>
     <td align="center" colspan="2">fp16</td>
     <td align="center" colspan="2">int8</td>
     <td align="center" colspan="2">fp16</td>
-    <td rowspan="2">model config file</td>
   </tr>
   <tr>
     <td align="center">latency (ms)</td>
@@ -412,16 +453,22 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </thead>
 <tbody>
   <tr>
-    <td align="center" rowspan="2">Model</td>
-    <td align="center" rowspan="2">Dataset</td>
-    <td align="center" rowspan="2">Input</td>
+    <td align="center" rowspan="3">Model</td>
+    <td align="center" rowspan="3">Dataset</td>
+    <td align="center" rowspan="3">Input</td>
+    <td align="center" colspan="6">T4</td>
+    <td align="center" colspan="2">T4</td>
+    <td align="center" colspan="2">SnapDragon888</td>
+    <td align="center" colspan="2">Adreno660</td>
+    <td rowspan="3">model config file</td>
+  </tr>
+  <tr>
     <td align="center" colspan="2">fp32</td>
     <td align="center" colspan="2">fp16</td>
     <td align="center" colspan="2">int8</td>
     <td align="center" colspan="2">fp16</td>
-    <td align="center" colspan="2">SnapDragon888-fp32</td>
-    <td align="center" colspan="2">Adreno660-fp32</td>
-    <td rowspan="2">model config file</td>
+    <td align="center" colspan="2">fp32</td>
+    <td align="center" colspan="2">fp32</td>
   </tr>
   <tr>
     <td align="center">latency (ms)</td>
@@ -491,14 +538,18 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
 </thead>
 <tbody>
   <tr>
-    <td align="center" rowspan="2">Model</td>
-    <td align="center" rowspan="2">Dataset</td>
-    <td align="center" rowspan="2">Input</td>
+    <td align="center" rowspan="3">Model</td>
+    <td align="center" rowspan="3">Dataset</td>
+    <td align="center" rowspan="3">Input</td>
+    <td align="center" colspan="6">T4</td>
+    <td align="center" colspan="2">T4</td>
+    <td rowspan="3">model config file</td>
+  </tr>
+  <tr>
     <td align="center" colspan="2">fp32</td>
     <td align="center" colspan="2">fp16</td>
     <td align="center" colspan="2">int8</td>
     <td align="center" colspan="2">fp16</td>
-    <td rowspan="2">model config file</td>
   </tr>
   <tr>
     <td align="center">latency (ms)</td>

diff --git a/docs/en/index.rst b/docs/en/index.rst
@@ -23,6 +23,7 @@ You can switch between Chinese and English documents in the lower-left corner of
    tutorials/how_to_support_new_backends.md
    tutorials/how_to_add_test_units_for_backend_ops.md
    tutorials/how_to_test_rewritten_models.md
+   tutorials/how_to_install_mmdeploy_on_jetsons.md
 
 .. toctree::
    :maxdepth: 1

diff --git a/docs/en/tutorials/how_to_install_mmdeploy_on_jetsons.md b/docs/en/tutorials/how_to_install_mmdeploy_on_jetsons.md
@@ -0,0 +1,119 @@
+## How to install mmdeploy on Jetsons
+
+This tutorial describes how to install mmdeploy on NVIDIA Jetson systems. It covers three Jetson series boards:
+- Jetson Nano
+- Jetson AGX Xavier
+- Jetson TX2
+
+For Jetson Nano, we use a Jetson Nano 2GB and install the [JetPack SDK](https://developer.nvidia.com/embedded/jetpack) through the SD card image method.
+
+### Install JetPack SDK
+
+There are mainly two ways to install the JetPack:
+1. Write the image to the SD card directly.
+2. Use the SDK Manager to do this.
+
+The first method does not require a second machine, extra display equipment, or cables. We just follow the instructions to write the image, which is convenient. Click [here](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-2gb-devkit#intro) to get started with Jetson Nano 2GB, or [here](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit) for Jetson Nano 4GB.
+
+The second method, however, requires an additional display and cable connected to the Jetson hardware. It is more reliable than the first method, which sometimes fails to write the image and shows a warning during validation. Click [here](https://docs.nvidia.com/sdk-manager/install-with-sdkm-jetson/index.html) to start.
+
+For the first method, if it keeps showing `Attention something went wrong...` even after the file has been re-downloaded, try downloading the file with `wget` and changing its file name suffix instead.
+
+### Launch the system
+
+If the Jetson device gets stuck while initializing the system, rebooting it is often enough.
+
+### CUDA
+
+With the first method, CUDA is installed by default, but cuDNN is not. We have to add the CUDA binary and library directories to `$PATH` and `$LD_LIBRARY_PATH`:
+```
+export PATH=$PATH:/usr/local/cuda/bin
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
+```
+Then we can use `nvcc -V` to get the version of CUDA in use.
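If the two `export` lines above are placed in `~/.bashrc`, every nested shell appends the same directories again. A minimal sketch of an idempotent variant follows; `append_once` is a hypothetical helper name, not part of JetPack or mmdeploy, and the CUDA paths are the ones used above.

```shell
# Hypothetical helper: append a directory to a colon-separated list only once.
# $1 = current value of the variable, $2 = directory to add
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present, keep unchanged
        "::")     printf '%s\n' "$2" ;;      # variable was empty
        *)        printf '%s\n' "$1:$2" ;;   # append at the end
    esac
}

# Same directories as in the tutorial, added at most once.
PATH=$(append_once "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(append_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)
export PATH LD_LIBRARY_PATH
```

Sourcing this snippet repeatedly leaves both variables unchanged after the first run.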
+
+### Anaconda
+
+We have to install [Archiconda](https://github.com/Archiconda/build-tools/releases) instead, as Anaconda does not provide a release built for Jetson.
+
+After Archiconda is installed and the virtual env is created, if pip in the env does not work properly or throws `Illegal instruction (core dumped)`, consider reinstalling pip manually. Reinstalling the whole JetPack SDK is the last resort.
+
+### Move tensorrt to conda env
+After installing Archiconda, we can use it to create a virtual env, for example `mmdeploy`. Then we have to copy the tensorrt package pre-installed by JetPack into the virtual env.
+
+First, use `find` to locate tensorrt:
+```
+sudo find / -name tensorrt
+```
+Then copy tensorrt to the destination, for example:
+```
+cp -r /usr/lib/python3.6/dist-packages/tensorrt* /home/archiconda3/envs/mmdeploy/lib/python3.6/site-packages/
+```
+Meanwhile, tensorrt libs such as `libnvinfer.so` can already be found through `LD_LIBRARY_PATH`, which JetPack sets up as well.
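The copy step can be wrapped so that it fails loudly when the source glob matches nothing or the destination does not exist. This is only a sketch: `copy_pkg` is a hypothetical helper, and the paths in the commented call are assumptions that should be replaced with whatever `find` reported on your board.

```shell
# Hypothetical helper: copy every entry named "<prefix>*" from one directory
# into another, and fail if the destination is missing or nothing matched.
copy_pkg() {
    cp_src_dir=$1
    cp_prefix=$2
    cp_dst_dir=$3
    if [ ! -d "$cp_dst_dir" ]; then
        echo "missing destination: $cp_dst_dir" >&2
        return 1
    fi
    cp_found=0
    for cp_entry in "$cp_src_dir"/"$cp_prefix"*; do
        [ -e "$cp_entry" ] || continue          # the glob matched nothing
        cp -r "$cp_entry" "$cp_dst_dir"/ && cp_found=1
    done
    [ "$cp_found" -eq 1 ]                       # non-zero status if nothing was copied
}

# Example call (assumed paths, adjust to your system):
# copy_pkg /usr/lib/python3.6/dist-packages tensorrt \
#     "$HOME/archiconda3/envs/mmdeploy/lib/python3.6/site-packages"
```

After copying, running `python -c "import tensorrt"` inside the env is a quick way to confirm the package is visible.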
+
+### Install torch
+
+Install the PyTorch build made **specifically** for Jetson. Click [here](https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048) to get the wheel. Before running `pip install`, we have to install `libopenblas-base` and `libopenmpi-dev`:
+```
+sudo apt-get install libopenblas-base libopenmpi-dev
+```
+Otherwise, importing torch in Python throws the following error:
+```
+libmpi_cxx.so.20: cannot open shared object file: No such file or directory
+```
+
+### Install torchvision
+We can't use `pip install torchvision` directly to install torchvision on Jetson Nano, but we can clone the repository from GitHub and build it locally. First we have to install some dependencies:
+```
+sudo apt-get install libjpeg-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
+```
+Then just clone and compile the project:
+```
+git clone https://github.com/pytorch/vision.git
+cd vision
+git checkout tags/v0.7.0 -b vision07
+pip install -e .
+```
+
+### Install mmcv
+
+Install the OpenSSL development package first:
+```
+sudo apt-get install libssl-dev
+```
+Then install mmcv from source, for example with `MMCV_WITH_OPS=1 pip install -e .`
+
+### Update cmake
+
+We use cmake 3.20.0 as an example.
+```
+sudo apt-get install -y libssl-dev
+wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0.tar.gz
+tar -zxvf cmake-3.20.0.tar.gz
+cd cmake-3.20.0
+./bootstrap
+make
+sudo make install
+```
+Then we can check the cmake version through:
+```
+source ~/.bashrc
+cmake --version
+```
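The version check can also be scripted. The sketch below parses the first line of `cmake --version` and compares it with a minimum version using `sort -V` (GNU coreutils). The value of `required` is an assumption for illustration; the actual minimum should be taken from the mmdeploy build guide. `parse_cmake_version` and `version_ge` are hypothetical helper names.

```shell
# Assumed minimum version, for illustration only.
required="3.14.0"

# Read `cmake --version` output on stdin and print the version number.
# The first line looks like: "cmake version 3.20.0"
parse_cmake_version() {
    awk 'NR == 1 { print $3 }'
}

# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v cmake >/dev/null 2>&1; then
    current=$(cmake --version | parse_cmake_version)
    if version_ge "$current" "$required"; then
        echo "cmake $current is new enough (>= $required)"
    else
        echo "cmake $current is older than $required"
    fi
else
    echo "cmake not found on PATH"
fi
```

If the old version is still reported after `sudo make install`, the shell may be resolving an earlier `cmake` on `PATH`; `command -v cmake` shows which binary is picked up.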
+
+### Install mmdeploy
+Just follow the instructions [here](../build.md). If it throws `failed building wheel for numpy...ERROR: Failed to build one or more wheels` when installing `h5py`, try installing `h5py` manually.
+```
+sudo apt-get install pkg-config libhdf5-100 libhdf5-dev
+pip install versioned-hdf5 --no-cache-dir
+```
+
+Then install onnx manually. First, we have to install the protobuf compiler:
+```
+sudo apt-get install libprotobuf-dev protobuf-compiler
+```
+Then install onnx through:
+```
+pip install onnx
+```
+Then reinstall mmdeploy.
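Once everything is installed, a quick sanity check is to try importing each Python package named in this tutorial from inside the virtual env. This is a minimal sketch that assumes `python3` resolves to the interpreter of the active env; `check_import` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the given module imports in the active Python.
check_import() {
    python3 -c "import $1" >/dev/null 2>&1
}

# Packages installed in the steps above.
for module in tensorrt torch torchvision mmcv onnx; do
    if check_import "$module"; then
        echo "ok:      $module"
    else
        echo "missing: $module"
    fi
done
```

Any module reported as missing points back to the corresponding section above.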