[Doc] Nano benchmark and tutorial #71

Merged · 5 commits · Jan 24, 2022
4 changes: 4 additions & 0 deletions docs/en/backends/tensorrt.md
@@ -131,3 +131,7 @@ If the calibration dataset is not given, the data will be calibrated with the da
TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. However, you may need CUDA-10.2 Patch 1 (Released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don't want to upgrade.

Read [this](https://forums.developer.nvidia.com/t/matrixmultiply-failed-on-tensorrt-7-2-1/158187/4) for detail.

- Install mmdeploy on Jetsons

We provide a tutorial to get started on Jetsons [here](../tutorials/how_to_install_mmdeploy_on_jetsons.md).
113 changes: 82 additions & 31 deletions docs/en/benchmark.md
@@ -32,24 +32,33 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
<thead>
<tr>
<th align="center" colspan="3">MMCls</th>
<th align="center" colspan="6">TensorRT</th>
<th align="center" colspan="10">TensorRT</th>
<th align="center" colspan="2">PPLNN</th>
<th align="center" colspan="4">NCNN</th>
<th align="center"></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">Model</td>
<td align="center" rowspan="2">Dataset</td>
<td align="center" rowspan="2">Input</td>
<td align="center" rowspan="3">Model</td>
<td align="center" rowspan="3">Dataset</td>
<td align="center" rowspan="3">Input</td>
<td align="center" colspan="6">T4</td>
<td align="center" colspan="4">JetsonNano2GB</td>
<td align="center" colspan="2">T4</td>
<td align="center" colspan="2">SnapDragon888</td>
<td align="center" colspan="2">Adreno660</td>
<td rowspan="3">model config file</td>
</tr>
<tr>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">int8</td>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">SnapDragon888-fp32</td>
<td align="center" colspan="2">Adreno660-fp32</td>
<td rowspan="2">model config file</td>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp32</td>
</tr>
<tr>
<td align="center">latency (ms)</td>
@@ -64,6 +73,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
<td align="center">FPS</td>
<td align="center">latency (ms)</td>
<td align="center">FPS</td>
<td align="center">latency (ms)</td>
<td align="center">FPS</td>
<td align="center">latency (ms)</td>
<td align="center">FPS</td>
</tr>
<tr>
<td align="center">ResNet</td>
@@ -75,6 +88,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
<td align="center">791.89</td>
<td align="center">1.21</td>
<td align="center">829.66</td>
<td align="center">59.32</td>
<td align="center">16.86</td>
<td align="center">30.54</td>
<td align="center">32.75</td>
<td align="center">1.30</td>
<td align="center">768.28</td>
<td align="center">33.91</td>
@@ -93,6 +110,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
<td align="center">703.42</td>
<td align="center">1.37</td>
<td align="center">727.42</td>
<td align="center">88.10</td>
<td align="center">11.35</td>
<td align="center">49.18</td>
<td align="center">20.13</td>
<td align="center">1.36</td>
<td align="center">737.67</td>
<td align="center">133.44</td>
@@ -111,6 +132,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
<td align="center">600.73</td>
<td align="center">1.51</td>
<td align="center">662.90</td>
<td align="center">74.59</td>
<td align="center">13.41</td>
<td align="center">48.78</td>
<td align="center">20.50</td>
<td align="center">1.91</td>
<td align="center">524.07</td>
<td align="center">107.84</td>
@@ -129,6 +154,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
<td align="center">841.36</td>
<td align="center">1.13</td>
<td align="center">883.47</td>
<td align="center">15.26</td>
<td align="center">65.54</td>
<td align="center">10.23</td>
<td align="center">97.77</td>
<td align="center">4.69</td>
<td align="center">213.33</td>
<td align="center">9.55</td>
@@ -157,14 +186,18 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">Model</td>
<td align="center" rowspan="2">Dataset</td>
<td align="center" rowspan="2">Input</td>
<td align="center" rowspan="3">Model</td>
<td align="center" rowspan="3">Dataset</td>
<td align="center" rowspan="3">Input</td>
<td align="center" colspan="6">T4</td>
<td align="center" colspan="2">T4</td>
<td rowspan="3">model config file</td>
</tr>
<tr>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">int8</td>
<td align="center" colspan="2">fp16</td>
<td rowspan="2">model config file</td>
</tr>
<tr>
<td align="center">latency (ms)</td>
@@ -286,12 +319,16 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">Model</td>
<td align="center" rowspan="2">Dataset</td>
<td align="center" rowspan="2">Input</td>
<td align="center" colspan="2">SnapDragon888-fp32</td>
<td align="center" colspan="2">Adreno660-fp32</td>
<td rowspan="2">model config file</td>
<td align="center" rowspan="3">Model</td>
<td align="center" rowspan="3">Dataset</td>
<td align="center" rowspan="3">Input</td>
<td align="center" colspan="2">SnapDragon888</td>
<td align="center" colspan="2">Adreno660</td>
<td rowspan="3">model config file</td>
</tr>
<tr>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp32</td>
</tr>
<tr>
<td align="center">latency (ms)</td>
@@ -338,13 +375,17 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">Model</td>
<td align="center" rowspan="2">Input</td>
<td align="center" rowspan="3">Model</td>
<td align="center" rowspan="3">Input</td>
<td align="center" colspan="6">T4</td>
<td align="center" colspan="2">T4</td>
<td rowspan="3">model config file</td>
</tr>
<tr>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">int8</td>
<td align="center" colspan="2">fp16</td>
<td rowspan="2">model config file</td>
</tr>
<tr>
<td align="center">latency (ms)</td>
@@ -402,16 +443,22 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">Model</td>
<td align="center" rowspan="2">Dataset</td>
<td align="center" rowspan="2">Input</td>
<td align="center" rowspan="3">Model</td>
<td align="center" rowspan="3">Dataset</td>
<td align="center" rowspan="3">Input</td>
<td align="center" colspan="6">T4</td>
<td align="center" colspan="2">T4</td>
<td align="center" colspan="2">SnapDragon888</td>
<td align="center" colspan="2">Adreno660</td>
<td rowspan="3">model config file</td>
</tr>
<tr>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">int8</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">SnapDragon888-fp32</td>
<td align="center" colspan="2">Adreno660-fp32</td>
<td rowspan="2">model config file</td>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp32</td>
</tr>
<tr>
<td align="center">latency (ms)</td>
@@ -481,14 +528,18 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">Model</td>
<td align="center" rowspan="2">Dataset</td>
<td align="center" rowspan="2">Input</td>
<td align="center" rowspan="3">Model</td>
<td align="center" rowspan="3">Dataset</td>
<td align="center" rowspan="3">Input</td>
<td align="center" colspan="6">T4</td>
<td align="center" colspan="2">T4</td>
<td rowspan="3">model config file</td>
</tr>
<tr>
<td align="center" colspan="2">fp32</td>
<td align="center" colspan="2">fp16</td>
<td align="center" colspan="2">int8</td>
<td align="center" colspan="2">fp16</td>
<td rowspan="2">model config file</td>
</tr>
<tr>
<td align="center">latency (ms)</td>
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -23,6 +23,7 @@ You can switch between Chinese and English documents in the lower-left corner of
tutorials/how_to_support_new_backends.md
tutorials/how_to_add_test_units_for_backend_ops.md
tutorials/how_to_test_rewritten_models.md
tutorials/how_to_install_mmdeploy_on_jetsons.md

.. toctree::
:maxdepth: 1
119 changes: 119 additions & 0 deletions docs/en/tutorials/how_to_install_mmdeploy_on_jetsons.md
@@ -0,0 +1,119 @@
## How to install mmdeploy on Jetsons

This tutorial introduces how to install mmdeploy on NVIDIA Jetson platforms. It covers installation on three Jetson series boards:
- Jetson Nano
- Jetson AGX Xavier
- Jetson TX2

For the Jetson Nano, we use the Jetson Nano 2GB and install the [JetPack SDK](https://developer.nvidia.com/embedded/jetpack) via the SD card image method.

### Install JetPack SDK

There are two main ways to install JetPack:
1. Write the image to the SD card directly.
2. Use the SDK Manager to do this.

The first method does not require a second machine or extra display equipment and cables; we simply follow the instructions to write the image, which is quite convenient. Click [here](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-2gb-devkit#intro) to get started with the Jetson Nano 2GB, or [here](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit) for the Jetson Nano 4GB.

The second method, however, requires another machine plus a display and cable connected to the Jetson hardware. It is more reliable than the first one, which may occasionally fail to write the image and throw a warning during validation. Click [here](https://docs.nvidia.com/sdk-manager/install-with-sdkm-jetson/index.html) to get started.

For the first method, if flashing keeps failing with `Attention something went wrong...` even after the file has been re-downloaded, try downloading the image with `wget` instead and rename the file extension accordingly.
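
For example, a minimal sketch of that workaround (the URL below is only a placeholder; substitute the actual SD card image link from the getting-started page for your board):
```
# placeholder URL: replace with the real SD card image link for your board
wget -O sd-card-image.zip "https://developer.nvidia.com/<jetson-sd-card-image-link>"
unzip sd-card-image.zip   # the archive contains the .img file to flash
```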

### Launch the system

Sometimes the Jetson device gets stuck while initializing the system; in that case, simply reboot it.

### CUDA

With the first method, CUDA is installed by default while cuDNN is not. We have to add the CUDA binary and library paths to `$PATH` and `$LD_LIBRARY_PATH`:
```
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
```
Then we can run `nvcc -V` to get the version of CUDA in use.
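
To make these settings persist across reboots, a simple option is to append them to `~/.bashrc`:
```
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' >> ~/.bashrc
source ~/.bashrc
nvcc -V   # should now print the CUDA compiler version shipped with JetPack
```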

### Anaconda

We have to install [Archiconda](https://github.com/Archiconda/build-tools/releases) instead, as Anaconda does not provide a build for the Jetson's aarch64 architecture.

After installing Archiconda and creating the virtual env (see the sketch below), if `pip` inside the env does not work properly or throws `Illegal instruction (core dumped)`, consider reinstalling `pip` manually; reinstalling the whole JetPack SDK is the last resort.
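
A minimal sketch of creating the env and sanity-checking `pip` (the env name `mmdeploy` and Python 3.6 are assumptions chosen to match the system Python that JetPack ships):
```
conda create -n mmdeploy python=3.6 -y
conda activate mmdeploy
python -m pip --version                              # if this crashes, reinstall pip inside the env
python -m pip install --upgrade --force-reinstall pip
```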

### Move tensorrt to conda env
After installing Archiconda, we can use it to create a virtual env such as `mmdeploy`. Then we have to copy the TensorRT Python package pre-installed by JetPack into the virtual env.

First, use `find` to locate the TensorRT package:
```
sudo find / -name tensorrt
```
Then copy the TensorRT package into the env's `site-packages` directory, for example:
```
cp -r /usr/lib/python3.6/dist-packages/tensorrt* /home/archiconda3/env/mmdeploy/lib/python3.6/site-packages/
```
Meanwhile, TensorRT libraries such as `libnvinfer.so` can already be found on `LD_LIBRARY_PATH`, which JetPack sets up as well.
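
To confirm the copy worked, a quick check from inside the env:
```
python -c "import tensorrt; print(tensorrt.__version__)"
```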

### Install torch

Install the PyTorch build made **specifically** for Jetson. Click [here](https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048) to get the wheel. Before running `pip install`, we have to install `libopenblas-base` and `libopenmpi-dev`:
```
sudo apt-get install libopenblas-base libopenmpi-dev
```
Otherwise, importing torch in Python will throw the following error:
```
libmpi_cxx.so.20: cannot open shared object file: No such file or directory
```
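
As a sketch, the install then looks like this (the wheel filename is hypothetical; download the wheel matching your JetPack and Python version from the forum thread above):
```
# hypothetical filename: use the wheel listed for your JetPack release
pip install torch-1.10.0-cp36-cp36m-linux_aarch64.whl
```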

### Install torchvision
We can't directly use `pip install torchvision` to install torchvision on the Jetson Nano, but we can clone the repository from GitHub and build it locally. First we have to install some dependencies:
```
sudo apt-get install libjpeg-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
```
Then just clone and compile the project:
```
git clone git@github.com:pytorch/vision.git
cd vision
git checkout tags/v0.7.0 -b vision07
pip install -e .
```
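
A quick sanity check after both installs (the printed versions will depend on the wheel and tag you picked):
```
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
```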

### Install mmcv

Install openssl first:
```
sudo apt-get install libssl-dev
```
Then build and install mmcv from source, e.g. with `MMCV_WITH_OPS=1 pip install -e .` run inside the mmcv repository, as sketched below.
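
A minimal sketch of that from-source build (clone location is up to you; compiling the CUDA ops can take a while on the Nano):
```
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .   # builds the CUDA ops
```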

### Update cmake

We use CMake 3.20 as an example.
```
sudo apt-get install -y libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0.tar.gz
tar -zxvf cmake-3.20.0.tar.gz
cd cmake-3.20.0
./bootstrap
make
sudo make install
```
Then we can check the cmake version through:
```
source ~/.bashrc
cmake --version
```

### Install mmdeploy
Just follow the instructions [here](../build.md). If it throws `failed building wheel for numpy...ERROR: Failed to build one or more wheels` while installing `h5py`, try installing `h5py` manually.
```
sudo apt-get install pkg-config libhdf5-100 libhdf5-dev
pip install versioned-hdf5 --no-cache-dir
```

Then install onnx manually. First, we have to install the protobuf compiler:
```
sudo apt-get install libprotobuf-dev protobuf-compiler
```
Then install onnx through:
```
pip install onnx
```
Then reinstall mmdeploy.
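
As a rough outline of the from-source flow described in build.md (treat this as a sketch; the backend-specific cmake options for TensorRT are covered there):
```
git clone --recursive https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
pip install -e .   # see build.md for building the TensorRT custom ops first
```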