diff --git a/docs/en/backends/tensorrt.md b/docs/en/backends/tensorrt.md index fbe3c754fd..5f218acbf7 100644 --- a/docs/en/backends/tensorrt.md +++ b/docs/en/backends/tensorrt.md @@ -131,3 +131,7 @@ If the calibration dataset is not given, the data will be calibrated with the da TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. However, you may need CUDA-10.2 Patch 1 (Released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don't want to upgrade. Read [this](https://forums.developer.nvidia.com/t/matrixmultiply-failed-on-tensorrt-7-2-1/158187/4) for detail. + +- Install mmdeploy on Jetsons + + We provide a tutorial for getting started on Jetsons [here](../tutorials/how_to_install_mmdeploy_on_jetsons.md). diff --git a/docs/en/benchmark.md b/docs/en/benchmark.md index e749e8a78d..af724abacf 100644 --- a/docs/en/benchmark.md +++ b/docs/en/benchmark.md @@ -32,24 +32,33 @@ Users can directly test the speed through [how_to_measure_performance_of_models. MMCls - TensorRT + TensorRT PPLNN NCNN - + - Model - Dataset - Input + Model + Dataset + Input + T4 + JetsonNano2GB + T4 + SnapDragon888 + Adreno660 + model config file + + fp32 fp16 int8 + fp32 + fp16 fp16 - SnapDragon888-fp32 - Adreno660-fp32 - model config file + fp32 + fp32 latency (ms) @@ -64,6 +73,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models. FPS latency (ms) FPS + latency (ms) + FPS + latency (ms) + FPS ResNet @@ -75,6 +88,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models. 791.89 1.21 829.66 + 59.32 + 16.86 + 30.54 + 32.75 1.30 768.28 33.91 @@ -93,6 +110,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models.
703.42 1.37 727.42 + 88.10 + 11.35 + 49.18 + 20.13 1.36 737.67 133.44 @@ -111,6 +132,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models. 600.73 1.51 662.90 + 74.59 + 13.41 + 48.78 + 20.50 1.91 524.07 107.84 @@ -129,6 +154,10 @@ Users can directly test the speed through [how_to_measure_performance_of_models. 841.36 1.13 883.47 + 15.26 + 65.54 + 10.23 + 97.77 4.69 213.33 9.55 @@ -157,14 +186,18 @@ Users can directly test the speed through [how_to_measure_performance_of_models. - Model - Dataset - Input + Model + Dataset + Input + T4 + T4 + model config file + + fp32 fp16 int8 fp16 - model config file latency (ms) @@ -286,12 +319,16 @@ Users can directly test the speed through [how_to_measure_performance_of_models. - Model - Dataset - Input - SnapDragon888-fp32 - Adreno660-fp32 - model config file + Model + Dataset + Input + SnapDragon888 + Adreno660 + model config file + + + fp32 + fp32 latency (ms) @@ -338,13 +375,17 @@ Users can directly test the speed through [how_to_measure_performance_of_models. - Model - Input + Model + Input + T4 + T4 + model config file + + fp32 fp16 int8 fp16 - model config file latency (ms) @@ -402,16 +443,22 @@ Users can directly test the speed through [how_to_measure_performance_of_models. - Model - Dataset - Input + Model + Dataset + Input + T4 + T4 + SnapDragon888 + Adreno660 + model config file + + fp32 fp16 int8 fp16 - SnapDragon888-fp32 - Adreno660-fp32 - model config file + fp32 + fp32 latency (ms) @@ -481,14 +528,18 @@ Users can directly test the speed through [how_to_measure_performance_of_models. 
- Model - Dataset - Input + Model + Dataset + Input + T4 + T4 + model config file + + fp32 fp16 int8 fp16 - model config file latency (ms) diff --git a/docs/en/index.rst b/docs/en/index.rst index 89b73a7676..da96013832 100644 --- a/docs/en/index.rst +++ b/docs/en/index.rst @@ -23,6 +23,7 @@ You can switch between Chinese and English documents in the lower-left corner of tutorials/how_to_support_new_backends.md tutorials/how_to_add_test_units_for_backend_ops.md tutorials/how_to_test_rewritten_models.md + tutorials/how_to_install_mmdeploy_on_jetsons.md .. toctree:: :maxdepth: 1
diff --git a/docs/en/tutorials/how_to_install_mmdeploy_on_jetsons.md b/docs/en/tutorials/how_to_install_mmdeploy_on_jetsons.md
new file mode 100644
index 0000000000..484d1ce1c2
--- /dev/null
+++ b/docs/en/tutorials/how_to_install_mmdeploy_on_jetsons.md
@@ -0,0 +1,119 @@
+## How to install mmdeploy on Jetsons
+
+This tutorial explains how to install mmdeploy on NVIDIA Jetson systems. It mainly covers three Jetson boards:
+- Jetson Nano
+- Jetson AGX Xavier
+- Jetson TX2
+
+For Jetson Nano, we use the Jetson Nano 2GB and install the [JetPack SDK](https://developer.nvidia.com/embedded/jetpack) with the SD card image method.
+
+### Install JetPack SDK
+
+There are two main ways to install JetPack:
+1. Write the image to the SD card directly.
+2. Use the SDK Manager.
+
+The first method does not need two separate machines with their own displays and cables; we just follow the instructions to write the image, which is quite convenient. Click [here](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-2gb-devkit#intro) to get started with Jetson Nano 2GB, or [here](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit) for Jetson Nano 4GB.
+
+The second method, however, requires a separate host machine with its own display, connected to the Jetson hardware by a cable. It is more reliable, since the first method sometimes fails to write the image and throws a warning during validation. Click [here](https://docs.nvidia.com/sdk-manager/install-with-sdkm-jetson/index.html) to start.
+
+For the first method, if it keeps throwing `Attention something went wrong...` even after the file has been re-downloaded, try downloading the file with `wget` and changing its file extension instead.
+
+### Launch the system
+
+If the Jetson device gets stuck while initializing the system, simply rebooting it is sometimes enough.
+
+### CUDA
+
+If we use the first method, CUDA is installed by default but cuDNN is not. We have to add the CUDA binary and library directories to `$PATH` and `$LD_LIBRARY_PATH`:
+```
+export PATH=$PATH:/usr/local/cuda/bin
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
+```
+Then we can run `nvcc -V` to check the CUDA version.
+
+### Anaconda
+
+We have to install [Archiconda](https://github.com/Archiconda/build-tools/releases) instead, as Anaconda does not provide builds for Jetson.
+
+After installing Archiconda and creating the virtual env, if pip in the env does not work properly or throws `Illegal instruction (core dumped)`, consider reinstalling pip manually; reinstalling the whole JetPack SDK is the last resort.
+
+### Move TensorRT to the conda env
+With Archiconda installed, we can create a virtual env such as `mmdeploy`. Then we have to copy the TensorRT Python package pre-installed by JetPack into the virtual env.
+
+First, use `find` to locate the TensorRT package:
+```
+sudo find / -name tensorrt
+```
+Then copy it into the env's `site-packages`, for example:
+```
+cp -r /usr/lib/python3.6/dist-packages/tensorrt* ~/archiconda3/envs/mmdeploy/lib/python3.6/site-packages/
+```
+Meanwhile, TensorRT libraries such as `libnvinfer.so` can already be found via `LD_LIBRARY_PATH`, which JetPack also sets up.
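The two setup steps above (adding the CUDA directories to `$PATH`/`$LD_LIBRARY_PATH` and copying the TensorRT bindings into the env) can be made repeatable with a small POSIX shell sketch. The helper names `append_path_var` and `copy_trt_to_env` are hypothetical and not part of JetPack or mmdeploy; the source directory is assumed to be whatever `sudo find / -name tensorrt` reported, such as `/usr/lib/python3.6/dist-packages`. Source the file (for example `. ./jetson_env.sh`, a hypothetical file name) rather than executing it, so the exported variables persist in the current shell.

```shell
#!/bin/sh
# Sketch: helpers for the JetPack setup steps (hypothetical names, not an
# official mmdeploy or JetPack script). Source this file so exports persist.

# append_path_var VAR DIR
# Append DIR to the colon-separated variable VAR (e.g. PATH) unless it is
# already listed, so re-sourcing ~/.bashrc does not duplicate entries.
append_path_var() {
    _var="$1"
    _dir="$2"
    # Indirect read of $VAR in a POSIX-compatible way.
    eval "_cur=\${$_var:-}"
    case ":${_cur}:" in
        *":${_dir}:"*) ;;  # already present: nothing to do
        *)
            if [ -n "$_cur" ]; then
                eval "$_var=\"\${_cur}:\${_dir}\""
            else
                eval "$_var=\"\${_dir}\""
            fi
            export "$_var"
            ;;
    esac
}

# copy_trt_to_env SRC DEST
# Copy every tensorrt* entry (package directory and its metadata) from SRC
# (assumed: JetPack's dist-packages) into DEST (the env's site-packages).
copy_trt_to_env() {
    _src="$1"
    _dest="$2"
    _found=0
    for _entry in "$_src"/tensorrt*; do
        # An unmatched glob stays literal, so check the path really exists.
        if [ -e "$_entry" ]; then
            _found=1
        fi
    done
    if [ "$_found" -eq 0 ]; then
        echo "copy_trt_to_env: no tensorrt* entries under $_src" >&2
        return 1
    fi
    mkdir -p "$_dest"
    cp -r "$_src"/tensorrt* "$_dest"/
}
```

For example, `append_path_var PATH /usr/local/cuda/bin` and `append_path_var LD_LIBRARY_PATH /usr/local/cuda/lib64` replace the two `export` lines without duplicating entries when `~/.bashrc` is re-sourced. With the `mmdeploy` env activated, `copy_trt_to_env /usr/lib/python3.6/dist-packages "$(python -c 'import site; print(site.getsitepackages()[0])')"` copies into the active env's `site-packages` instead of a hard-coded path; afterwards `python -c "import tensorrt; print(tensorrt.__version__)"` should succeed inside the env.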
+
+### Install torch
+
+Install the PyTorch build made **specifically** for Jetson. Click [here](https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048) to get the wheel. Before running `pip install`, we have to install `libopenblas-base` and `libopenmpi-dev` first:
+```
+sudo apt-get install libopenblas-base libopenmpi-dev
+```
+Otherwise, importing torch in Python throws the following error:
+```
+libmpi_cxx.so.20: cannot open shared object file: No such file or directory
+```
+
+### Install torchvision
+We can't directly use `pip install torchvision` to install torchvision on Jetson Nano, but we can clone the repository from GitHub and build it locally. First, install some dependencies:
+```
+sudo apt-get install libjpeg-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
+```
+Then clone and build the project, checking out the tag that matches the installed PyTorch version (for example, v0.7.0 pairs with PyTorch 1.6):
+```
+git clone https://github.com/pytorch/vision.git
+cd vision
+git checkout tags/v0.7.0 -b vision07
+pip install -e .
+```
+
+### Install mmcv
+
+Install OpenSSL first:
+```
+sudo apt-get install libssl-dev
+```
+Then build mmcv from source by running `MMCV_WITH_OPS=1 pip install -e .` in the cloned mmcv repository.
+
+### Update cmake
+
+We use cmake 3.20 as an example:
+```
+sudo apt-get install -y libssl-dev
+wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0.tar.gz
+tar -zxvf cmake-3.20.0.tar.gz
+cd cmake-3.20.0
+./bootstrap
+make
+sudo make install
+```
+Then check the cmake version with:
+```
+source ~/.bashrc
+cmake --version
+```
+
+### Install mmdeploy
+Follow the instructions [here](../build.md). If it throws `failed building wheel for numpy...ERROR: Failed to build one or more wheels` when installing `h5py`, try installing `h5py` manually:
+```
+sudo apt-get install pkg-config libhdf5-100 libhdf5-dev
+pip install versioned-hdf5 --no-cache-dir
+```
+
+Then install onnx manually. First, install the protobuf compiler:
+```
+sudo apt-get install libprotobuf-dev protobuf-compiler
+```
+Then install onnx with:
+```
+pip install onnx
+```
+Then reinstall mmdeploy.
diff --git a/docs/zh_cn/benchmark.md b/docs/zh_cn/benchmark.md index 13afec571b..8df2bd93fe 100644 --- a/docs/zh_cn/benchmark.md +++ b/docs/zh_cn/benchmark.md @@ -3,7 +3,7 @@ ### 后端 CPU: ncnn, ONNXRuntime, OpenVINO -GPU: TensorRT, PPLNN +GPU: ncnn, TensorRT, PPLNN ### 延迟基准 @@ -32,24 +32,33 @@ GPU: TensorRT, PPLNN MMCls - TensorRT + TensorRT PPLNN NCNN - + - Model - Dataset - Input + Model + Dataset + Input + T4 + JetsonNano2GB + T4 + SnapDragon888 + Adreno660 + model config file + + fp32 fp16 int8 + fp32 + fp16 fp16 - SnapDragon888-fp32 - Adreno660-fp32 - model config file + fp32 + fp32 latency (ms) @@ -64,6 +73,10 @@ GPU: TensorRT, PPLNN FPS latency (ms) FPS + latency (ms) + FPS + latency (ms) + FPS ResNet @@ -75,6 +88,10 @@ GPU: TensorRT, PPLNN 791.89 1.21 829.66 + 59.32 + 16.86 + 30.54 + 32.75 1.30 768.28 33.91 @@ -93,6 +110,10 @@ GPU: TensorRT, PPLNN 703.42 1.37 727.42 + 88.10 + 11.35 + 49.18 + 20.13 1.36 737.67 133.44 @@ -111,6 +132,10 @@ GPU: TensorRT, PPLNN 600.73 1.51 662.90 + 74.59 + 13.41 + 48.78 + 20.50 1.91 524.07 107.84 @@ -129,6 +154,10 @@ GPU: TensorRT, PPLNN 841.36 1.13 883.47 + 15.26 + 65.54 + 10.23 + 97.77 4.69 213.33 9.55 @@ -157,14 +186,18 @@ GPU: TensorRT, PPLNN - Model - Dataset - Input + Model + Dataset + Input + T4 + T4 + model config file + + fp32 fp16 int8 fp16 - model config file latency (ms) @@ -286,12 +319,16 @@ GPU: TensorRT, PPLNN - Model - Dataset - Input - SnapDragon888-fp32 - Adreno660-fp32 - model config file + Model + Dataset + Input + SnapDragon888 + Adreno660 + model config file + + + fp32 + fp32 latency (ms) @@ -338,13 +375,17 @@ GPU: TensorRT, PPLNN - Model - Input + Model + Input + T4 + T4 + model config file + + fp32 fp16 int8 fp16 - model config file latency (ms) @@ -402,16 +443,22 @@ GPU: TensorRT, PPLNN - Model - Dataset - Input +
Model + Dataset + Input + T4 + T4 + SnapDragon888 + Adreno660 + model config file + + fp32 fp16 int8 fp16 - SnapDragon888-fp32 - Adreno660-fp32 - model config file + fp32 + fp32 latency (ms) @@ -481,14 +528,18 @@ GPU: TensorRT, PPLNN - Model - Dataset - Input + Model + Dataset + Input + T4 + T4 + model config file + + fp32 fp16 int8 fp16 - model config file latency (ms)