基于TensorFlow2.0
TensorFlow->TensorRT Image Classification
This contains examples, scripts and code related to image classification using TensorFlow models (from here) converted to TensorRT. Converting TensorFlow models to TensorRT offers significant performance gains on the Jetson TX2 as seen below.
The table below shows various details related to pretrained models ported from the TensorFlow slim model zoo.
Model | Input Size | TensorRT (TX2 / Half) | TensorRT (TX2 / Float) | TensorFlow (TX2 / Float) | Input Name | Output Name | Preprocessing Fn. |
---|---|---|---|---|---|---|---|
inception_v1 | 224x224 | 7.98ms | 12.8ms | 27.6ms | input | InceptionV1/Logits/SpatialSqueeze | inception |
inception_v3 | 299x299 | 26.3ms | 46.1ms | 98.4ms | input | InceptionV3/Logits/SpatialSqueeze | inception |
inception_v4 | 299x299 | 52.1ms | 88.2ms | 176ms | input | InceptionV4/Logits/Logits/BiasAdd | inception |
inception_resnet_v2 | 299x299 | 53.0ms | 98.7ms | 168ms | input | InceptionResnetV2/Logits/Logits/BiasAdd | inception |
resnet_v1_50 | 224x224 | 15.7ms | 27.1ms | 63.9ms | input | resnet_v1_50/SpatialSqueeze | vgg |
resnet_v1_101 | 224x224 | 29.9ms | 51.8ms | 107ms | input | resnet_v1_101/SpatialSqueeze | vgg |
resnet_v1_152 | 224x224 | 42.6ms | 78.2ms | 157ms | input | resnet_v1_152/SpatialSqueeze | vgg |
resnet_v2_50 | 299x299 | 27.5ms | 44.4ms | 92.2ms | input | resnet_v2_50/SpatialSqueeze | inception |
resnet_v2_101 | 299x299 | 49.2ms | 83.1ms | 160ms | input | resnet_v2_101/SpatialSqueeze | inception |
resnet_v2_152 | 299x299 | 74.6ms | 124ms | 230ms | input | resnet_v2_152/SpatialSqueeze | inception |
mobilenet_v1_0p25_128 | 128x128 | 2.67ms | 2.65ms | 15.7ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
mobilenet_v1_0p5_160 | 160x160 | 3.95ms | 4.00ms | 16.9ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
mobilenet_v1_1p0_224 | 224x224 | 12.9ms | 12.9ms | 24.4ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
vgg_16 | 224x224 | 38.2ms | 79.2ms | 171ms | input | vgg_16/fc8/BiasAdd | vgg |
The times recorded include data transfer to GPU, network execution, and data transfer back from GPU. Time does not include preprocessing. See scripts/test_tf.py, scripts/test_trt.py, and src/test/test_trt.cu for implementation details.
-
Flash the Jetson TX2 using JetPack 3.2. Be sure to install
- CUDA 9.0
- OpenCV4Tegra
- cuDNN
- TensorRT 3.0
-
Install pip on Jetson TX2.
sudo apt-get install python-pip
-
Install TensorFlow on Jetson TX2.
-
Download the TensorFlow 1.5.0 pip wheel from here. This build of TensorFlow is provided as a convenience for the purposes of this project.
-
Install TensorFlow using pip
sudo pip install tensorflow-1.5.0rc0-cp27-cp27mu-linux_aarch64.whl
-
-
Install uff exporter on Jetson TX2.
-
Download TensorRT 3.0.4 for Ubuntu 16.04 and CUDA 9.0 tar package from https://developer.nvidia.com/nvidia-tensorrt-download.
-
Extract archive
tar -xzf TensorRT-3.0.4.Ubuntu-16.04.3.x86_64.cuda-9.0.cudnn7.0.tar.gz
-
Install uff python package using pip
sudo pip install TensorRT-3.0.4/uff/uff-0.2.0-py2.py3-none-any.whl
-
-
Clone and build this project
git clone --recursive https://github.com/NVIDIA-Jetson/tf_to_trt_image_classification.git cd tf_to_trt_image_classification
修改
CMakeLists.txt
文件,添加以下内容:# export TENSORRT_PATH=/home/andy/TensorRT include_directories(${TENSORRT_PATH}/include) link_directories(${TENSORRT_PATH}/lib) # include_directories(/home/andy/TensorRT/include) # link_directories(/home/andy/TensorRT/lib)
然后编译运行:
mkdir build cd build cmake .. make cd ..
Run the following bash script to download all of the pretrained models.
source scripts/download_models.sh
If there are any models you don't want to use, simply remove the URL from the model list in scripts/download_models.sh.
Next, because the TensorFlow models are provided in checkpoint format, we must convert them to frozen graphs for optimization with TensorRT. Run the scripts/models_to_frozen_graphs.py script.
python scripts/models_to_frozen_graphs.py
If you removed any models in the previous step, you must add 'exclude': true
to the corresponding item in the NETS dictionary located in scripts/model_meta.py. If you are following the instructions for executing engines below, you may also need some sample images. Run the following script to download a few images from ImageNet.
source scripts/download_images.sh
Run the scripts/convert_plan.py script from the root directory of the project, referencing the models table for relevant parameters. For example, to convert the Inception V1 model run the following
python scripts/convert_plan.py data/frozen_graphs/inception_v1.pb data/plans/inception_v1.plan input 224 224 InceptionV1/Logits/SpatialSqueeze 1 0 float
The inputs to the convert_plan.py script are
- frozen graph path
- output plan path
- input node name
- input height
- input width
- output node name
- max batch size
- max workspace size
- data type (float or half)
This script assumes single output single input image models, and may not work out of the box for models other than those in the table above.
Call the examples/classify_image program from the root directory of the project, referencing the models table for relevant parameters. For example, to run the Inception V1 model converted as above
./build/examples/classify_image/classify_image data/images/gordon_setter.jpg data/plans/inception_v1.plan data/imagenet_labels_1001.txt input InceptionV1/Logits/SpatialSqueeze inception
For reference, the inputs to the example program are
- input image path
- plan file path
- labels file (one label per line, line number corresponds to index in output)
- input node name
- output node name
- preprocessing function (either vgg or inception)
We provide two image label files in the data folder. Some of the TensorFlow models were trained with an additional "background" class, causing the model to have 1001 outputs instead of 1000. To determine the number of outputs for each model, reference the NETS variable in scripts/model_meta.py.
To benchmark all of the models, first convert all of the models that you downloaded above into TensorRT engines. Run the following script to convert all models
python scripts/frozen_graphs_to_plans.py
If you want to change parameters related to TensorRT optimization, just edit the scripts/frozen_graphs_to_plans.py file. Next, to benchmark all of the models run the scripts/test_trt.py script
python scripts/test_trt.py
Once finished, the timing results will be stored at data/test_output_trt.txt. If you want to also benchmark the TensorFlow models, simply run.
python scripts/test_tf.py
The results will be stored at data/test_output_tf.txt. This benchmarking script loads an example image as input, make sure you have downloaded the sample images as above.
错误提示:
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/__init__.py", line 27, in <module>
from tensorflow._api.v2 import audio
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/_api/v2/audio/__init__.py", line 8, in <module>
from tensorflow.python.ops.gen_audio_ops import decode_wav
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/andy/anaconda3/envs/tf2/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help
解决方法:
conda install cudatoolkit
conda install cudnn