
# TensorFlow Frozen Model Zoo

NNFusion supports compiling TensorFlow models by taking the frozen format (i.e., a protobuf file) as input. For more information about how to freeze a TensorFlow model into this format, please refer to Freeze TensorFlow models.
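For a quick illustration, here is a minimal TF 1.x-style sketch of the freezing step (the toy graph and node names are illustrative, not tied to NNFusion):

```python
import tensorflow as tf  # TF 1.x (e.g., 1.14, as used for the models below)

# Build a toy graph: y = x @ W + b
x = tf.placeholder(tf.float32, [1, 4], name="input")
W = tf.Variable(tf.ones([4, 2]), name="W")
b = tf.Variable(tf.zeros([2]), name="b")
y = tf.identity(tf.matmul(x, W) + b, name="output")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Replace variables with constants so the graph is self-contained
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["output"])

# Serialize the frozen GraphDef to a protobuf file
with tf.gfile.GFile("frozen_toy_model.pb", "wb") as f:
    f.write(frozen_graph_def.SerializeToString())
```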

This page lists some commonly used frozen models that are well tested with NNFusion. These models contain typical DNN architectures such as CNNs, RNNs, and Transformers, and cover the most common DNN domains, including image, NLP, and speech.

| model | type | format | TF version | download link |
| --- | --- | --- | --- | --- |
| AlexNet | inference | frozen | 1.14 | frozen_alexnet_infer_batch_1.const_folded.pb |
| VGG11 | inference | frozen | 1.14 | frozen_vgg11_infer_batch_1.const_folded.pb |
| ResNet50 | inference | frozen | 1.14 | frozen_resnet50_infer_batch_1.const_folded.pb |
| Inception_v3 | inference | frozen | 1.14 | frozen_inception3_infer_batch_1.const_folded.pb |
| LSTM-L10-L100-H256 | inference | frozen | 1.14 | frozen_lstm_infer_batch_1.const_folded.pb |
| LSTM-L8-S8-H256 | inference | frozen | 1.14 | frozen_lstm_l8s8h256_bs1.pb |
| BERT_large | inference | frozen | 1.14 | frozen_bert_large.const_folded.pb |
| BERT_large_L2 | inference | frozen | 1.14 | frozen_bert_large_layer_2.const_folded.pb |

## Usage Example: Compile LSTM model on CUDA

Prerequisite: We assume you have already built and installed the NNFusion compiler following the Build Guide.

Taking the LSTM-L8-S8-H256 model as an example, first download the model:

```bash
wget https://nnfusion.blob.core.windows.net/models/tensorflow/frozen_lstm_l8s8h256_bs1.pb
```
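
Optionally, you can sanity-check the downloaded protobuf before compiling it by listing a few of its graph nodes. This is a hedged sketch (not part of the NNFusion toolchain); the file name matches the download above:

```python
import tensorflow as tf  # TF 1.x

# Parse the frozen protobuf into a GraphDef
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_lstm_l8s8h256_bs1.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Print the first few node names/ops to confirm the graph loaded correctly
for node in graph_def.node[:10]:
    print(node.name, node.op)
```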

Then compile the model with NNFusion (assuming you have a CUDA environment):

```bash
NNFUSION_INSTALL_PATH/nnfusion tensorflow/frozen_lstm_l8s8h256_bs1.pb --format tensorflow -fdefault_device CUDA
```

If everything goes smoothly, you will see the full generated project code for the LSTM model under nnfusion_rt/cuda_codegen/. You can then build the generated project and test its performance with:

```bash
cd nnfusion_rt/cuda_codegen/
cmake . && make
./main_test
```

The test iteratively runs the model 100 times and calculates the average latency. The example logs will look like:

```
Result_2261_0: 
8.921492e-03 1.182089e-02 8.937407e-03 7.932202e-03 1.574193e-02 3.844392e-03 -1.505094e-02 -1.112035e-02 5.026605e-03 -8.032203e-03  .. (size = 256, ends with 1.357487e-02);
Iteration time 2.990464 ms
...
Iteration time 2.700096 ms
Iteration time 2.702432 ms
Summary: [min, max, mean] = [2.690368, 6.759712, 2.918306] ms
```
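
The Summary line is just the min, max, and mean over the per-iteration times. A standalone sketch of that arithmetic (the latency values here are hypothetical, not from NNFusion's code):

```python
# Hypothetical per-iteration latencies in ms, as printed by ./main_test
times_ms = [2.990464, 2.700096, 2.702432]

summary = (min(times_ms), max(times_ms), sum(times_ms) / len(times_ms))
print("Summary: [min, max, mean] = [%.6f, %.6f, %.6f] ms" % summary)
```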

As the logs show, the average latency on a P100 GPU is about 2.918 ms. Note that this uses only NNFusion's basic optimizations; to further reduce this model's latency to under 1 ms, please follow the tutorial for our recent BlockFusion technique in the Rammer OSDI Tutorial.