The ONNX Model Zoo is a collection of pre-trained, state-of-the-art deep learning models in the ONNX format. Each model is accompanied by Jupyter notebooks for training the model and running inference with the trained model. The notebooks are written in Python and include links to the training dataset as well as references to the original paper that describes the model architecture. The notebooks can also be exported and run as Python (.py) files.
The Open Neural Network eXchange (ONNX) is an open format to represent deep learning models. With ONNX, developers can move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners.
Read the Usage section below for more details on the file formats in the ONNX Model Zoo (.onnx, .pb, .npz) and starter Python code for validating your ONNX model using test data.
- Image Classification
- Object Detection & Image Segmentation
- Body, Face & Gesture Analysis
- Image Manipulation
- Speech & Audio Processing
- Machine Translation
- Language Modelling
- Visual Question Answering & Dialog
- Other interesting models
This collection of models takes images as input, then classifies the major objects in the images into a set of predefined classes.
Model Class | Reference | Description |
---|---|---|
MobileNet | Sandler et al. | Computationally efficient CNN model for mobile and embedded vision applications. Top-5 error from paper - ~10% |
ResNet | He et al., He et al. | Very deep state-of-the-art CNN model (up to 152 layers), won the ImageNet Challenge in 2015. Top-5 error from paper - ~3.6% |
SqueezeNet | Iandola et al. | A lightweight CNN providing AlexNet-level accuracy with 50x fewer parameters. Top-5 error from paper - ~20% |
VGG | Simonyan et al. | Deep CNN model (up to 19 layers) which won the ImageNet Challenge in 2014. Top-5 error from paper - ~8% |
Bvlc_AlexNet | Krizhevsky et al. | Deep CNN model for Image Classification (up to 8 layers), won the ImageNet Challenge in 2012. Top-5 error from paper - ~15% |
Bvlc_GoogleNet | Szegedy et al. | Deep CNN model (up to 22 layers), implemented in Caffe, which won the ImageNet Challenge in 2014. Top-5 error from paper - ~6.7% |
Bvlc_reference_CaffeNet | Krizhevsky et al. | Deep CNN variation of AlexNet for Image Classification in Caffe where the max pooling precedes the local response normalization (LRN) so that the LRN takes less compute and memory. |
Bvlc_reference_RCNN_ILSVRC13 | Girshick et al. | Pure Caffe implementation of R-CNN for image classification as presented at CVPR in 2014. |
DenseNet121 | Huang et al. | Deep CNN model for Image Classification, connecting every layer to every other layer. |
Inception_v1 | Szegedy et al. | Deep CNN model (up to 22 layers) for Image Classification - same as GoogLeNet, implemented through Caffe2. Top-5 error from paper - ~6.7% |
Inception_v2 | Szegedy et al. | Deep CNN model for Image Classification as an adaptation to Inception v1 with batch normalization. Top-5 error from paper - ~4.82% |
ShuffleNet | Zhang et al. | Computationally efficient deep CNN model for Image Classification, providing a ~13x speedup over AlexNet on ARM-based mobile devices. Top-1 error from paper - ~7.8% |
ZFNet512 | Zeiler et al. | Deep CNN model (up to 8 layers) for Image Classification that tuned the hyperparameters of AlexNet and won the ImageNet Challenge in 2013. Top-5 error from paper - ~14.3% |
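
The Top-5 error figures quoted in the table count a prediction as wrong only when the true class is not among the five highest-scoring classes. A minimal sketch of that metric in plain Python follows; the function names `top_k` and `top_k_error` and the toy scores are illustrative assumptions, not part of the Model Zoo.

```python
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int = 5) -> List[int]:
    """Return the indices of the k highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_error(batch_scores: Sequence[Sequence[float]],
                labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is not among the top-k predictions."""
    if len(batch_scores) != len(labels):
        raise ValueError("need exactly one label per score vector")
    misses = sum(label not in top_k(scores, k)
                 for scores, label in zip(batch_scores, labels))
    return misses / len(labels)


# Toy example with 6 classes and 2 samples (made-up scores, not model output).
scores = [
    [0.1, 0.5, 0.05, 0.2, 0.1, 0.05],    # true class 1 has the highest score
    [0.3, 0.25, 0.2, 0.15, 0.08, 0.02],  # true class 4 is only fifth-best
]
labels = [1, 4]
print("top-1 error:", top_k_error(scores, labels, k=1))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

In the toy data the second sample is a miss under top-1 but a hit under top-5, which is why top-5 error is never higher than top-1 error for the same model.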
This subset of models classifies images for specific domains and datasets.
Model Class | Reference | Description |
---|---|---|
MNIST - Handwritten Digit Recognition | Convolutional Neural Network with MNIST | Deep CNN model for handwritten digit identification |
Object detection models detect the presence of multiple objects in an image and segment out areas of the image where the objects are detected. Semantic segmentation models partition an input image by labeling each pixel into a set of pre-defined categories.
Model Class | Reference | Description |
---|---|---|
Tiny_YOLOv2 | Redmon et al. | Deep CNN model for Object Detection |
SSD | Liu et al. | Deep CNN model for Object Detection |
Faster-RCNN | Ren et al. | contribute |
Mask-RCNN | He et al. | contribute |
YOLO v2 | Redmon et al. | contribute |
YOLO v3 | Redmon et al. | Deep CNN model for Real-Time Object Detection (mAP = 55.3% in COCO) |
DUC | Wang et al. | Deep CNN based semantic segmentation model with >80% mIOU (mean Intersection Over Union), trained on urban street images |
FCN | Long et al. | contribute |
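
The mIOU figure quoted for DUC averages, over classes, the ratio between the pixels where prediction and ground truth agree on a class and the pixels where either of them assigns that class. A rough pure-Python sketch on flattened label maps follows; `mean_iou` and the toy labels are illustrative assumptions, and real evaluations (including the one behind the quoted figure) may treat ignored labels or absent classes differently.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], truth: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union across classes present in pred or truth."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same size")
    ious = []
    for c in range(num_classes):
        intersection = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:  # skip classes that appear in neither map
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps, flattened row by row (made-up values).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print("mIoU:", mean_iou(pred, truth, num_classes=3))
```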
Face detection models identify and/or recognize human faces in images. Some of the more popular models detect celebrity faces or estimate gender, age, and emotion.
Model Class | Reference | Description |
---|---|---|
ArcFace | Deng et al. | ArcFace is a CNN based model for face recognition which learns discriminative features of faces and produces embeddings for input face images. |
CNN Cascade | Li et al. | contribute |
Emotion FerPlus | Barsoum et al. | Deep CNN for emotion recognition trained on images of faces. |
Age and Gender Classification using Convolutional Neural Networks | Levi et al. | contribute |
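
Embeddings such as those produced by ArcFace are commonly compared by cosine similarity, with a threshold deciding whether two faces belong to the same person. A minimal sketch, assuming made-up 4-dimensional vectors and an arbitrary threshold of 0.5 (real embeddings are much longer, and a usable threshold has to be tuned on validation data); the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_identity(a: Sequence[float], b: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Illustrative decision rule; the threshold value is an assumption."""
    return cosine_similarity(a, b) >= threshold


# Made-up embeddings: face_b points almost the same way as face_a, face_c does not.
face_a = [1.0, 0.0, 0.0, 0.0]
face_b = [0.9, 0.1, 0.0, 0.0]
face_c = [0.0, 1.0, 0.0, 0.0]
print(same_identity(face_a, face_b), same_identity(face_a, face_c))
```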
Image manipulation models use neural networks to transform input images to modified output images. Some popular models in this category involve style transfer or enhancing images by increasing resolution.
Model Class | Reference | Description |
---|---|---|
Unpaired Image to Image Translation using Cycle consistent Adversarial Network | Zhu et al. | contribute |
Image Super resolution using deep convolutional networks | Dong et al. | contribute |
This class of models uses audio data to train models that can identify voice, generate music, or even read text out loud.
Model Class | Reference | Description |
---|---|---|
Speech recognition with deep recurrent neural networks | Graves et al. | contribute |
Deep voice: Real time neural text to speech | Arik et al. | contribute |
Sound Generative models | WaveNet: A Generative Model for Raw Audio | contribute |
This class of natural language processing models learns how to translate input text to another language.
Model Class | Reference | Description |
---|---|---|
Neural Machine Translation by jointly learning to align and translate | Bahdanau et al. | contribute |
Google's Neural Machine Translation System | Wu et al. | contribute |
This subset of natural language processing models learns representations of language from large corpora of text.
Model Class | Reference | Description |
---|---|---|
Deep Neural Network Language Models | Arisoy et al. | contribute |
This subset of natural language processing models uses input images to answer questions about those images.
Model Class | Reference | Description |
---|---|---|
VQA: Visual Question Answering | Agrawal et al. | contribute |
Yin and Yang: Balancing and Answering Binary Visual Questions | Zhang et al. | contribute |
Making the V in VQA Matter | Goyal et al. | contribute |
Visual Dialog | Das et al. | contribute |
Many interesting deep learning models do not fit into the categories described above. The ONNX team strongly encourages users and researchers to contribute their models to the growing model zoo.
Model Class | Reference | Description |
---|---|---|
Text to Image | Generative Adversarial Text to image Synthesis | contribute |
Time Series Forecasting | Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks | contribute |
Recommender systems | DropoutNet: Addressing Cold Start in Recommender Systems | contribute |
Collaborative filtering | Neural Collaborative Filtering | contribute |
Autoencoders | A Hierarchical Neural Autoencoder for Paragraphs and Documents | contribute |
Every ONNX backend should support running the models out of the box. After downloading and extracting the tarball of each model, you will find:
- A protobuf file `model.onnx` that represents the serialized ONNX model.
- Test data (in the form of serialized protobuf TensorProto files or serialized NumPy archives).
The test data files can be used to validate ONNX models from the Model Zoo. We have provided the following interface examples for you to get started. Please replace `onnx_backend` in your code with the appropriate framework of your choice that provides ONNX inferencing support, and likewise replace `backend.run_model` with the framework's model evaluation logic.
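As one possible substitution for the `onnx_backend` placeholder, the sketch below uses ONNX Runtime's Python API (`InferenceSession`, `get_inputs`, `run`). The helper names `build_feed` and `run_with_onnxruntime` are hypothetical, the choice of ONNX Runtime and of the CPU execution provider is an assumption, and the package is imported lazily so the helpers can be defined even where it is not installed.

```python
from typing import Any, Dict, List, Sequence


def build_feed(input_names: Sequence[str], inputs: Sequence[Any]) -> Dict[str, Any]:
    """Pair positional test inputs with the model's input names, in order."""
    if len(input_names) != len(inputs):
        raise ValueError(
            "model expects {} inputs, got {}".format(len(input_names), len(inputs)))
    return dict(zip(input_names, inputs))


def run_with_onnxruntime(model_path: str, inputs: Sequence[Any]) -> List[Any]:
    """Run the model stored at model_path on NumPy inputs and return all outputs."""
    import onnxruntime as ort  # assumption: ONNX Runtime is the chosen framework

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_names = [node.name for node in session.get_inputs()]
    # Passing None as the first argument requests every model output.
    return session.run(None, build_feed(input_names, inputs))
```

Under these assumptions, a call such as `run_with_onnxruntime('model.onnx', inputs)` would stand in for `backend.run_model(model, inputs)`. Note that it takes the model path, not the loaded `onnx` model object.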
There are two different formats for the test data files:
- Serialized protobuf TensorProtos (.pb), stored in folders with the naming convention `test_data_set_*`.
```python
import numpy as np
import onnx
import os
import glob
import onnx_backend as backend
from onnx import numpy_helper

model = onnx.load('model.onnx')
test_data_dir = 'test_data_set_0'

# Load inputs
inputs = []
inputs_num = len(glob.glob(os.path.join(test_data_dir, 'input_*.pb')))
for i in range(inputs_num):
    input_file = os.path.join(test_data_dir, 'input_{}.pb'.format(i))
    tensor = onnx.TensorProto()
    with open(input_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    inputs.append(numpy_helper.to_array(tensor))

# Load reference outputs
ref_outputs = []
ref_outputs_num = len(glob.glob(os.path.join(test_data_dir, 'output_*.pb')))
for i in range(ref_outputs_num):
    output_file = os.path.join(test_data_dir, 'output_{}.pb'.format(i))
    tensor = onnx.TensorProto()
    with open(output_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    ref_outputs.append(numpy_helper.to_array(tensor))

# Run the model on the backend
outputs = list(backend.run_model(model, inputs))

# Compare the results with reference outputs.
for ref_o, o in zip(ref_outputs, outputs):
    np.testing.assert_almost_equal(ref_o, o)
```
- Serialized NumPy archives, stored in files with the naming convention `test_data_*.npz`. Each file contains one set of test inputs and outputs.
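```python
import numpy as np
import onnx
import onnx_backend as backend  # placeholder: substitute your framework's ONNX backend

# Example paths (adjust to the files you extracted)
model_pb_path = 'model.onnx'
npz_path = 'test_data_0.npz'

# Load the model and sample inputs and outputs
model = onnx.load(model_pb_path)
# Newer NumPy releases may also need allow_pickle=True if the archive stores object arrays
sample = np.load(npz_path, encoding='bytes')
inputs = list(sample['inputs'])
outputs = list(sample['outputs'])

# Run the model with an ONNX backend and verify the results
np.testing.assert_almost_equal(outputs, backend.run_model(model, inputs))
```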
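
An extracted model folder may hold several test data sets in either format. The sketch below, using only the standard library, collects every `test_data_set_*` folder with its `input_*.pb` and `output_*.pb` files sorted by numeric index, plus any `test_data_*.npz` archives, so that the validation code above can be looped over all of them. The names `find_test_data` and `_index` are hypothetical, and the flat directory layout is an assumption based on the naming conventions described above.

```python
import glob
import os
import re
from typing import Dict


def _index(path: str) -> int:
    """Extract the trailing integer from names such as input_10.pb."""
    match = re.search(r"_(\d+)\.\w+$", os.path.basename(path))
    return int(match.group(1)) if match else -1


def find_test_data(model_dir: str) -> Dict[str, object]:
    """Collect both kinds of test data found directly under model_dir."""
    pb_sets = {}
    for set_dir in sorted(glob.glob(os.path.join(model_dir, "test_data_set_*"))):
        if not os.path.isdir(set_dir):
            continue
        pb_sets[os.path.basename(set_dir)] = {
            # Numeric sort keeps input_2.pb ahead of input_10.pb.
            "inputs": sorted(glob.glob(os.path.join(set_dir, "input_*.pb")), key=_index),
            "outputs": sorted(glob.glob(os.path.join(set_dir, "output_*.pb")), key=_index),
        }
    npz_files = sorted(glob.glob(os.path.join(model_dir, "test_data_*.npz")))
    return {"pb_sets": pb_sets, "npz_files": npz_files}
```

Sorting by the numeric suffix matters because the position of each file in the list has to match the order of the model's inputs and outputs.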
You can see visualizations of each model's network architecture by using Netron.
Do you want to contribute a model? To get started, pick any model presented above with the contribute link under the Description column. The links point to a page containing guidelines for making a contribution.