OpenVINO™ Model Server is a serving system for machine learning models. It makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. This guide will help you deploy OpenVINO™ Model Server using Docker containers.
- Required:
- 6th to 10th generation Intel® Core™ processors and Intel® Xeon® processors.
- Optional:
- Intel® Neural Compute Stick 2.
- Intel® Iris® Pro & Intel® HD Graphics.
- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
This guide provides step-by-step instructions for deploying OpenVINO™ Model Server on Linux using a Docker container, including a Quick Start guide. Links are provided for compatible hardware. The following topics are covered in this guide:
- Quick Start Guide for OpenVINO™ Model Server
- Building the OpenVINO™ Model Server Image
- Starting Docker Container with a Single Model
- Starting Docker container with a configuration file for multiple models
- Configuration Parameters
- Cloud Storage Requirements
- Running OpenVINO™ Model Server with AI Accelerators NCS, HDDL and GPU
- Security Considerations
A quick start guide to download models and run OpenVINO™ Model Server is provided below. It allows you to set up OpenVINO™ Model Server and run a Face Detection example.
Refer to the Quick Start guide to set up OpenVINO™ Model Server.
Install Docker by following the official Docker installation guide.
After installing Docker, you can pull the OpenVINO™ Model Server image. Open a terminal and run the following command:
docker pull openvino/model_server:latest
Building a Docker image
To build your own image, use the following command in the git repository root folder:
make docker_build
It will generate the images, tagged as:
- openvino/model_server:latest - with CPU, NCS and HDDL support,
- openvino/model_server-gpu:latest - with CPU, NCS, HDDL and iGPU support,
- openvino/model_server:latest-nginx-mtls - with CPU, NCS and HDDL support and a reference nginx setup of mTLS integration, as well as a release package (.tar.gz, with ovms binary and necessary libraries), in a ./dist directory.
Note: Latest images include OpenVINO 2021.2 release.
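To verify that the build completed and the expected tags are present, you can list the images locally. This is only a quick sanity check, assuming the make docker_build step above finished successfully:
# list locally available OpenVINO Model Server images and their tags
docker images | grep openvino/model_server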
Follow the Preparation of Model guide before running the Docker image.
Run the OpenVINO™ Model Server by running the following command:
docker run -d --rm -v <models_repository>:/models -p 9000:9000 -p 9001:9001 openvino/model_server:latest \
--model_path <path_to_model> --model_name <model_name> --port 9000 --rest_port 9001 --log_level DEBUG
- --rm - Remove the container when exiting the Docker container.
- -d - Run the container in the background.
- -v - Defines how to mount the models folder in the Docker container.
- -p - Exposes the model serving port outside the Docker container.
- openvino/model_server:latest - Represents the image name. This varies by tag and build process. The ovms binary is the Docker entry point. See the full list of ovms tags.
- --model_path - Model location. This can be a Docker container path mounted during start-up, or a cloud storage path: gs://<bucket>/<model_path> for Google Cloud Storage, s3://<bucket>/<model_path> for AWS S3, or az://<container>/<model_path> for Azure blob. See the cloud storage requirements below.
- --model_name - The name of the model in the model_path.
- --port - gRPC server port.
- --rest_port - REST server port.
Notes
- Publish the container's port to your host's open ports.
- In the above command, port 9000 is exposed for gRPC and port 9001 is exposed for REST API calls.
- To prepare and save models for serving with OpenVINO™ Model Server, refer to the models repository documentation.
- Use the model_name in client gRPC/REST API calls (see the example request below).
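As a quick check that the server is running and the model is loaded, you can query the REST API exposed on the rest_port. Below is a minimal sketch assuming the command above (REST on port 9001) and a model served under <model_name>:
# check the status of the served model versions
curl http://localhost:9001/v1/models/<model_name>
# check the model metadata (inputs, outputs, shapes)
curl http://localhost:9001/v1/models/<model_name>/metadata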
To use a container that has several models, you must use a model server configuration file that defines each model. The configuration file is in JSON format. In the configuration file, provide an array, model_config_list, that includes a collection of config objects for each served model. For each config object include, at a minimum, values for the model name and the base_path attributes.
Example configuration file:
{
   "model_config_list": [
      {
         "config": {
            "name": "model_name1",
            "base_path": "/opt/ml/models/model1",
            "batch_size": "16"
         }
      },
      {
         "config": {
            "name": "model_name2",
            "base_path": "/opt/ml/models/model2",
            "batch_size": "auto",
            "model_version_policy": {"all": {}}
         }
      },
      {
         "config": {
            "name": "model_name3",
            "base_path": "gs://bucket/models/model3",
            "model_version_policy": {"specific": { "versions": [1, 3] }},
            "shape": "auto"
         }
      },
      {
         "config": {
            "name": "model_name4",
            "base_path": "s3://bucket/models/model4",
            "shape": {
               "input1": "(1,3,200,200)",
               "input2": "(1,3,50,50)"
            },
            "plugin_config": {"CPU_THROUGHPUT_STREAMS": "CPU_THROUGHPUT_AUTO"}
         }
      },
      {
         "config": {
            "name": "model_name5",
            "base_path": "s3://bucket/models/model5",
            "shape": "auto",
            "nireq": 32,
            "target_device": "HDDL"
         }
      }
   ]
}
When the config file is present, the Docker container can be started in a similar manner as for a single model. Keep in mind that models with a cloud storage path require specific environment variables to be set. Refer to the cloud storage requirements below.
docker run --rm -d -v /models/:/opt/ml:ro -p 9001:9001 -p 8001:8001 -v <config.json>:/opt/ml/config.json openvino/model_server:latest \
--config_path /opt/ml/config.json --port 9001 --rest_port 8001
Note: Use the following model repository structure for multiple models:
models/
├── model1
│ ├── 1
│ │ ├── ir_model.bin
│ │ └── ir_model.xml
│ └── 2
│ ├── ir_model.bin
│ └── ir_model.xml
└── model2
└── 1
├── ir_model.bin
├── ir_model.xml
└── mapping_config.json
Here the numerical values depict the version number of the model.
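For example, a repository with this layout can be assembled with a few shell commands; the source paths below are placeholders for your own Model Optimizer output:
# create version folders for two models
mkdir -p models/model1/1 models/model1/2 models/model2/1
# copy the IR files (xml and bin) produced by the Model Optimizer into a version folder
cp /path/to/ir_model.xml /path/to/ir_model.bin models/model1/1/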
Model configuration options
Option | Value format | Description | Required |
---|---|---|---|
"model_name"/"name" | string | Model name exposed over gRPC and REST API (use model_name in the command line, name in the json config). | ✓ |
"model_path"/"base_path" | "/opt/ml/models/model", "gs://bucket/models/model", "s3://bucket/models/model", "azure://bucket/models/model" | If using a Google Cloud Storage, Azure Storage or S3 path, see the cloud storage requirements below (use model_path in the command line, base_path in the json config). | ✓ |
"shape" | tuple, json or "auto" | Optional; takes precedence over batch_size. The shape argument changes the model enabled in the model server to fit the given parameters. shape accepts three forms of values: auto - the model server reloads the model with the shape that matches the input data matrix; a tuple, such as (1,3,224,224) - defines the shape to use for all incoming requests for models with a single input; a dictionary of tuples, such as {"input1":"(1,3,224,224)","input2":"(1,3,50,50)"} - defines the shape of every included input in the model. Some models don't support the reshape operation. If the model can't be reshaped, it keeps its original parameters and all requests with an incompatible input format result in an error. See the logs for more information about specific errors. Learn more about supported model graph layers, including all limitations, in the Shape Inference document. | |
"batch_size" | integer / "auto" | Optional. By default, the batch size is derived from the model, as defined through the OpenVINO Model Optimizer. batch_size is useful for sequential inference requests of the same batch size. Some models, such as object detection, don't work correctly with the batch_size parameter because the output's first dimension doesn't represent the batch size; for these models, set the batch size via network reshaping with the shape parameter. The default behavior takes the size of the first dimension of the first input as the batch size; for example, if the input shape is (1,3,225,225), the batch size is set to 1. If you set batch_size to a numerical value, the model batch size is changed when the service starts. batch_size also accepts the value auto: the served model batch size is then set according to the incoming data at run time, and the model is reloaded each time the input data changes the batch size, so you might see a delayed response upon the first request. | |
"model_version_policy" | {"all": {}} {"latest": { "num_versions": 2}} {"specific": { "versions":[1, 3] }} | Optional. The model version policy lets you decide which versions of a model the OpenVINO Model Server serves. By default, the server serves the latest version. One reason to use this argument is to control the server memory consumption. The accepted format is json. Example: {"latest": { "num_versions": 2 }} - the server will serve only the two latest versions of the model. | |
"plugin_config" | json with plugin config mappings like {"CPU_THROUGHPUT_STREAMS": "CPU_THROUGHPUT_AUTO"} | List of device plugin parameters. For the full list refer to the OpenVINO documentation and the performance tuning guide. | |
"nireq" | integer | The size of the internal request queue. When set to 0 or left unset, the value is calculated automatically based on available resources. | |
"target_device" | "CPU"/"HDDL"/"GPU"/"NCS"/"MULTI"/"HETERO" | Device name to be used to execute inference operations. Refer to AI accelerators support below. | |
To learn more about the batch size and shape parameters, refer to the Batch Size and Shape document.
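In single model mode, the same parameters can also be passed on the command line. For example, to let the server adjust the batch size to the incoming requests, you could use something like the sketch below (assuming the model supports reshaping):
docker run -d --rm -v <models_repository>:/models -p 9000:9000 openvino/model_server:latest \
--model_path /models/model1 --model_name my_model --port 9000 --batch_size auto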
Server configuration options
Server configuration options are set only via command line parameters and are common for all served models.
Option | Value format | Description | Required |
---|---|---|---|
port | integer | Number of the port used by the gRPC server. | ✓ |
rest_port | integer | Number of the port used by the HTTP server (if not provided or set to 0, the HTTP server is not launched). | |
grpc_bind_address | string | Network interface address or hostname to which the gRPC server should bind. Default: all interfaces (0.0.0.0). | |
rest_bind_address | string | Network interface address or hostname to which the REST server should bind. Default: all interfaces (0.0.0.0). | |
grpc_workers | integer | Number of gRPC server instances (should be from 1 to the CPU core count). The default value is 1 and is optimal for most use cases; consider a higher value when expecting heavy load. | |
rest_workers | integer | Number of HTTP server threads. Effective when rest_port > 0. The default value is set based on the number of CPUs. | |
file_system_poll_wait_seconds | integer | Time interval, in seconds, between checks for config and model version changes. The default value is 1. A value of 0 disables change monitoring. | |
cpu_extension | string | Optional path to a library with custom layer implementations (preview feature in OVMS). | |
log_level | "DEBUG"/"INFO"/"ERROR" | Serving logging level. | |
log_path | string | Optional path to the log file. | |
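For illustration, several of these server options can be combined in a single startup command. This is only a sketch, reusing the single-model example from earlier in this guide:
docker run -d --rm -v <models_repository>:/models -p 9000:9000 -p 9001:9001 openvino/model_server:latest \
--model_path <path_to_model> --model_name <model_name> --port 9000 --rest_port 9001 \
--grpc_workers 4 --rest_workers 4 --file_system_poll_wait_seconds 5 --log_level INFO --log_path /tmp/ovms.log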
Azure Cloud Storage path requirements
Add the Azure Storage path as the model_path and pass the Azure Storage credentials to the Docker container.
To start a Docker container with support for Azure Storage paths to your model, use the AZURE_STORAGE_CONNECTION_STRING variable. This variable contains the connection string of the storage account used for authentication.
An example connection string:
AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=azure_account_name;AccountKey=smp/hashkey==;EndpointSuffix=core.windows.net"
Example command with blob storage az://<container>/<model_path>:
docker run --rm -d -p 9001:9001 \
-e AZURE_STORAGE_CONNECTION_STRING="${AZURE_STORAGE_CONNECTION_STRING}" \
openvino/model_server:latest \
--model_path az://bucket/model_path --model_name as_model --port 9001
Example command with file storage azfs://<share>/<model_path>:
docker run --rm -d -p 9001:9001 \
-e AZURE_STORAGE_CONNECTION_STRING="${AZURE_STORAGE_CONNECTION_STRING}" \
openvino/model_server:latest \
--model_path azfs://share/model_path --model_name as_model --port 9001
Add -e "http_proxy=$http_proxy" -e "https_proxy=$https_proxy" to the docker run command when the cloud storage connection goes through a proxy. By default, the https_proxy variable is used. If you want to use http_proxy instead, set the AZURE_STORAGE_USE_HTTP_PROXY environment variable to any value and pass it to the container.
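For example, the blob storage command above with proxy variables passed through might look as follows (the proxy values come from your host environment and are shown only as an illustration):
docker run --rm -d -p 9001:9001 \
-e AZURE_STORAGE_CONNECTION_STRING="${AZURE_STORAGE_CONNECTION_STRING}" \
-e "http_proxy=$http_proxy" -e "https_proxy=$https_proxy" \
openvino/model_server:latest \
--model_path az://bucket/model_path --model_name as_model --port 9001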
Google Cloud Storage path requirements
Add the Google Cloud Storage path as the model_path and pass the Google Cloud Storage credentials to the Docker container. Exception: this is not required if you use a GKE Kubernetes cluster, because GKE clusters handle authorization on their own.
To start a Docker container with support for Google Cloud Storage paths to your model use the GOOGLE_APPLICATION_CREDENTIALS variable. This variable contains the path to the GCP authentication key.
Example command with gs://<bucket>/<model_path>:
docker run --rm -d -p 9001:9001 \
-e GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_APPLICATION_CREDENTIALS}" \
-v ${GOOGLE_APPLICATION_CREDENTIALS}:${GOOGLE_APPLICATION_CREDENTIALS} \
openvino/model_server:latest \
--model_path gs://bucket/model_path --model_name gs_model --port 9001
AWS S3 and Minio storage path requirements
Add the S3 path as the model_path and pass the credentials as environment variables to the Docker container.
Example command with s3://<bucket>/<model_path>:
docker run --rm -d -p 9001:9001 \
-e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
-e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
-e AWS_REGION="${AWS_REGION}" \
-e S3_ENDPOINT="${S3_ENDPOINT}" \
openvino/model_server:latest \
--model_path s3://bucket/model_path --model_name s3_model --port 9001
OpenVINO Model Server can manage model versions at runtime. It includes a model manager, which monitors newly added and deleted versions in the models repository and applies the model version policy. To learn more, refer to the Version Policy document.
Starting from release 2021.1, OpenVINO Model Server monitors changes in its configuration file and applies the required modifications at runtime:
- When a new model is added to the configuration file config.json, OVMS loads and starts serving the configured versions. It also starts monitoring for version changes in the configured model storage. If the new model has an invalid configuration or does not include any version that can be successfully loaded, it is ignored until the next update to the configuration file is detected.
- When a deployed model is deleted from config.json, it is unloaded completely from OVMS after already started inference operations are completed.
- OVMS can also detect changes in the configuration of deployed models. All model versions are reloaded when there is a change in the batch_size, plugin_config, target_device, shape, model_version_policy or nireq parameters. When the model path is changed, all versions are reloaded according to the model_version_policy.
- In case the new config.json is invalid (not compliant with the json schema), no changes are applied to the served models.
Note: Changes in the config file are checked regularly, with an interval defined by the parameter --file_system_poll_wait_seconds.
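Because an invalid config.json is rejected without applying any changes, it can help to validate the file before placing it in the monitored location. A minimal syntax check, assuming python3 is available on the host (this verifies only that the file is valid json, not that it matches the OVMS schema):
python3 -m json.tool config.json > /dev/null && echo "config.json is valid json"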
Using an Intel® Movidius™ Neural Compute Stick
Intel® Movidius™ Neural Compute Stick 2 can be employed by OVMS via a MYRIAD plugin.
The Intel® Movidius™ Neural Compute Stick must be visible and accessible on the host machine.
Follow these steps to update the udev rules, if necessary:
- Create a file named 97-usbboot.rules with the following content:
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
- In the same directory execute these commands:
sudo cp 97-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-usbboot.rules
NCS devices should be reported by the lsusb command, which should print out ID 03e7:2485.
To start server with Neural Compute Stick:
docker run --rm -it --net=host -u root --privileged -v /opt/model:/opt/model -v /dev:/dev -p 9001:9001 openvino/model_server \
--model_path /opt/model --model_name my_model --port 9001 --target_device MYRIAD
The --net=host and --privileged parameters are required for the USB connection to work properly.
-v /dev:/dev mounts USB drives.
A single stick can handle one model at a time. If there are multiple sticks plugged in, the OpenVINO Toolkit chooses which one the model is loaded to.
Starting docker container with HDDL
In order to run a container that uses the HDDL accelerator, hddldaemon must be running on the host machine. It is required to set up the environment (the OpenVINO package must be pre-installed) and start hddldaemon on the host before starting the container. Refer to the steps in the OpenVINO documentation.
To start the server with HDDL, you can use a command similar to:
docker run --rm -it --device=/dev/ion:/dev/ion -v /var/tmp:/var/tmp -v /opt/model:/opt/model -p 9001:9001 openvino/model_server:latest \
--model_path /opt/model --model_name my_model --port 9001 --target_device HDDL
--device=/dev/ion:/dev/ion mounts the accelerator device.
-v /var/tmp:/var/tmp enables communication with hddldaemon running on the host machine.
Check out our recommendations for throughput optimization on HDDL.
Note: The OpenVINO Model Server process in the container communicates with hddldaemon via unix sockets in the /var/tmp folder. It requires RW permissions in the docker container security context. It is recommended to start the docker container in the same security context as the account starting hddldaemon. For example, if you start hddldaemon as root, add --user root to the docker run command.
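For example, if hddldaemon runs as root on the host, the HDDL command above could be extended like this (a sketch, not the only valid setup):
docker run --rm -it --user root --device=/dev/ion:/dev/ion -v /var/tmp:/var/tmp -v /opt/model:/opt/model -p 9001:9001 \
openvino/model_server:latest --model_path /opt/model --model_name my_model --port 9001 --target_device HDDL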
Starting docker container with GPU
The GPU plugin uses the Intel® Compute Library for Deep Neural Networks (clDNN) to infer deep neural networks. For inference execution it employs Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics.
Before using GPU as the OVMS target device, you need to install the required drivers. Refer to the OpenVINO installation steps. Next, start the docker container with the additional parameter --device /dev/dri to pass the device context, and set the OVMS parameter --target_device GPU. An example command is listed below:
docker run --rm -it --device=/dev/dri -v /opt/model:/opt/model -p 9001:9001 openvino/model_server:latest \
--model_path /opt/model --model_name my_model --port 9001 --target_device GPU
Using Multi-Device Plugin
If you have multiple inference devices available (e.g. Myriad VPUs and CPU), you can increase inference throughput by enabling the Multi-Device Plugin. With the Multi-Device Plugin enabled, inference requests will be load balanced between multiple devices. For more detailed information read [OpenVINO's Multi-Device plugin documentation](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_MULTI.html).
In order to use this feature in OpenVINO™ Model Server, the following steps are required:
Set target_device for the model in the configuration json file to MULTI:<DEVICE_1>,<DEVICE_2> (e.g. MULTI:MYRIAD,CPU; the order of the devices defines their priority, so MYRIAD devices will be used first in this example).
Below is an example config.json setting up the Multi-Device Plugin for a resnet model, using Intel® Movidius™ Neural Compute Stick and CPU devices:
{"model_config_list": [
{"config": {
"name": "resnet",
"base_path": "/opt/ml/resnet",
"batch_size": "1",
"target_device": "MULTI:MYRIAD,CPU"}
}]
}
Start OpenVINO™ Model Server with config.json (placed under ./models/config.json) defined as above, and with the grpc_workers parameter set to match the nireq field in config.json:
docker run -d --net=host -u root --privileged --rm -v $(pwd)/models/:/opt/ml:ro -v /dev:/dev -p 9001:9001 \
openvino/model_server:latest --config_path /opt/ml/config.json --port 9001
Alternatively, when you are using just a single model, start OpenVINO™ Model Server using this command (config.json is not needed in this case):
docker run -d --net=host -u root --privileged --name ie-serving --rm -v $(pwd)/models/:/opt/ml:ro -v \
/dev:/dev -p 9001:9001 openvino/model_server:latest --model_path /opt/ml/resnet --model_name resnet --port 9001 --target_device 'MULTI:MYRIAD,CPU'
After these steps, the deployed model will perform inference on both the Intel® Movidius™ Neural Compute Stick and the CPU. The total throughput will be roughly equal to the sum of the CPU and Intel® Movidius™ Neural Compute Stick throughput.
Using Heterogeneous Plugin
The HETERO plugin makes it possible to distribute a single inference workload and model between several AI accelerators. That way different parts of the DL network can be split and executed on the most suitable devices. OpenVINO automatically divides the network to optimize the execution.
Similarly to the MULTI plugin, the HETERO plugin can be configured by using the --target_device parameter with the pattern HETERO:<DEVICE_1>,<DEVICE_2>. The order of devices defines their priority: the first one is the primary device while the second is the fallback.
Below is a config example using the heterogeneous plugin with GPU as the primary device and CPU as a fallback.
{"model_config_list": [
{"config": {
"name": "resnet",
"base_path": "/opt/ml/resnet",
"batch_size": "1",
"target_device": "HETERO:GPU,CPU"}
}]
}
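A corresponding startup command could look like the sketch below. Since GPU is the primary device in this example, the /dev/dri device still has to be passed to the container, as in the GPU section above (the config file location is assumed to be ./models/config.json):
docker run --rm -it --device=/dev/dri -v $(pwd)/models/:/opt/ml:ro -p 9001:9001 openvino/model_server:latest \
--config_path /opt/ml/config.json --port 9001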
By default, OpenVINO Model Server docker containers start with the security context of the local account ovms with linux uid 5000. This ensures the docker container does not have elevated permissions on the host machine. This is in line with best practices to run docker applications with minimal permissions. You can change the security context by adding the --user parameter to the docker run command. This might be needed, for example, to load mounted models with restricted access. For example:
docker run --rm -d --user $(id -u):$(id -g) -v $(pwd)/model/:/model -p 9178:9178 openvino/model_server:latest \
--model_path /model --model_name my_model
OpenVINO Model Server currently does not provide access restrictions or traffic encryption on the gRPC and REST API endpoints. The endpoints can be secured using network settings such as docker network settings or a network firewall on the host. The recommended configuration is to place OpenVINO Model Server behind a reverse proxy component or load balancer that provides traffic encryption and user authorization.
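With plain Docker, one simple way to limit exposure is to publish the ports only on the loopback interface, so that only a reverse proxy or clients running on the same host can reach the model server. A sketch based on the single-model example above:
docker run -d --rm -v <models_repository>:/models -p 127.0.0.1:9000:9000 -p 127.0.0.1:9001:9001 \
openvino/model_server:latest --model_path <path_to_model> --model_name <model_name> --port 9000 --rest_port 9001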