Using the OpenVINO™ Model Server in a Docker Container

Step 1: Build or Get a Docker Image

Choose one option to get a Docker image:

Build an image

To build your own image, run the following command in the root folder of the git repository, replacing <URL> with the URL of the OpenVINO Toolkit package that you receive after registering on the OpenVINO™ Toolkit website.

make docker_build DLDT_PACKAGE_URL=<URL>

It will generate the images, tagged as:

  • openvino/model_server:latest - with CPU, NCS and HDDL support
  • openvino/model_server-gpu:latest - with CPU, NCS, HDDL and iGPU support

as well as a release package (.tar.gz, with ovms binary and necessary libraries), in a ./dist directory.

The release package is compatible with Linux machines whose glibc version is greater than or equal to the version in the build image. For debugging, an image with the -build suffix is also generated (for example, openvino/model_server-build:latest).

Note: Images include OpenVINO 2021.1 release.

Get a publicly available image from Docker Hub

If you don't want to build a Docker image, you can use this command to download one:

docker pull openvino/model_server:latest

Step 2: Prepare the Models and the Model Repository

Follow the steps in Prepare the Models and the Model Repository.

Step 3: Start the Docker Container

Select one of these options to start the Docker container:

Start the Docker container with a single model
You don't need a configuration file to enable a single model. Instead, enable the model with one command, such as:
docker run --rm -d  -v /models/:/opt/ml:ro -p 9001:9001 -p 8001:8001 openvino/model_server:latest \
--model_path /opt/ml/model1 --model_name my_model --port 9001 --rest_port 8001

Options used in this command:

  • --rm - Removes the container when it exits.
  • -d - Runs the container in the background.
  • -v - Defines how to mount the models folder in the Docker container.
  • -p - Exposes the model serving port outside the Docker container.
  • openvino/model_server:latest - Represents the image name. This varies by tag and build process. The ovms binary is the Docker entry point. See the full list of ovms tags.
  • --model_path - Model location. This can be a path inside the Docker container that is mounted during start-up, a Google* Cloud Storage path in the format gs://<bucket>/<model_path>, or an AWS S3 path in the format s3://<bucket>/<model_path>. See the requirements below for using cloud storage.
  • --model_name - The name of the model in the model_path.
  • --port - gRPC server port.
  • --rest_port - REST server port.
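Once the container from the command above is running, a quick way to check that the model loaded is to query the REST port. The sketch below assumes the REST endpoint follows the TensorFlow Serving style model status path (/v1/models/<model_name>); that path is not stated in this document, and the helper names are hypothetical.

```python
import json
import urllib.request


def model_status_url(host: str, rest_port: int, model_name: str) -> str:
    """Build the model status URL (assumed TensorFlow Serving style REST path)."""
    return f"http://{host}:{rest_port}/v1/models/{model_name}"


def fetch_model_status(host: str, rest_port: int, model_name: str, timeout: float = 5.0) -> dict:
    """Query a running container; raises urllib.error.URLError if it is unreachable."""
    url = model_status_url(host, rest_port, model_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Values taken from the docker run example: --rest_port 8001, --model_name my_model.
status_url = model_status_url("localhost", 8001, "my_model")
print(status_url)
```

Calling fetch_model_status("localhost", 8001, "my_model") only makes sense while the container is up, so it is left out of the snippet's top level.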
Start the Docker container with multiple models
To use a container that has several models, you must use a model server configuration file that defines each model. The configuration file is in JSON format.

In the configuration file, provide an array, model_config_list, that contains a config object for each served model. Each config object must include, at a minimum, values for the name and base_path attributes.

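Example configuration file (the model names and base paths shown are placeholders):

{
   "model_config_list": [
      {
         "config": {
            "name": "model_name1",
            "base_path": "/opt/ml/models/model1",
            "batch_size": "16"
         }
      },
      {
         "config": {
            "name": "model_name2",
            "base_path": "/opt/ml/models/model2",
            "batch_size": "auto",
            "model_version_policy": {"all": {}}
         }
      },
      {
         "config": {
            "name": "model_name3",
            "base_path": "gs://bucket/models/model3",
            "model_version_policy": {"specific": { "versions":[1, 3] }},
            "shape": "auto"
         }
      },
      {
         "config": {
             "name": "model_name4",
             "base_path": "s3://bucket/models/model4",
             "shape": {
                "input1": "(1,3,200,200)",
                "input2": "(1,3,50,50)"
             },
             "plugin_config": {"CPU_THROUGHPUT_STREAMS": "CPU_THROUGHPUT_AUTO"}
         }
      },
      {
         "config": {
             "name": "model_name5",
             "base_path": "s3://bucket/models/model5",
             "shape": "auto",
             "nireq": 32,
             "target_device": "HDDL"
         }
      }
   ]
}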

When the config file is present, the Docker container can be started in a similar manner as with a single model. Keep in mind that models with a cloud storage path require specific environment variables to be set; refer to the cloud storage requirements below.

docker run --rm -d  -v /models/:/opt/ml:ro -p 9001:9001 -p 8001:8001  -v <config.json>:/opt/ml/config.json openvino/model_server:latest \
--config_path /opt/ml/config.json --port 9001 --rest_port 8001
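The configuration file described above can also be generated rather than written by hand. The following is a minimal sketch using only the Python standard library; the helper functions and the model names and paths are hypothetical, and only the model_config_list / config / name / base_path structure comes from this document.

```python
import json


def model_entry(name, base_path, **options):
    """Return one config object; name and base_path are the required minimum."""
    config = {"name": name, "base_path": base_path}
    config.update(options)  # e.g. batch_size, shape, nireq, target_device
    return {"config": config}


def build_config(entries):
    """Wrap the config objects in the model_config_list array."""
    return {"model_config_list": list(entries)}


# Hypothetical model names and paths, for illustration only.
config = build_config([
    model_entry("model_a", "/opt/ml/model_a", batch_size="16"),
    model_entry("model_b", "/opt/ml/model_b", shape="auto", nireq=32),
])
config_text = json.dumps(config, indent=3)
print(config_text)
```

Writing config_text to a file and mounting that file at /opt/ml/config.json matches the docker run command shown above.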
Model configuration options explained

Model configuration options are set separately for each model served by the model server, either with command line parameters or in a configuration file.

  • When serving just one model, you can define its configuration via command line parameters.
  • When serving multiple models, you need to define their configuration in a config file as described above.
Option | Value format | Description
"model_name"/"name" | string | Model name exposed over the gRPC and REST APIs (use model_name on the command line, name in the JSON config).
"model_path"/"base_path" | string, for example "/opt/ml/models/model" | Model location. If using a Google Cloud Storage, Azure Storage or S3 path, see the requirements below (use model_path on the command line, base_path in the JSON config).
"shape" | tuple, JSON or "auto" | shape is optional and takes precedence over batch_size. The shape argument changes the model that is enabled in the model server to fit the parameters.

shape accepts three forms of values:
  • auto - The model server reloads the model with the shape that matches the input data matrix.
  • A tuple, such as (1,3,224,224) - The tuple defines the shape to use for all incoming requests for models with a single input.
  • A dictionary of tuples, such as {input1:(1,3,224,224),input2:(1,3,50,50)} - This option defines the shape of every included input in the model.

Some models don't support the reshape operation.

If the model can't be reshaped, it remains in the original parameters and all requests with incompatible input format result in an error. See the logs for more information about specific errors.

Learn more about supported model graph layers, including all limitations, at docs_IE_DG_ShapeInference.html.
"batch_size" | integer or "auto" | Optional. By default, the batch size is derived from the model, as defined through the OpenVINO Model Optimizer. batch_size is useful for sequential inference requests of the same batch size.

Some models, such as object detection, don't work correctly with the batch_size parameter. With these models, the output's first dimension doesn't represent the batch size. You can set the batch size for these models by using network reshaping and setting the shape parameter appropriately.

The default option of using the Model Optimizer to determine the batch size uses the size of the first dimension in the first input for the size. For example, if the input shape is (1, 3, 225, 225), the batch size is set to 1. If you set batch_size to a numerical value, the model batch size is changed when the service starts.

batch_size also accepts a value of auto. If you use auto, the served model batch size is set according to the incoming data at run time. The model is reloaded each time the input data changes the batch size. You might see a delayed response upon the first request.
"model_version_policy" | JSON: {"all": {}}, {"latest": { "num_versions": Integer }} or {"specific": { "versions":[1, 3] }} | Optional. The model version policy lets you decide which versions of a model the OpenVINO Model Server serves. By default, the server serves only the latest version. One reason to use this argument is to control the server memory consumption.

The accepted format is JSON, for example:

{"latest": { "num_versions":2 }} # server will serve only the two latest versions of the model

{"specific": { "versions":[1, 3] }} # server will serve only versions 1 and 3 of the given model

{"all": {}} # server will serve all available versions of the given model
"plugin_config" | JSON with plugin config mappings, such as {"CPU_THROUGHPUT_STREAMS": "CPU_THROUGHPUT_AUTO"} | List of device plugin parameters. For the full list, refer to the OpenVINO documentation and the performance tuning guide.
"nireq" | integer | The size of the internal request queue. When set to 0 or not set, the value is calculated automatically based on available resources.
"target_device" | "CPU"/"HDDL"/"GPU"/"NCS"/"MULTI"/"HETERO" | Name of the device used to execute inference operations. Refer to AI accelerators support below.
Server configuration options explained

Server configuration options are defined only via command line options and are common to all served models.

Option | Value format | Description
port | integer | Number of the port used by the gRPC server.
rest_port | integer | Number of the port used by the HTTP server (if not provided or set to 0, the HTTP server is not launched).
grpc_workers | integer | Number of gRPC server instances (should be from 1 to the CPU core count). The default value is 1, which is optimal for most use cases. Consider setting a higher value when expecting heavy load.
rest_workers | integer | Number of HTTP server threads. Effective when rest_port > 0. The default value is 24.
file_system_poll_wait_seconds | integer | Time interval, in seconds, between checks for config and model version changes. The default value is 1. A value of zero disables change monitoring.
log_level | "DEBUG"/"INFO"/"ERROR" | Serving logging level.
log_path | string | Optional path to the log file.
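As an illustration of how these server options fit together, the sketch below assembles a command line argument list and applies the constraints stated in the table (grpc_workers between 1 and the CPU core count, rest_workers only effective with a non-zero rest_port). The function itself is hypothetical and is not part of OpenVINO Model Server; only the option names come from the table.

```python
import os


def server_args(port, rest_port=0, grpc_workers=1, rest_workers=None,
                poll_wait_seconds=1, log_level="INFO", log_path=None):
    """Build a list of server command line arguments (illustrative helper)."""
    cpu_count = os.cpu_count() or 1
    if not 1 <= grpc_workers <= cpu_count:
        raise ValueError(f"grpc_workers should be from 1 to {cpu_count}")
    if log_level not in ("DEBUG", "INFO", "ERROR"):
        raise ValueError(f"unsupported log_level: {log_level}")

    args = ["--port", str(port),
            "--grpc_workers", str(grpc_workers),
            "--file_system_poll_wait_seconds", str(poll_wait_seconds),
            "--log_level", log_level]
    if rest_port > 0:  # the HTTP server is launched only for a non-zero port
        args += ["--rest_port", str(rest_port)]
        if rest_workers is not None:
            args += ["--rest_workers", str(rest_workers)]
    if log_path:
        args += ["--log_path", log_path]
    return args


print(" ".join(server_args(9001, rest_port=8001)))
```

The resulting list can be appended after the image name in a docker run command, since the ovms binary is the container entry point.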
Cloud storage requirements

Azure Cloud Storage path requirements

Add the Azure Storage path as the model_path and pass the Azure Storage credentials to the Docker container.

To start a Docker container with support for Azure Storage paths to your model, use the AZURE_STORAGE_CONNECTION_STRING variable. This variable contains the connection string of the Azure Storage account used for authentication.

Example connection string is: AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=azure_account_name;AccountKey=smp/hashkey==;"

Example command with blob storage az://<bucket>/<model_path>:

docker run --rm -d  -p 9001:9001 \
-e AZURE_STORAGE_CONNECTION_STRING="${AZURE_STORAGE_CONNECTION_STRING}" \
openvino/model_server:latest \
--model_path az://bucket/model_path --model_name as_model --port 9001

Example command with file storage azfs://<share>/<model_path>:

docker run --rm -d  -p 9001:9001 \
-e AZURE_STORAGE_CONNECTION_STRING="${AZURE_STORAGE_CONNECTION_STRING}" \
openvino/model_server:latest \
--model_path azfs://share/model_path --model_name as_model --port 9001

If the cloud storage connection goes through a proxy, add -e "http_proxy=$http_proxy" -e "https_proxy=$https_proxy" to the docker run command.

By default, the https_proxy setting is used. If you want to use http_proxy instead, set the AZURE_STORAGE_USE_HTTP_PROXY environment variable to any value and pass it to the container.

Google Cloud Storage path requirements

Add the Google Cloud Storage path as the model_path and pass the Google Cloud Storage credentials to the Docker container.
Exception: This is not required if you use a GKE Kubernetes cluster. GKE Kubernetes clusters handle authorization.

To start a Docker container with support for Google Cloud Storage paths to your model, use the GOOGLE_APPLICATION_CREDENTIALS variable. This variable contains the path to the GCP authentication key.

Example command with gs://<bucket>/<model_path>:

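docker run --rm -d  -p 9001:9001 \
-e GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_APPLICATION_CREDENTIALS}" \
-v ${GOOGLE_APPLICATION_CREDENTIALS}:${GOOGLE_APPLICATION_CREDENTIALS} \
openvino/model_server:latest \
--model_path gs://bucket/model_path --model_name gs_model --port 9001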

AWS S3 and Minio storage path requirements

Add the S3 path as the model_path and pass the credentials as environment variables to the Docker container.

Example command with s3://<bucket>/<model_path>:

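docker run --rm -d -p 9001:9001 \
-e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
-e AWS_REGION="${AWS_REGION}" -e S3_ENDPOINT="${S3_ENDPOINT}" \
openvino/model_server:latest \
--model_path s3://bucket/model_path --model_name s3_model --port 9001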
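The cloud storage rules in this section can be summarized as a lookup from the model_path prefix to the credentials the container needs. The sketch below is illustrative only: the helper names are hypothetical, the Azure and Google variable names come from this section, and the S3 variable names are an assumption because this document does not list them.

```python
REQUIRED_ENV = {
    "gs": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "az": ["AZURE_STORAGE_CONNECTION_STRING"],
    "azfs": ["AZURE_STORAGE_CONNECTION_STRING"],
    # Assumption: standard AWS credential variable names, not listed in this document.
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def required_env(model_path):
    """Return the environment variables needed for a given model_path."""
    scheme, separator, _ = model_path.partition("://")
    if not separator:
        return []  # local path inside the container: no credentials needed
    if scheme not in REQUIRED_ENV:
        raise ValueError(f"unsupported storage scheme: {scheme}")
    return REQUIRED_ENV[scheme]


def docker_env_flags(model_path, environ):
    """Build '-e NAME' flags; docker then copies each value from the host environment."""
    flags = []
    for name in required_env(model_path):
        if name not in environ:
            raise KeyError(f"{name} must be set for {model_path}")
        flags += ["-e", name]
    return flags


print(docker_env_flags("az://bucket/model_path", {"AZURE_STORAGE_CONNECTION_STRING": "..."}))
```

Passing only the variable name to -e keeps the secret value out of the generated command line.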
Security considerations

By default, OpenVINO Model Server Docker containers start with the security context of the local account ovms, with Linux uid 5000. This ensures that the Docker container does not have elevated permissions on the host machine, in line with the best practice of running Docker applications with minimal permissions. You can change the security context by adding the --user parameter to the docker run command. This might be needed, for instance, to load mounted models with restricted access:

docker run --rm -d  --user $(id -u):$(id -g)  -v $(pwd)/model/:/model -p 9178:9178 openvino/model_server:latest \
--model_path /model --model_name my_model

OpenVINO Model Server currently doesn't provide access restrictions or traffic encryption on gRPC and REST API endpoints. The endpoints can be secured using network settings such as Docker network settings or a network firewall on the host. The recommended configuration is to place OpenVINO Model Server behind a reverse proxy or load balancer that provides traffic encryption and user authorization.

Other Options for the OpenVINO™ Model Server in a Docker Container

Batch Processing

The batch_size parameter is optional. By default, the batch size is derived from the model, where it is set by the Model Optimizer tool. When batch_size is set to a numerical value, the model batch size is changed at service start-up. It also accepts the value auto, which makes the served model set the batch size automatically based on the incoming data at run time. Each time the input data changes the batch size, the model is reloaded, which might add a response delay to the first request. This feature is useful for sequential inference requests of the same batch size.

OpenVINO™ Model Server determines the batch size based on the size of the first dimension in the first input. For example, with the input shape (1, 3, 225, 225), the batch size is set to 1. With the input shape (8, 3, 225, 225), the batch size is set to 8.

Note: Some models, such as object detection models, do not work correctly when the batch size is changed with the batch_size parameter. Typically, these are models whose output's first dimension does not represent the batch size as it does on the input side. The batch size of such models can be changed with network reshaping by setting the shape parameter appropriately.
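The batch size rule described in this section can be written down in a few lines. This is only a sketch of the described behavior with hypothetical function names, not the server's actual code.

```python
def batch_size_of(input_shape):
    """Batch size is the first dimension of the first input's shape."""
    return input_shape[0]


def reload_needed(loaded_batch_size, request_shape, batch_size_setting):
    """With batch_size set to "auto", a request with a new batch size triggers a reload."""
    if batch_size_setting != "auto":
        return False  # fixed batch size: the model is not reloaded per request
    return batch_size_of(request_shape) != loaded_batch_size


print(batch_size_of((1, 3, 225, 225)), batch_size_of((8, 3, 225, 225)))
```

The reload check also shows why auto suits runs of requests with the same batch size: only the first request of each run pays the reload delay.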

Model reshaping

The shape parameter is optional and takes precedence over the batch_size parameter. When shape is defined, the batch_size value is ignored.

The shape argument can change the model enabled in the model server to fit the required parameters. It accepts three forms of values:

  • "auto" - the model server reloads the model with the shape matching the input data matrix.
  • a tuple, e.g. (1,3,224,224) - defines the shape to be used for all incoming requests for models with a single input.
  • a dictionary of tuples, e.g. {input1:(1,3,224,224),input2:(1,3,50,50)} - defines the shape of every included input in the model.

Note: Some models do not support the reshape operation. Learn more about supported model graph layers, including all limitations, at docs_IE_DG_ShapeInference.html. If the model can't be reshaped, it keeps its original parameters and all requests with an incompatible input format get an error. The model server also reports such problems in the logs.
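To make the three accepted forms concrete, here is a small sketch that normalizes a shape value as it might appear in a JSON config (tuples written as strings) and applies the rule that shape takes precedence over batch_size. The parser is hypothetical and may differ from how the server validates its input.

```python
import ast


def _parse_dims(text):
    """Parse a tuple written as a string, e.g. "(1,3,224,224)"."""
    dims = ast.literal_eval(text)
    if not isinstance(dims, tuple) or not dims:
        raise ValueError(f"not a tuple of dimensions: {text!r}")
    if not all(isinstance(d, int) and d > 0 for d in dims):
        raise ValueError(f"dimensions must be positive integers: {text!r}")
    return dims


def parse_shape(value):
    """Normalize "auto", a tuple string, or a dictionary of tuple strings."""
    if value == "auto":
        return "auto"
    if isinstance(value, dict):
        return {name: _parse_dims(dims) for name, dims in value.items()}
    return _parse_dims(value)


def effective_setting(config):
    """shape takes precedence over batch_size when both are present."""
    if "shape" in config:
        return ("shape", parse_shape(config["shape"]))
    if "batch_size" in config:
        return ("batch_size", config["batch_size"])
    return ("model default", None)


print(parse_shape({"input1": "(1,3,224,224)", "input2": "(1,3,50,50)"}))
```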

Model Version Policy

The model version policy makes it possible to decide which versions of a model are served by OVMS. This parameter allows you to control the memory consumption of the server and to decide which versions are used regardless of what is located under the path given when the server is started. The model_version_policy parameter is optional. By default, the server serves only the latest version of the model. The accepted format for the parameter, in both the CLI and the config, is JSON.

Accepted values:

{"all": {}}
{"latest": { "num_versions": Integer}}
{"specific": { "versions": List }}


{"latest": { "num_versions":2 }} # server will serve only the 2 latest versions of the model
{"specific": { "versions":[1, 3] }} # server will serve only versions 1 and 3 of the given model
{"all": {}} # server will serve all available versions of the given model
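The effect of each policy can be illustrated by applying it to a list of version numbers found in the model storage. The function below is a hypothetical sketch of the selection rules described in this section, not the server's implementation.

```python
def served_versions(available, policy=None):
    """Select the versions to serve from the available version numbers."""
    versions = sorted(available)
    if policy is None:
        policy = {"latest": {"num_versions": 1}}  # default: only the latest version
    if "all" in policy:
        return versions
    if "latest" in policy:
        count = policy["latest"].get("num_versions", 1)
        return versions[-count:] if count > 0 else []
    if "specific" in policy:
        wanted = set(policy["specific"]["versions"])
        return [v for v in versions if v in wanted]
    raise ValueError(f"unknown model_version_policy: {policy}")


print(served_versions([1, 2, 3, 4], {"latest": {"num_versions": 2}}))
```

With the specific policy, versions that are requested but absent from storage are simply not served in this sketch.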

Updating model versions

Served versions are updated online by monitoring file system changes in the model storage. OpenVINO Model Server adds a new version to the serving list when a new numerical subfolder with the model files is added. The default served version is switched to the one with the highest number. When a model version is deleted from the file system, it becomes unavailable on the server and its RAM allocation is released. Updates to the files of an already deployed model version are not detected and do not trigger changes in serving.

By default, the model server checks for new and deleted versions at 1 second intervals. The frequency can be changed by setting the --file_system_poll_wait_seconds parameter. If it is set to zero, updates are disabled.
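The version discovery described above amounts to listing the numerical subfolders of the model directory and treating the highest number as the default. The sketch below reproduces that idea on a temporary directory; the function names are hypothetical and the real server may apply additional checks, such as whether the model files load successfully.

```python
import os
import tempfile


def list_versions(model_dir):
    """Return the version numbers of all numerical subfolders, sorted ascending."""
    versions = []
    for entry in os.listdir(model_dir):
        is_number = entry.isascii() and entry.isdigit()
        if is_number and os.path.isdir(os.path.join(model_dir, entry)):
            versions.append(int(entry))
    return sorted(versions)


def default_version(model_dir):
    """The default served version is the one with the highest number."""
    versions = list_versions(model_dir)
    return versions[-1] if versions else None


# Demonstration on a throwaway directory; "notes" is ignored as non-numerical.
with tempfile.TemporaryDirectory() as model_dir:
    for name in ("1", "2", "10", "notes"):
        os.mkdir(os.path.join(model_dir, name))
    found = list_versions(model_dir)
    newest = default_version(model_dir)
    os.rmdir(os.path.join(model_dir, "2"))  # a deleted version disappears from the list
    after_delete = list_versions(model_dir)

print(found, newest, after_delete)
```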

Updating configuration file

OpenVINO Model Server, starting from release 2021.1, monitors the changes in its configuration file and applies required modifications in runtime:

  • When a new model is added to the configuration file config.json, OVMS loads it and starts serving the configured versions. It also starts monitoring for version changes in the configured model storage. If the new model has an invalid configuration, or does not include any version that can be loaded successfully, it is ignored until the next update to the configuration file is detected.

  • When a deployed model is deleted from config.json, it is unloaded completely from OVMS after inference operations that have already started are completed.

  • OVMS can also detect changes in the configuration of deployed models. All model versions are reloaded when there is a change in the batch_size, plugin_config, target_device, shape, model_version_policy or nireq parameters. When the model path is changed, all versions are reloaded according to the model_version_policy.

  • If the new config.json is invalid (not compliant with the JSON schema), no changes are applied to the served models.

Note: Changes in the config file are checked regularly, at an interval defined by the parameter --file_system_poll_wait_seconds.
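The update rules above can be pictured as a comparison between the previous and the new configuration. The sketch below is a hypothetical illustration of that comparison, assuming both configurations have already passed schema validation; it is not the server's code.

```python
# Parameters whose change causes a reload, as listed above, plus the model path.
RELOAD_KEYS = ("base_path", "batch_size", "plugin_config", "target_device",
               "shape", "model_version_policy", "nireq")


def models_by_name(config):
    """Index the config objects of a configuration by model name."""
    return {item["config"]["name"]: item["config"] for item in config["model_config_list"]}


def plan_update(old_config, new_config):
    """Return which models to load, unload and reload after a config change."""
    old, new = models_by_name(old_config), models_by_name(new_config)
    changed = [
        name for name in old.keys() & new.keys()
        if any(old[name].get(key) != new[name].get(key) for key in RELOAD_KEYS)
    ]
    return {
        "load": sorted(new.keys() - old.keys()),
        "unload": sorted(old.keys() - new.keys()),
        "reload": sorted(changed),
    }


# Hypothetical model names and paths, for illustration only.
old_config = {"model_config_list": [
    {"config": {"name": "a", "base_path": "/opt/ml/a"}},
    {"config": {"name": "b", "base_path": "/opt/ml/b", "nireq": 4}},
]}
new_config = {"model_config_list": [
    {"config": {"name": "b", "base_path": "/opt/ml/b", "nireq": 8}},
    {"config": {"name": "c", "base_path": "/opt/ml/c"}},
]}
plan = plan_update(old_config, new_config)
print(plan)
```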

Support for AI Accelerators

Using an Intel® Movidius™ Neural Compute Stick

Prepare to use an Intel® Movidius™ Neural Compute Stick

Intel® Movidius™ Neural Compute Stick 2 can be employed by OVMS via a MYRIAD plugin.

The Intel® Movidius™ Neural Compute Stick must be visible and accessible on the host machine.

If necessary, follow these steps to update the udev rules:

  1. Create a file named 97-usbboot.rules that includes the following content:
   SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1" 
   SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
   SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
  2. In the same directory, execute these commands:
   sudo cp 97-usbboot.rules /etc/udev/rules.d/
   sudo udevadm control --reload-rules
   sudo udevadm trigger
   sudo ldconfig
   rm 97-usbboot.rules

NCS devices should be reported by lsusb command, which should print out ID 03e7:2485.
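The lsusb check can be automated by scanning its output for the vendor and product IDs used in the udev rules above. The sketch below parses a sample string rather than calling lsusb, so it runs anywhere; the sample output lines and the function name are made up for illustration.

```python
import re

MOVIDIUS_VENDOR_ID = "03e7"
KNOWN_PRODUCT_IDS = {"2150", "2485", "f63b"}  # product IDs from the udev rules


def find_ncs_devices(lsusb_output):
    """Return the 'vendor:product' IDs of Neural Compute Stick entries."""
    ids = re.findall(r"ID ([0-9a-f]{4}):([0-9a-f]{4})", lsusb_output)
    return [
        f"{vendor}:{product}"
        for vendor, product in ids
        if vendor == MOVIDIUS_VENDOR_ID and product in KNOWN_PRODUCT_IDS
    ]


# Made-up sample in the usual lsusb line format.
sample_output = (
    "Bus 001 Device 002: ID 8087:0024 Example Hub\n"
    "Bus 001 Device 004: ID 03e7:2485 Example Movidius device\n"
)
print(find_ncs_devices(sample_output))
```

On a real host, the output of subprocess.run(["lsusb"], capture_output=True, text=True).stdout could be passed in instead of the sample string.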

Start the server with an Intel® Movidius™ Neural Compute Stick

To start the server with a Neural Compute Stick:

docker run --rm -it --net=host -u root --privileged -v /opt/model:/opt/model -v /dev:/dev -p 9001:9001 openvino/model_server \
--model_path /opt/model --model_name my_model --port 9001 --target_device MYRIAD

The --net=host and --privileged parameters are required for the USB connection to work properly.

-v /dev:/dev mounts the host device nodes, including USB devices, in the container.

A single stick can handle one model at a time. If multiple sticks are plugged in, OpenVINO Toolkit chooses which one the model is loaded to.

Starting docker container with HDDL

To run a container that uses the HDDL accelerator, hddldaemon must be running on the host machine. Set up the environment (the OpenVINO package must be pre-installed) and start hddldaemon on the host before starting the container. Refer to the steps in the OpenVINO documentation.

To start the server with HDDL, you can use a command similar to:

docker run --rm -it --device=/dev/ion:/dev/ion -v /var/tmp:/var/tmp -v /opt/model:/opt/model -p 9001:9001 openvino/model_server:latest \
--model_path /opt/model --model_name my_model --port 9001 --target_device HDDL

--device=/dev/ion:/dev/ion mounts the accelerator device.

-v /var/tmp:/var/tmp enables communication with hddldaemon running on the host machine.

Check out our recommendations for throughput optimization on HDDL.

Note: The OpenVINO Model Server process in the container communicates with hddldaemon via Unix sockets in the /var/tmp folder. This requires RW permissions in the Docker container security context. It is recommended to start the Docker container in the same context as the account that starts hddldaemon. For example, if you start hddldaemon as root, add --user root to the docker run command.

Starting docker container with GPU

The GPU plugin uses the Intel® Compute Library for Deep Neural Networks (clDNN) to infer deep neural networks. It executes inference on Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics.

Before using GPU as the OVMS target device, you need to install the required drivers. Refer to the OpenVINO installation steps. Next, start the Docker container with the additional parameter --device /dev/dri to pass the device context, and set the OVMS parameter --target_device GPU. An example command is listed below:

docker run --rm -it --device=/dev/dri -v /opt/model:/opt/model -p 9001:9001 openvino/model_server:latest \
--model_path /opt/model --model_name my_model --port 9001 --target_device GPU
Using Multi-Device Plugin

If you have multiple inference devices available (e.g. Myriad VPUs and CPU), you can increase inference throughput by enabling the Multi-Device Plugin. With the Multi-Device Plugin enabled, inference requests are load balanced between multiple devices. For more detailed information, read OpenVINO's Multi-Device plugin documentation.

To use this feature in OpenVINO™ Model Server, the following step is required:

Set target_device for the model in the configuration JSON file to MULTI:<DEVICE_1>,<DEVICE_2> (e.g. MULTI:MYRIAD,CPU; the order of the devices defines their priority, so MYRIAD devices are used first in this example).

Below is an example config.json that sets up the Multi-Device Plugin for a resnet model, using Intel® Movidius™ Neural Compute Stick and CPU devices:

{"model_config_list": [
   {"config": {
      "name": "resnet",
      "base_path": "/opt/ml/resnet",
      "batch_size": "1",
      "target_device": "MULTI:MYRIAD,CPU"}
   }]
}

Start OpenVINO™ Model Server with the config.json defined above (placed at ./models/config.json). If the config sets the nireq field, the grpc_workers parameter can be set to match it:

docker run -d  --net=host -u root --privileged --rm -v $(pwd)/models/:/opt/ml:ro -v /dev:/dev -p 9001:9001 \
openvino/model_server:latest --config_path /opt/ml/config.json --port 9001 

Alternatively, when you are using just a single model, start OpenVINO™ Model Server with this command (config.json is not needed in this case):

docker run -d  --net=host -u root --privileged --name ie-serving --rm -v $(pwd)/models/:/opt/ml:ro -v /dev:/dev -p 9001:9001 \
openvino/model_server:latest --model_path /opt/ml/resnet --model_name resnet --port 9001 --target_device 'MULTI:MYRIAD,CPU'

After these steps, the deployed model performs inference on both the Intel® Movidius™ Neural Compute Stick and the CPU. Total throughput will be roughly equal to the sum of the CPU and Intel® Movidius™ Neural Compute Stick throughput.
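The target_device string used above follows a simple pattern: an optional plugin prefix, then a comma-separated device list in priority order. The sketch below splits such a string for inspection or validation. It is a hypothetical helper; it accepts the MULTI and HETERO prefixes, which are the two compound values listed for target_device in the model configuration options.

```python
def parse_target_device(value):
    """Split a target_device string into its plugin and ordered device list."""
    plugin, separator, devices = value.partition(":")
    if not separator:
        return {"plugin": None, "devices": [value]}  # plain device such as "CPU"
    if plugin not in ("MULTI", "HETERO"):
        raise ValueError(f"unsupported plugin prefix: {plugin}")
    names = [name.strip() for name in devices.split(",") if name.strip()]
    if not names:
        raise ValueError("at least one device is required after the prefix")
    return {"plugin": plugin, "devices": names}  # earlier devices have higher priority


print(parse_target_device("MULTI:MYRIAD,CPU"))
```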

Using Heterogeneous Plugin

The HETERO plugin makes it possible to distribute the processing of a single inference and model between several AI accelerators. That way, different parts of the DL network can be split and executed on optimized devices. OpenVINO automatically divides the network to optimize the execution.

Similarly to the MULTI plugin, the Heterogeneous plugin can be configured with the --target_device parameter using the pattern HETERO:<DEVICE_1>,<DEVICE_2>. The order of the devices defines their priority. The first one is the primary device, while the second is the fallback.
Below is a config example using the Heterogeneous plugin with GPU as the primary device and CPU as the fallback.

{"model_config_list": [
   {"config": {
      "name": "resnet",
      "base_path": "/opt/ml/resnet",
      "batch_size": "1",
      "target_device": "HETERO:GPU,CPU"}
   }]
}