Release OpenVINO™ Model Server 2024.4 · openvinotoolkit/model_server

The 2024.4 release brings official support for OpenAI API text generation. It is now recommended for production usage. It comes with a set of added features and improvements.

Changes and improvements

Significant performance improvements for multinomial sampling algorithm
finish_reason in the response correctly determines reaching the max_tokens (length) and completed the sequence (stop)
Added automatic cancelling of text generation for disconnected clients
Included prefix caching feature which speeds up text generation by caching the prompt evaluation
Option to compress the KV Cache to lower precision – it reduces the memory consumption with minimal impact on accuracy
Added support for stop sampling parameters. It can define a sequence which stops text generation.
Added support for logprobs sampling parameter. It returns the probabilities of generated tokens.
Included generic metrics related to execution of MediaPipe graph. Metric ovms_current_graphs can be used for autoscaling based on current load and the level of concurrency. Counters like ovms_requests_accepted and ovms_responses can track the activity of the server.
Included demo of text generation horizontal scalability
Configurable handling of non-UTF-8 responses from the model – detokenizer can now automatically change then to Unicode replacement character
Included support for Llama3.1 models
Text generation is supported both on CPU and GPU -check the demo

Breaking changes

No breaking changes.

Bug fixes

Security and stability improvements
Fixed handling of model templates without bos_token

You can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command:
docker pull openvino/model_server:2024.4 - CPU device support with the image based on Ubuntu22.04
docker pull openvino/model_server:2024.4-gpu - CPU, GPU and NPU device support with the image based on Ubuntu22.04
or use provided binary packages.
The prebuilt image is available also on RedHat Ecosystem Catalog

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenVINO™ Model Server 2024.4

Changes and improvements

Breaking changes

Bug fixes