Releases: roboflow/inference
v0.9.10
🚀 Added
inference Benchmarking 🏃‍♂️
A new command has been added to inference-cli for benchmarking performance. Now you can test inference in different environments, with different configurations, and measure its performance. Look at us testing the speed and scalability of hosted inference on the Roboflow platform 🤯
scaling_of_hosted_roboflow_platform.mov
Run your own benchmark with a simple command:
inference benchmark python-package-speed -m coco/3
See the docs for more details.
🌱 Changed
- Improved serialisation logic for requests and responses, which helps the Roboflow platform improve model monitoring
🔨 Fixed
- Bug #260 causing inference API instability in multiple-worker setups and when shuffling a large number of models - from now on, the API container should not raise strange HTTP 5xx errors due to model management
- Faulty logic for getting request_id causing errors in the parallel-http container
🏆 Contributors
@paulguerrie (Paul Guerrie), @SolomonLake (Solomon Lake), @robiscoding (Rob Miller), @PawelPeczek-Roboflow (Paweł Pęczek)
Full Changelog: v0.9.9...v0.9.10
v0.9.10rc3
This is a pre-release version that mainly addresses some instabilities in the model manager.
What's Changed
- Add source to cache serializer by @SolomonLake in #242
- Parse request/response before caching by @robiscoding in #227
- Inference benchmarking by @PawelPeczek-Roboflow in #250
Full Changelog: v0.9.9...v0.9.10rc3
v0.9.9
🚀 Added
Roboflow workflows 🤖
A new way to create ML pipelines without writing code. Declare the sequence of models and intermediate processing steps using a JSON config and execute it using an inference container (or the hosted Roboflow platform). No Python code needed! 🤯 Just watch our feature preview
workflows_feature_preview.mp4
Want to experiment more?
pip install inference-cli
inference server start --dev
Open http://127.0.0.1:9001 in your browser, then click the Jump Into an Inference Enabled Notebook → button and open the notebook named workflows.ipynb.
We encourage you to explore our documentation 📖 to reveal the full potential of Roboflow workflows.
This feature is still under heavy development. Your feedback is needed to make it better!
Take inference to the cloud with one command 🚀
Yes, you got it right. The inference-cli package now provides a set of inference cloud commands to deploy the required infrastructure effortlessly.
Just:
pip install inference-cli
Then, depending on your needs, use:
inference cloud deploy --provider aws --compute-type gpu
# or
inference cloud deploy --provider gcp --compute-type cpu
With example posted here, we are just scratching the surface - visit our docs 📖 where more examples are presented.
🔥 YOLO-NAS is coming!
- We plan to onboard YOLO-NAS to the Roboflow platform. In this release we are introducing foundation work to make that happen. Stay tuned!
supervision 🤝 inference
We've extended the capabilities of the inference infer command in the inference-cli package. It can now run inference against images, directories of images, and videos; visualise predictions using supervision; and save them in a location of your choice.
What does it take to get your predictions?
pip install inference-cli
# start the server
inference server start
# run inference
inference infer -i {PATH_TO_VIDEO} -m coco/3 -c bounding_boxes_tracing -o {OUTPUT_DIRECTORY} -D
There are plenty of configuration options that can alter the visualisation. You can use predefined configs (example: -c bounding_boxes_tracing) or create your own. See our docs 📖 to discover all options.
🌱 Changed
- ❗ breaking: Pydantic 2: inference now depends on pydantic>=2.
- ❗ breaking: Default values of parameters (like confidence, iou_threshold, etc.) that were set for newer parts of inference (including inference HTTP container endpoints) were aligned with the more reasonable defaults used by the hosted Roboflow platform. This makes the experience of using inference consistent with the Roboflow platform. It will, however, alter the behaviour of the package for clients that do not specify their own parameter values when making predictions. Summary: confidence now defaults to 0.4 and iou_threshold to 0.3. We encourage clients using self-hosted containers to evaluate results on their end. Changes to be inspected here.
- API calls to HTTP endpoints with Roboflow models now accept a disable_active_learning flag that prevents Active Learning from being active for a specific request.
- Documentation 📖 was refreshed. The redesign is meant to make the content easier to comprehend. We would love to have some feedback 🙏
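To make the new defaults concrete, here is a minimal sketch (a hypothetical helper, not part of inference) of how a confidence threshold gates predictions, and how passing an explicit value restores the previous behaviour:

```python
# Hypothetical helper illustrating the new default threshold; the prediction
# format below is a simplified stand-in, not inference's internal API.
DEFAULT_CONFIDENCE = 0.4     # new default
DEFAULT_IOU_THRESHOLD = 0.3  # new default (consumed by NMS, not shown here)

def filter_by_confidence(predictions, confidence=DEFAULT_CONFIDENCE):
    """Keep only predictions at or above the confidence threshold."""
    return [p for p in predictions if p["confidence"] >= confidence]

predictions = [
    {"class": "car", "confidence": 0.35},
    {"class": "car", "confidence": 0.55},
]

kept_with_new_default = filter_by_confidence(predictions)          # drops 0.35
kept_with_explicit_value = filter_by_confidence(predictions, 0.3)  # keeps both
```

Clients that relied on the old implicit defaults can pass confidence and iou_threshold explicitly to keep their previous results.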
🔨 Fixed
- ❗ breaking: Fixed issue #260 - a bug introduced in version v0.9.3 causing classification models with 10 or more classes to assign the wrong class name to predictions (despite maintaining correct class ids). Clients relying on the class name instead of the class_id of predictions were affected.
- ❗ breaking: Fixed the typo coglvm -> cogvlm in the inference-sdk HTTP client method name prompt_cogvlm(...)
Full Changelog: v0.9.8...v0.9.9
Release candidate of v0.9.9
This is a draft release of v0.9.9.
v0.9.8
What's Changed
- Add changes that eliminate mistakes spotted while initial e2e tests by @PawelPeczek-Roboflow in #204
- Add ZoomInfo integration by @capjamesg in #205
- Added Kubernetes helm chart by @bigbitbus in #206
- Wrap lambda deployment with AL model manager by @PawelPeczek-Roboflow in #207
- Enable SSL on Redis connection based on env config (to enable AWS Lambda connectivity) by @PawelPeczek-Roboflow in #209
- Add Grounding DINO to Inference by @capjamesg in #107
- Extend inference SDK with client for (almost all) core models by @PawelPeczek-Roboflow in #212
- API Key Not Required by Methods by @paulguerrie in #211
- Expose InferencePipeline at the top level by @yeldarby in #210
- Built In Jupyter Notebook by @paulguerrie in #213
- Fix problem with keyless access and Active Learning by @PawelPeczek-Roboflow in #214
Highlights
Grounding DINO
Support for a new core model, Grounding DINO has been added. Grounding DINO is a zero-shot object detection model that you can use to identify objects in images and videos using arbitrary text prompts.
Inference SDK For Core Models
You can now use the Inference SDK with core models (like CLIP). No more complicated request and payload formatting. See the docs here.
Built In Jupyter Notebook
Roboflow Inference Server containers now include a built-in Jupyter notebook for development and testing. This notebook can be accessed via the inference server landing page. To use it, go to localhost:9001 in your browser after starting an inference server, then select "Jump Into An Inference Enabled Notebook". This opens a new tab with a JupyterLab session, preloaded with example notebooks and all of the inference dependencies.
New Contributors
- @bigbitbus made their first contribution in #206
Full Changelog: v0.9.7...v0.9.8
v0.9.7
What's Changed
- Bump cuda version for parallel by @probicheaux in #191
- Add stream management HTTP api by @PawelPeczek-Roboflow in #180
- Peter/fix orjson by @probicheaux in #192
- Introduce model aliases by @PawelPeczek-Roboflow in #193
- Fix problem with device request not being list but tuple by @PawelPeczek-Roboflow in #197
- Add inference server stop command by @PawelPeczek-Roboflow in #194
- Inference server start takes env file by @PawelPeczek-Roboflow in #195
- Add pull image progress display by @PawelPeczek-Roboflow in #198
- Improve Inference documentation by @capjamesg in #183
- Catch CLI Error When Docker Is Not Running by @paulguerrie in #203
- Introduce unified batching by @PawelPeczek-Roboflow in #199
- Change the default value for 'only_top_classes' option of close-to-threshold sampling strategy of AL by @PawelPeczek-Roboflow in #200
- updated API_KEY to ROBOFLOW_API_KEY for clarity by @josephofiowa in #202
Highlights
Stream Management API (Enterprise)
The Stream Management API is designed for users who need to run inference with Roboflow object-detection models against online video streams. It enhances the functionality of the familiar inference.Stream() and InferencePipeline() interfaces, found in the open-source version of the library, by introducing a management layer. These additional capabilities let users remotely manage the state of inference pipelines through the HTTP management interface integrated into this package. More info.
Model Aliases
Some common public models now have convenient aliases! With this release, the COCO base weights for YOLOv8 models can be accessed with user-friendly model IDs like yolov8n-640. See all available model aliases here.
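Conceptually, an alias is just a friendly name that gets resolved to a concrete model ID before loading. A toy sketch of that idea (the mapping below is made up for illustration; the real alias registry lives inside inference):

```python
# Toy alias registry - the target ID here is hypothetical, for illustration only.
MODEL_ALIASES = {
    "yolov8n-640": "some-workspace/yolov8n-coco/1",
}

def resolve_model_id(model_id: str) -> str:
    """Return the concrete model ID for an alias, or the input unchanged."""
    return MODEL_ALIASES.get(model_id, model_id)
```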
Other Improvements
- Improved inference CLI commands
- Unified batching APIs so that all model types can accept batch requests
- Speed improvements for HTTP interface
New Contributors
- @josephofiowa made their first contribution in #202
Full Changelog: v0.9.6...v0.9.7
v0.9.7rc2 - Test release for fix with CLI run problem
v0.9.7.rc2 Fix makefile, such that onnx runtime is installed
v0.9.7rc1 - Test release for fix with CLI run problem
v0.9.7.rc1 Fix problem with device request not being list
v0.9.6
What's Changed
- Automated Build for Parallel Interface by @paulguerrie in #168
- Deprecate TRT Support by @paulguerrie in #169
- Better API Key Docs and Error Handling by @paulguerrie in #171
- Add true implementation for AL configuration getter by @PawelPeczek-Roboflow in #173
- Bug Fix for Numpy Inputs by @paulguerrie in #172
- features/sv-from-roboflow-no-need-class-list-args by @ShingoMatsuura in #149
- Add development documentation of Active Learning by @PawelPeczek-Roboflow in #167
- Refactor inference methods to use make_response directly by @SkalskiP in #147
- Updated HTTP Quickstart by @paulguerrie in #176
- Peter/cogvlm by @probicheaux in #175
- Error Handling for Onnx Session Creation by @paulguerrie in #177
- Slim Docker Images by @paulguerrie in #178
- Rename cog to cogvlm by @paulguerrie in #182
- Wheel and Setuptools Upgrade by @paulguerrie in #184
- Finalize keypoint detection by @SolomonLake in #174
- Parallel Entrypoint Cleanup by @probicheaux in #179
- Peter/orjson by @probicheaux in #166
- Remove Legacy Cache Path by @paulguerrie in #185
- Multi-Stage Builds by @paulguerrie in #186
- Revert "Peter/orjson" by @PawelPeczek-Roboflow in #190
- Accept numpy image in batch as base64 encoded string by @sberan in #187
- Improve missing api key error handling by @PawelPeczek-Roboflow in #188
Highlights
CogVLM
Inference server users can now run CogVLM for a fully self hosted, multimodal LLM. See the example here.
Slim Docker Images
For use cases that do not need Core Model functionality (e.g. CLIP), there are -slim docker images available, which include fewer dependencies and are much smaller.
- roboflow/roboflow-inference-server-cpu-slim
- roboflow/roboflow-inference-server-gpu-slim
Breaking Changes
Infer API Update
The infer() method of Roboflow models now returns an InferenceResponse object instead of raw model output. This means that using models in application logic should feel similar to using models via the HTTP interface. In practice, programs that used the following pattern
...
model = get_roboflow_model(...)
results = model.infer(...)
results = model.make_response(...)
...
should be updated to
...
model = get_roboflow_model(...)
results = model.infer(...)
...
New Contributors
- @ShingoMatsuura made their first contribution in #149
Full Changelog: v0.9.5...v0.9.6
v0.9.5
Features, Fixes, and Improvements
- Fixed the automated pypi deploys by @paulguerrie in #126
- Fixed broken docs links for entities by @paulguerrie in #127
- revert accidental change to makefile by @sberan in #128
- Update compatability_matrix.md by @capjamesg in #129
- Model Validation On Load by @paulguerrie in #131
- Use Simple Docker Commands in Tests by @paulguerrie in #132
- No Exception Raised By Model Manager Remove Model by @paulguerrie in #134
- Noted that inference stream only supports object detection by @stellasphere in #136
- Fix URL in docs image by @capjamesg in #138
- Deduce API keys from logs by @PawelPeczek-Roboflow in #140
- Fix problem with BGR->RGB and RGB->BGR conversions by @PawelPeczek-Roboflow in #137
- Update default API key parameter for get_roboflow_model function by @SkalskiP in #142
- Documentation improvements by @capjamesg in #133
- Hosted Inference Bug Fixes by @paulguerrie in #143
- Introduce Active Learning by @PawelPeczek-Roboflow in #130
- Update HTTP inference docs by @capjamesg in #145
- Speed Regression Fix - Remove Numpy Range Validation by @paulguerrie in #146
- Introduce additional active learning sampling strategies by @PawelPeczek-Roboflow in #148
- Add stub endpoints to allow data collection without model by @PawelPeczek-Roboflow in #141
- Fix CLIP example by @capjamesg in #150
- Fix outdated warning with 'inference' upgrade suggestion by @PawelPeczek-Roboflow in #154
- Allow setting cv2 camera capture props from .env file by @sberan in #152
- Wrap pingback url by @robiscoding in #151
- Introduce new stream interface by @PawelPeczek-Roboflow in #156
- Clarify Enterprise License by @yeldarby in #158
- Async Model Manager by @probicheaux in #111
- Peter/async model manager by @probicheaux in #159
- Fix Critical and High Vulnerabilities in Docker Images by @paulguerrie in #157
- Split Requirements For Unit vs. Integration Tests by @paulguerrie in #160
Full Changelog: v0.9.3...v0.9.5.rc2
New inference.Stream interface
We are excited to introduce the upgraded version of our stream interface: InferencePipeline. Additionally, the WebcamStream class has evolved into a more versatile VideoSource.
This new abstraction is not only faster and more stable but also provides more granular control over the entire inference process.
Can I still use inference.Stream?
Absolutely! The old components remain unchanged for now. However, be aware that this abstraction is slated for deprecation over time. We encourage you to explore the new InferencePipeline interface and take advantage of its benefits.
What has been improved?
- Performance: Experience a significant boost in throughput, up to 5x, and improved latency for online inference on video streams using the YOLOv8n model.
- Stability: InferencePipeline can now automatically re-establish a connection for online video streams if the connection is lost.
- Prediction Sinks: Introducing prediction sinks, simplifying the use of predictions without the need for custom code.
- Control Over the Inference Process: InferencePipeline intelligently adapts to the type of video source, whether a file or a stream. Video files are processed frame by frame, while online streams prioritize real-time processing, dropping non-real-time frames.
- Observability: Gain insight into the processing state through events exposed by InferencePipeline. Reference implementations letting you monitor processing are also available.
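A prediction sink is simply a callable that receives the predictions and the video frame. A minimal sketch of a custom sink (the counting logic is our own; the "predictions" key follows inference's detection response format):

```python
# Minimal custom sink: records the number of detections seen on each frame.
frame_detection_counts = []

def counting_sink(predictions: dict, video_frame) -> None:
    """Append the per-frame detection count; video_frame is unused here."""
    frame_detection_counts.append(len(predictions.get("predictions", [])))

# Simulate two frames' worth of predictions:
counting_sink({"predictions": [{"class": "car"}, {"class": "dog"}]}, None)
counting_sink({"predictions": []}, None)
```

A sink like this can be passed as on_prediction when initializing a pipeline, alongside or instead of the built-in render_boxes sink.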
How to Migrate to the new Inference Stream interface?
You need to change a few lines of code to migrate to using the new Inference stream interface.
Below is an example that shows the old interface:
import inference
def on_prediction(predictions, image):
pass
inference.Stream(
source="webcam", # or "rstp://0.0.0.0:8000/password" for RTSP stream, or "file.mp4" for video
model="rock-paper-scissors-sxsw/11", # from Universe
output_channel_order="BGR",
use_main_thread=True, # for opencv display
on_prediction=on_prediction,
)
Here is the same code expressed in the new interface:
from inference.core.interfaces.stream.inference_pipeline import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes
pipeline = InferencePipeline.init(
model_id="rock-paper-scissors-sxsw/11",
video_reference=0,
on_prediction=render_boxes,
)
pipeline.start()
pipeline.join()
Note the slight change in the on_prediction handler, from:
def on_prediction(predictions: dict, image: np.ndarray) -> None:
pass
Into:
from inference.core.interfaces.camera.entities import VideoFrame
def on_prediction(predictions: dict, video_frame: VideoFrame) -> None:
pass
Want to know more? See our documentation 📖 for useful references.
Parallel Roboflow Inference Server
The Roboflow Inference Server supports concurrent processing. This version of the server accepts and processes requests asynchronously, running the web server, preprocessing, auto-batching, inference, and post-processing all in separate threads to increase server FPS throughput. Separate requests to the same model will be batched on the fly as allowed by $MAX_BATCH_SIZE, and then response handling will occur independently. Images are passed via Python's SharedMemory module to maximize throughput.
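The shared-memory hand-off can be illustrated with a short sketch (our own example, not the server's actual code): the producer writes a frame into a named block, and the consumer attaches to it by name, so no pixel data crosses a socket.

```python
# Illustrative shared-memory hand-off between a producer and a consumer.
import numpy as np
from multiprocessing import shared_memory

frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[0, 0] = [255, 128, 0]  # mark one pixel so the hand-off can be verified

# Producer: allocate a named block and copy the frame in once.
shm = shared_memory.SharedMemory(create=True, size=frame.nbytes)
np.ndarray(frame.shape, dtype=frame.dtype, buffer=shm.buf)[:] = frame

# Consumer: attach to the same block by name - only the name crosses processes.
view = shared_memory.SharedMemory(name=shm.name)
received = np.ndarray(frame.shape, dtype=frame.dtype, buffer=view.buf).copy()
received_pixel = received[0, 0].tolist()

# Clean up the block once both sides are done with it.
view.close()
shm.close()
shm.unlink()
```

In the real server the producer and consumer are separate processes; here both sides run in one script purely to show the mechanism.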
These changes result in as much as a 76% speedup on one measured workload.
Note
Currently, only Object Detection, Instance Segmentation, and Classification models are supported by this module. Core models are not enabled.
Important
We require a Roboflow Enterprise License to use this in production. See inference/enterprise/LICENSE.txt for details.
How To Use Concurrent Processing
You can build the server using ./inference/enterprise/parallel/build.sh and run it using ./inference/enterprise/parallel/run.sh.
We provide a container on Docker Hub that you can pull using docker pull roboflow/roboflow-inference-server-gpu-parallel:latest. If you are pulling a pinned tag, be sure to change the $TAG variable in run.sh.
This is a drop-in replacement for the old server, so you can send requests using the same API calls you were using previously.
Performance
We measure and report performance across a variety of different task types by selecting random models found on Roboflow Universe.
Methodology
The following metrics are taken on a machine with eight cores and one GPU. The FPS metrics reflect the best of three trials. The column labeled 0.9.5.parallel reflects the latest concurrent FPS metrics. Instance segmentation metrics are calculated using "mask_decode_mode": "fast" in the request body. Requests are posted concurrently with a parallelism of 1000.
Results
| Workspace | Model | Model Type | Split | 0.9.5.rc FPS | 0.9.5.parallel FPS |
|---|---|---|---|---|---|
| senior-design-project-j9gpp | nbafootage/3 | object-detection | train | 30.2 fps | 44.03 fps |
| niklas-bommersbach-jyjff | dart-scorer/8 | object-detection | train | 26.6 fps | 47.0 fps |
| geonu | water-08xpr/1 | instance-segmentation | valid | 4.7 fps | 6.1 fps |
| university-of-bradford | detecting-drusen_1/2 | instance-segmentation | train | 6.2 fps | 7.2 fps |
| fy-project-y9ecd | cataract-detection-viwsu/2 | classification | train | 48.5 fps | 65.4 fps |
| hesunyu | playing-cards-ir0wr/1 | classification | train | 44.6 fps | 57.7 fps |