
Workflows profiler #710

Merged · 19 commits merged from feature/workflows_profiler into main on Oct 4, 2024
Conversation

@PawelPeczek-Roboflow (Collaborator) commented on Oct 2, 2024

Description

This PR adds profiling capabilities to the Workflows Execution Engine, along with changes to improve speed:

  • caching of Workflow definitions in memory / Redis
  • caching the list of blocks to make compilation faster

Traces can be previewed using chrome://tracing/

Changes were also made to the inference_sdk client and InferencePipeline to enable gathering of profiling logs.
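
Traces viewable in chrome://tracing/ are plain JSON in the Chrome Trace Event Format. Below is a minimal, hypothetical sketch of how such events can be recorded and exported; the `TraceProfiler` class and its methods are illustrative assumptions, not the actual profiler interface added in this PR:

```python
# Minimal sketch of emitting profiler events in the Chrome Trace Event Format,
# which is what chrome://tracing/ expects. Class and method names here
# (TraceProfiler, export_trace) are illustrative, not the PR's actual API.
import json
import os
import threading
import time
from contextlib import contextmanager


class TraceProfiler:
    def __init__(self) -> None:
        self._events = []

    @contextmanager
    def profile(self, name: str, category: str = "workflows"):
        start = time.perf_counter()
        try:
            yield
        finally:
            duration = time.perf_counter() - start
            # "X" denotes a complete event; timestamps and durations are in microseconds.
            self._events.append(
                {
                    "name": name,
                    "cat": category,
                    "ph": "X",
                    "ts": start * 1e6,
                    "dur": duration * 1e6,
                    "pid": os.getpid(),
                    "tid": threading.get_ident(),
                }
            )

    def export_trace(self, path: str) -> None:
        # A JSON array of events can be loaded directly in chrome://tracing/.
        with open(path, "w") as f:
            json.dump(self._events, f)


profiler = TraceProfiler()
with profiler.profile("workflow_compilation"):
    time.sleep(0.02)  # stand-in for compiling the workflow definition
with profiler.profile("workflow_execution"):
    time.sleep(0.04)  # stand-in for running the execution engine
profiler.export_trace("workflows_trace.json")
```

Loading the resulting `workflows_trace.json` in chrome://tracing/ renders one bar per profiled span.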

PS: there are still plenty of things to improve. Compilation can still take 10-25ms, which is a lot if we consider really small models that (on GPU) could run inference in comparable time - in such a scenario Workflows basically slaughters throughput 😢. Hopefully, for larger models / more advanced workflows the overhead as a percentage of total execution is small, and for video processing the compilation only happens once.

🎥 InferencePipeline profiling - MacBook Pro

For InferencePipeline, the profiler accumulates information across consecutive processed frames:
  • Compilation overhead is negligible and only happens once, at the beginning.
  • The Workflow I used for the test is yolov8 (640 input) + bounding-box visualisation. For each frame, ~95% of the time is taken by the model block, which in my empirical measurements is slightly slower than a plain model.predict() in `inference`, due to the additional transformation of data into sv.Detections and metadata management.
  • On average, one frame is processed in 40ms by the EE, of which ~38ms is the model block itself; a plain model.predict() in `inference` would run at ~34-35ms, which means FPS drops from ~29-30 to ~25-26 (see the quick check after this list).
  • Observation: the Workflows EE itself adds only a small nominal latency, but the 1-2ms added by the engine matters once we count FPS.
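
A quick check of the latency-to-FPS arithmetic quoted above (the latencies are the measurements from this PR; the snippet itself is just illustrative):

```python
# Latency-to-FPS conversion for the per-frame numbers quoted above.
def fps(latency_ms: float) -> float:
    return 1000.0 / latency_ms

print(f"plain model.predict() (~34-35 ms): {fps(35.0):.1f}-{fps(34.0):.1f} FPS")  # ~28.6-29.4 FPS
print(f"Workflows EE per frame (~40 ms):   {fps(40.0):.1f} FPS")                  # 25.0 FPS
```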

🐎 Speed improvements - caching

As illustrated below, there were two obvious opportunities to improve the performance of the Workflows EE when processing requests in the inference server:
  • When using a Workflow saved on the Roboflow platform, we were always pulling the definition from the API, which, depending on the load on the Roboflow API, could take ~300ms - basically doubling the inference time for smaller models.
  • A substantial amount of time - around 140ms on my MacBook - was spent assembling the `pydantic` model that contains all dynamically loaded blocks and makes it possible to parse the manifest.

Solutions

Caching Workflows definitions

We no longer always pull the Workflow definition from the API - we now use a memory / Redis cache to store the definition for 15 minutes. A `use_cache` option is exposed in the request payload to disable cache reads / writes.
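
A minimal sketch of that idea, assuming an in-memory store with a 15-minute TTL; the function names, cache-key scheme, and `fetch_from_api` callable are hypothetical, and a Redis-backed variant would simply store the serialized definition with an expiry (e.g. `SET <key> <value> EX 900`):

```python
# Hypothetical sketch of caching Workflow definitions for 15 minutes.
# Names and the cache-key scheme are illustrative, not the PR's actual code.
import time
from typing import Callable, Dict, Tuple

WORKFLOW_DEFINITION_TTL_SECONDS = 15 * 60

_memory_cache: Dict[str, Tuple[float, dict]] = {}


def get_workflow_definition(
    workspace: str,
    workflow_id: str,
    fetch_from_api: Callable[[str, str], dict],
    use_cache: bool = True,
) -> dict:
    key = f"workflow_definition:{workspace}:{workflow_id}"
    if use_cache:
        cached = _memory_cache.get(key)
        if cached is not None:
            expires_at, definition = cached
            if time.monotonic() < expires_at:
                return definition  # cache hit - skips the ~300ms Roboflow API call
    definition = fetch_from_api(workspace, workflow_id)
    if use_cache:
        _memory_cache[key] = (
            time.monotonic() + WORKFLOW_DEFINITION_TTL_SECONDS,
            definition,
        )
    return definition
```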

Caching `pydantic` models for a given set of blocks

Without dynamic blocks in use, the pool of blocks can only change when the `inference` process is loaded into memory - hence, for the whole runtime of the server, the set of blocks should be constant and the entity for manifest parsing only needs to be built once. A simple in-memory cache was added to keep these definitions.

With enterprise blocks we will change this state of affairs, but even then this simple cache can be quite effective: we would have a limited number of plugins, so with a fairly small cache we should be able to keep all variations of the entity in memory.
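
A minimal sketch of that caching, assuming the pool of blocks is fixed for the lifetime of the process; `build_workflow_definition_entity` and the field layout are hypothetical, combining `functools.lru_cache` with `pydantic.create_model`, keyed on the tuple of loaded block manifest classes:

```python
# Illustrative sketch: assemble the manifest-parsing pydantic entity once per
# unique set of loaded block manifests, instead of on every request.
# The function name and field layout are assumptions, not the PR's actual code.
from functools import lru_cache
from typing import List, Tuple, Type, Union

from pydantic import BaseModel, create_model


@lru_cache(maxsize=64)
def build_workflow_definition_entity(
    block_manifest_types: Tuple[Type[BaseModel], ...],
) -> Type[BaseModel]:
    # The expensive part (~140ms per call in the measurements above): building a
    # model whose `steps` field is a union over every loaded block manifest.
    # With a constant set of blocks this now runs once per server process.
    steps_type = List[Union[block_manifest_types]]  # type: ignore[valid-type]
    return create_model(
        "WorkflowDefinition",
        version=(str, ...),
        steps=(steps_type, ...),
    )
```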

This simple caching will not work in the general sense for dynamic blocks:

  • this is not a problem for the hosted platform
  • for self-hosted deployments it would be a problem: we can expect +100-140ms of latency on each request when a given node processes dynamic blocks (for video this is not that relevant)

Results

| Before | After |
|--------|-------|
| ~850ms | ~375ms |
🍒 Cherry-picked example where the Hosted API is faster than self-hosted

This example is not intended to make the point that the Hosted platform is faster than self-hosting - it only illustrates a scenario in which the workflow hugely benefits from parallel requests, the hosted platform workers are already warm with respect to the required models, and the device self-hosting the server is not powerful enough to run multiple models at the same time.

Results

| Local server | Hosted platform |
|--------------|-----------------|
| ~650ms       | ~450ms          |
🏃 Benchmark on Tesla T4 - Workflows with a model block vs a direct inference server request

@PacificDou's report is abbreviated as SR below.

Model family: yolov8n - different input sizes

| Model | img-size | SR server | server | SR EE | EE | EE overhead [%] |
|-------|----------|-----------|--------|-------|----|-----------------|
| football-player-detection-ej9zh/12 | 320  | 52.4 RPS | 48 RPS   | 2 RPS   | 28 RPS   | +75%   |
| football-player-detection-ej9zh/13 | 640  | 45 RPS   | 36.5 RPS | 2 RPS   | 24 RPS   | +52%   |
| football-player-detection-ej9zh/14 | 960  | 31.9 RPS | 26 RPS   | 2 RPS   | 18.5 RPS | +38.5% |
| football-player-detection-ej9zh/15 | 1280 | 24 RPS   | 19 RPS   | 1.8 RPS | 13.5 RPS | +39%   |

It was always the case that the EE added ~10-15ms of latency ❗

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested, please provide a testcase or example of how you tested the change?

  • CI still 🟢
  • new automated tests
  • tested and measured E2E in different scenarios

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:

@PawelPeczek-Roboflow merged commit e7071a0 into main on Oct 4, 2024
57 checks passed
@PawelPeczek-Roboflow deleted the feature/workflows_profiler branch on October 4, 2024 12:02