-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[KVConnector] Add metrics to Prometheus-Grafana dashboard #26811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @NickLucche
I wanted to make sure the per-connector metrics was going to work out. I didn't feel good about adding a bunch of NIXL-specific metrics into vllm/v1/metrics. So I worked it out at NickLucche/pull/4
Couple of small inline comments on buckets too 👍
vllm/v1/metrics/loggers.py
Outdated
| nixl_histogram_post_time, engine_indexes, model_name | ||
| ) | ||
| # uniform 2kb to 16gb range | ||
| buckets = [2**10 + i for i in range(1, 24, 2)] |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
vllm/v1/metrics/loggers.py
Outdated
| name="vllm:nixl_post_time_seconds", | ||
| documentation="Histogram of transfer post time for NIXL KV" | ||
| " Cache transfers.", | ||
| buckets=buckets[1:], |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
[NIXL][Metrics] Add abstraction for per-connector Prometheus metrics
It's post times that need the smaller bucket size, not transfer duration. Uniform 2kb to 16gb range: ``` >>> def human_size(bytes, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']): ... """ Returns a human readable string representation of bytes """ ... return str(bytes) + units[0] if bytes < 1024 else human_size(bytes>>10, units[1:]) ... >>> [human_size(2**(10+i)) for i in range(1, 25, 2)] ['2KB', '8KB', '32KB', '128KB', '512KB', '2MB', '8MB', '32MB', '128MB', '512MB', '2GB', '8GB'] ``` Signed-off-by: Mark McLoughlin <markmc@redhat.com>
[NIXL][Metrics] Fix NIXL buckets
…ctor Signed-off-by: Mark McLoughlin <markmc@redhat.com>
[KV Connector][Metrics] Add prometheus metrics support to multi-connector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markmc reviewed
…ct#26811) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Follow-up to the work and discussion in this PR #22188.
Here we expose the metrics we're currently tracking for the nixl connector to prometheus.
cc @markmc
Update:
@markmc generalized the PR to work for KVConnectorStats more broadly