expose transfer bytes total (count) rather than just gauge of current value #7123
Comments
I'm in favor of this generally
…On Thu, Oct 6, 2022, 9:36 AM Nat Tabris ***@***.***> wrote:
Currently the worker Prometheus endpoint exposes transfer_incoming_bytes
and transfer_outgoing_bytes as gauges—i.e., the current value at a single
point in time.
A better way to expose this sort of data is as a monotonically increasing
count metric type (this should be exposed as transfer_incoming_bytes_total
and transfer_outgoing_bytes_total).
It's easy to get rate from an accumulated count, but you can't get
accurate count from a sampled rate.
|
We expose |
@ntabris is there anything left to do or are your concerns addressed? |
I think we don't actually expose bytes transferred over time, i.e., what @crusaderky calls "cumulative" values in #6936 (comment). That's what I was asking for. If it's not high-value, feel free to ignore for now though. For context, host metrics can tell us how much data moves in/out of each worker. What they can't tell us exactly (at least not easily) is how much of that is transfer vs. data moving into/out of the cluster (e.g., S3). I think it would be nice if Dask could tell us how many bytes of host network traffic are for transfer. |
See also #7357 |
I would also find this useful for benchmarking. Total amount of data transferred is a useful metric to compare when working on changes to scheduling. |
Currently the worker Prometheus endpoint exposes transfer_incoming_bytes and transfer_outgoing_bytes as gauges, i.e., the current value at a single point in time.

A better way to expose this sort of data is as a monotonically increasing count metric type (this should be exposed as transfer_incoming_bytes_total and transfer_outgoing_bytes_total).

It's easy to get rate from an accumulated count, but you can't get accurate count from a sampled rate.
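
To illustrate the last point, here is a minimal, standard-library-only sketch. It is not Dask or Prometheus code: the per-second byte counts, the scrape interval, and the helper names are all made up for the example. It also treats the gauge as an instantaneous activity reading, which simplifies what the worker actually reports. It compares what a scraper can recover from a point-in-time gauge with what it can recover from a running total.

```python
# Hypothetical bytes transferred by one worker in each second (invented data).
bytes_per_second = [0, 500, 0, 0, 2000, 0, 0, 0, 300, 0, 0, 1200]

SCRAPE_INTERVAL = 4  # seconds between scrapes (assumed value)


def scrape_gauge(activity, interval):
    """Return the instantaneous value seen at each scrape."""
    return [activity[t] for t in range(interval - 1, len(activity), interval)]


def scrape_counter(activity, interval):
    """Return the monotonically increasing running total seen at each scrape."""
    samples, total = [], 0
    for t, n in enumerate(activity):
        total += n
        if (t + 1) % interval == 0:
            samples.append(total)
    return samples


def rates_from_counter(samples, interval):
    """Average bytes/s in each scrape window, from successive counter samples."""
    previous = [0] + samples[:-1]
    return [(cur - prev) / interval for prev, cur in zip(previous, samples)]


gauge_samples = scrape_gauge(bytes_per_second, SCRAPE_INTERVAL)
counter_samples = scrape_counter(bytes_per_second, SCRAPE_INTERVAL)

true_total = sum(bytes_per_second)
# Best guess at the total from the gauge: assume each sample held for a whole window.
estimated_total = sum(g * SCRAPE_INTERVAL for g in gauge_samples)

print("true total:          ", true_total)
print("last counter sample: ", counter_samples[-1])
print("estimate from gauge: ", estimated_total)
print("rates from counter:  ", rates_from_counter(counter_samples, SCRAPE_INTERVAL))
```

With this invented data, the last counter sample equals the true total, and the per-window rates follow from simple differences. The gauge misses every burst that falls between scrapes, so the total reconstructed from it is off.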