expose transfer bytes total (count) rather than just gauge of current value #7123

Closed
Tracked by #7345
ntabris opened this issue Oct 6, 2022 · 6 comments · Fixed by #7388

@ntabris
Contributor

ntabris commented Oct 6, 2022

Currently the worker Prometheus endpoint exposes transfer_incoming_bytes and transfer_outgoing_bytes as gauges, i.e., the current value at a single point in time.

A better way to expose this sort of data is as a monotonically increasing counter metric type (this should be exposed as transfer_incoming_bytes_total and transfer_outgoing_bytes_total).

It's easy to get a rate from an accumulated count, but you can't get an accurate count from a sampled rate.
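To illustrate the point, here is a small self-contained simulation in plain Python (no Prometheus client; the transfer sizes, timings, and scrape interval are made up for illustration). A counter read at any two scrapes gives the exact number of bytes moved between them, so a per-interval rate is just a difference divided by elapsed time. A gauge of in-flight bytes sampled at the same moments cannot see transfers that start and finish between scrapes.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    start: float  # seconds
    end: float    # seconds
    nbytes: int


# Hypothetical workload: short transfers that mostly fall between scrapes.
TRANSFERS = [
    Transfer(0.5, 1.5, 100),
    Transfer(2.2, 2.8, 300),
    Transfer(4.1, 4.9, 200),
    Transfer(5.5, 6.5, 400),
]


def counter_at(t: float) -> int:
    """Cumulative bytes of transfers completed by time t (a monotonic counter)."""
    return sum(tr.nbytes for tr in TRANSFERS if tr.end <= t)


def gauge_at(t: float) -> int:
    """Bytes currently in flight at time t (a point-in-time gauge)."""
    return sum(tr.nbytes for tr in TRANSFERS if tr.start <= t < tr.end)


scrape_times = [0, 2, 4, 6, 8]

# Rate over each scrape interval, derived from the counter. This is roughly
# what PromQL's rate() does, minus extrapolation and counter-reset handling.
rates = [
    (counter_at(b) - counter_at(a)) / (b - a)
    for a, b in zip(scrape_times, scrape_times[1:])
]

total_from_counter = counter_at(scrape_times[-1]) - counter_at(scrape_times[0])
gauge_samples = [gauge_at(t) for t in scrape_times]

print(total_from_counter)  # 1000: every completed byte is accounted for
print(gauge_samples)       # [0, 0, 0, 400, 0]: most transfers are never seen
print(rates)               # [50.0, 150.0, 100.0, 200.0] bytes/s per interval
```

The counter's rates multiplied by the interval lengths add back up to the exact total, whereas nothing in the gauge samples lets you recover that 1000.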

@mrocklin
Member

mrocklin commented Oct 6, 2022 via email

@hendrikmakait
Member

We expose transfer_incoming_bytes and transfer_outgoing_bytes as gauges because these point-in-time metrics inform data transfer throttling and should therefore be visible. If you find exposing transfer_{incoming|outgoing}_bytes_total useful, we can add those. We should just make sure we have a good definition for those metrics (see #6936 (comment) for a more in-depth discussion).
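One possible definition, sketched here under stated assumptions: keep the in-flight gauge used for throttling, and add cumulative counters that only ever increase, bumped when a transfer completes. The `TransferTracker` class and its methods are hypothetical and not part of distributed's API. Whether failed or cancelled transfers should count toward the totals is exactly the kind of definitional question raised in #6936; this sketch assumes they do not.

```python
class TransferTracker:
    """Hypothetical bookkeeping for one direction (incoming or outgoing)."""

    def __init__(self) -> None:
        self.bytes_in_flight = 0  # gauge: informs throttling, goes up and down
        self.bytes_total = 0      # counter: monotonically increasing
        self.count_total = 0      # counter: number of completed transfers

    def start(self, nbytes: int) -> None:
        """Record that a transfer of nbytes has begun."""
        self.bytes_in_flight += nbytes

    def finish(self, nbytes: int, *, success: bool = True) -> None:
        """Record that a transfer has ended, successfully or not."""
        self.bytes_in_flight -= nbytes
        # Assumption: only successful transfers contribute to the totals.
        if success:
            self.bytes_total += nbytes
            self.count_total += 1


incoming = TransferTracker()
incoming.start(100)
incoming.start(50)
incoming.finish(100)
incoming.finish(50, success=False)
print(incoming.bytes_in_flight, incoming.bytes_total, incoming.count_total)  # 0 100 1
```

For export, a custom collector built on prometheus_client could report `bytes_in_flight` through `GaugeMetricFamily` and the totals through `CounterMetricFamily`, which exposes its samples with the `_total` suffix.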

@fjetter
Member

fjetter commented Nov 25, 2022

@ntabris is there anything left to do or are your concerns addressed?

@ntabris
Contributor Author

ntabris commented Nov 25, 2022

I think we don't actually expose the bytes transferred over time, i.e., what @crusaderky calls "cumulative" values in #6936 (comment).

That's what I was asking for. If it's not high-value, feel free to ignore for now though.

For context, host metrics can tell us how much data moves in/out of each worker. What they can't easily tell us is how much of that is transfer versus data moving into/out of the cluster (e.g., S3). It would be nice if Dask could tell us how many bytes of host network traffic are due to transfer.
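With a cumulative transfer counter available, the split described above becomes simple arithmetic over a window. The sketch below assumes both inputs are monotonically increasing byte counters (for example, host received bytes and a hypothetical transfer_incoming_bytes_total) read at the start and end of the same window; the function name and sample numbers are illustrative only. The result is a rough estimate, since Dask-level byte accounting need not match on-the-wire bytes exactly (framing, TCP overhead, compression).

```python
def split_traffic(host_start: int, host_end: int,
                  transfer_start: int, transfer_end: int) -> tuple[int, int, float]:
    """Split host network bytes in a window into Dask-transfer and other traffic.

    All four arguments are readings of monotonically increasing byte counters
    taken at the start and end of the same time window (hypothetical helper).
    Returns (transfer_bytes, other_bytes, transfer_share).
    """
    host = host_end - host_start
    transfer = transfer_end - transfer_start
    # Clamp at zero: Dask's accounting and the host's wire-level counts are
    # measured differently, so the difference is only approximate.
    other = max(host - transfer, 0)
    share = transfer / host if host else 0.0
    return transfer, other, share


transfer, other, share = split_traffic(10_000, 60_000, 2_000, 32_000)
print(transfer, other, share)  # 30000 20000 0.6
```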

@crusaderky
Collaborator

See also #7357

@gjoseph92
Collaborator

I would also find this useful for benchmarking. Total amount of data transferred is a useful metric to compare when working on changes to scheduling.

gjoseph92 added a commit to gjoseph92/distributed that referenced this issue Dec 9, 2022

6 participants