expose transfer bytes total (count) rather than just gauge of current value #7123

ntabris · 2022-10-06T16:36:00Z

Currently the worker Prometheus endpoint exposes transfer_incoming_bytes and transfer_outgoing_bytes as gauges—i.e., the current value at a single point in time.

A better way to expose this sort of data is as a monotonically increasing counter metric type (this should be exposed as transfer_incoming_bytes_total and transfer_outgoing_bytes_total).

It's easy to get a rate from an accumulated count, but you can't get an accurate count from a sampled point-in-time value.
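To illustrate the difference, here is a minimal sketch. `TransferTracker` and `rate` are hypothetical names made up for this example, not the Dask worker implementation, and counting bytes when a transfer starts is just one possible definition of the total. A transfer that starts and finishes between two scrapes is invisible to the gauge but still shows up in the cumulative count.

```python
from dataclasses import dataclass


@dataclass
class TransferTracker:
    """Hypothetical tracker for illustration only (not Dask code)."""

    in_flight_bytes: int = 0  # gauge-style value: goes up and down
    bytes_total: int = 0      # counter-style value: only ever increases

    def start(self, nbytes: int) -> None:
        # Assumption: bytes are added to the total when the transfer starts.
        self.in_flight_bytes += nbytes
        self.bytes_total += nbytes

    def finish(self, nbytes: int) -> None:
        # Finishing a transfer lowers the gauge but never the total.
        self.in_flight_bytes -= nbytes


def rate(total_then: int, total_now: int, seconds: float) -> float:
    """Average bytes per second between two samples of a cumulative count."""
    return (total_now - total_then) / seconds


tracker = TransferTracker()
gauge_samples = []
counter_samples = []


def scrape() -> None:
    # Stand-in for a Prometheus scrape: record both values at this instant.
    gauge_samples.append(tracker.in_flight_bytes)
    counter_samples.append(tracker.bytes_total)


scrape()             # t = 0 s
tracker.start(1000)  # a 1000-byte transfer starts ...
tracker.finish(1000)  # ... and finishes before the next scrape
scrape()             # t = 15 s
tracker.start(500)   # a 500-byte transfer is still running at the next scrape
scrape()             # t = 30 s

print(gauge_samples)    # the 1000-byte transfer never appears here
print(counter_samples)  # the cumulative count includes every transfer
print(rate(counter_samples[0], counter_samples[-1], 30.0))
```

With the counter, the average rate over any window is a subtraction and a division. With only the gauge samples, there is no way to recover the 1000 bytes that moved between scrapes.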


mrocklin · 2022-10-06T17:27:02Z

I'm in favor of this generally


hendrikmakait · 2022-10-17T10:05:12Z

We expose transfer_incoming_bytes and transfer_outgoing_bytes as gauges because these point-in-time metrics inform data transfer throttling and should therefore be visible. If you find exposing transfer_{incoming|outgoing}_bytes_total useful, we can add those. We should just make sure we have a good definition for those metrics (see #6936 (comment) for a more in-depth discussion).

fjetter · 2022-11-25T15:34:18Z

@ntabris is there anything left to do or are your concerns addressed?

ntabris · 2022-11-25T16:46:53Z

I think we don't actually expose bytes transferred over time, i.e., what @crusaderky calls "cumulative" values in #6936 (comment).

That's what I was asking for. If it's not high-value, feel free to ignore for now though.

For context, host metrics can tell us how much data moves in and out of each worker. What they can't tell us (at least not easily) is how much of that is worker-to-worker transfer versus data moving into or out of the cluster (e.g., S3). I think it would be nice if Dask could tell us how many bytes of host network traffic are for transfer.

crusaderky · 2022-11-29T11:24:16Z

See also #7357

gjoseph92 · 2022-12-09T16:58:23Z

I would also find this useful for benchmarking. Total amount of data transferred is a useful metric to compare when working on changes to scheduling.


fjetter mentioned this issue Nov 25, 2022

Prometheus metrics improvements #7345

Open

9 tasks

fjetter added the diagnostics label Nov 25, 2022

crusaderky mentioned this issue Nov 29, 2022

Prometheus: expose cumulative worker utilisation instead of spot metric #7357

Open

gjoseph92 added a commit to gjoseph92/distributed that referenced this issue Dec 9, 2022

Add transfer_outgoing_bytes_total metric

15e6663

Closes dask#7123

gjoseph92 mentioned this issue Dec 9, 2022

Add transfer_outgoing_bytes_total metric #7388

Merged

2 tasks

gjoseph92 added a commit to gjoseph92/distributed that referenced this issue Dec 9, 2022

Add transfer_outgoing_bytes_total metric

e52dce5

Closes dask#7123

gjoseph92 closed this as completed in #7388 Dec 9, 2022

gjoseph92 added a commit to gjoseph92/distributed that referenced this issue Dec 9, 2022

Add transfer_outgoing_bytes_total metric

928f91f

Closes dask#7123
