Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update stats when unparking worker in multi-threaded runtime #6131

Merged
merged 1 commit into from
Nov 6, 2023
Merged

Update stats when unparking worker in multi-threaded runtime #6131

merged 1 commit into from
Nov 6, 2023

Conversation

aliu
Copy link
Contributor

@aliu aliu commented Nov 6, 2023

Fixes #6129

I am not an expert on the runtime implementation, but looking at the code it seems like we should reset the start time for processing scheduled tasks when the worker thread unparks. Otherwise it only happens occasionally when calling Context::maintenance, which is why the worker busy duration metrics were getting so large.

Using the following program:

use std::time::Duration;

use tokio::runtime::{Handle, RuntimeMetrics};

fn main() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .worker_threads(3)
        .build()
        .unwrap()
        .block_on(async {
            let metrics = Handle::current().metrics();
            std::thread::spawn(|| track_event_load(metrics));

            tokio::spawn(async move {
                loop {
                    tokio::time::sleep(Duration::from_millis(100)).await;
                }
            });

            tokio::signal::ctrl_c().await.unwrap();
        });
}

fn track_event_load(metrics: RuntimeMetrics) {
    let n = metrics.num_workers();
    let mut time = vec![Duration::ZERO; n];
    loop {
        for i in 0..n {
            let busy = metrics.worker_total_busy_duration(i);
            println!(
                "worker {i}: busy {:?}, mean poll time {:?}",
                (busy - time[i]),
                metrics.worker_mean_poll_time(i),
            );
            time[i] = busy;
        }
        std::thread::sleep(Duration::from_millis(100));
    }
}
Before
❯ cargo run --release --example event_load
worker 0: busy 0ns, mean poll time 0ns
worker 1: busy 0ns, mean poll time 0ns
worker 2: busy 128ns, mean poll time 3.278µs
worker 0: busy 514ns, mean poll time 3.278µs
worker 1: busy 147ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 101.290505ms, mean poll time 10.132001ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 202.437666ms, mean poll time 27.438499ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 303.53971ms, mean poll time 47.422419ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 404.671251ms, mean poll time 65.90546ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 505.784504ms, mean poll time 80.341276ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 606.920563ms, mean poll time 90.092997ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 707.052493ms, mean poll time 95.787126ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 808.174934ms, mean poll time 98.768482ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 909.323198ms, mean poll time 100.157462ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.010438555s, mean poll time 100.734789ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.111573641s, mean poll time 100.952558ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.21268875s, mean poll time 101.027786ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.313819225s, mean poll time 101.054061ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.414937339s, mean poll time 101.064004ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.516044505s, mean poll time 101.068474ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.617197501s, mean poll time 101.073663ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.718327762s, mean poll time 101.077363ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.819455308s, mean poll time 101.080327ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 1.920529669s, mean poll time 101.080483ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.021637789s, mean poll time 101.081718ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.122781536s, mean poll time 101.084493ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.223901987s, mean poll time 101.08626ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.324050778s, mean poll time 101.049281ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.425168278s, mean poll time 101.048726ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.526303472s, mean poll time 101.051893ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.627466303s, mean poll time 101.056105ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.728605373s, mean poll time 101.059262ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.829755229s, mean poll time 101.062507ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 2.930890387s, mean poll time 101.065059ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 3.031026493s, mean poll time 101.03552ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 101.123154ms, mean poll time 101.044286ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 202.265971ms, mean poll time 101.061142ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 303.370246ms, mean poll time 101.078021ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 404.504463ms, mean poll time 101.094562ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 505.637196ms, mean poll time 101.108026ms
worker 1: busy 0ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
After
❯ cargo run --release --example event_load
worker 0: busy 0ns, mean poll time 0ns
worker 1: busy 0ns, mean poll time 0ns
worker 2: busy 119ns, mean poll time 3.278µs
worker 0: busy 193ns, mean poll time 3.278µs
worker 1: busy 377ns, mean poll time 3.278µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 11.939µs, mean poll time 4.145µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 6.814µs, mean poll time 4.413µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 5.868µs, mean poll time 4.559µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 4.537µs, mean poll time 4.558µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 4.968µs, mean poll time 4.6µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 6.824µs, mean poll time 4.823µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 6.622µs, mean poll time 5.005µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 5.84µs, mean poll time 5.09µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 6.135µs, mean poll time 5.196µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 6.392µs, mean poll time 5.318µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 6.732µs, mean poll time 5.461µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 5.216µs, mean poll time 5.438µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 10.148µs, mean poll time 5.912µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 6.484µs, mean poll time 5.971µs
worker 2: busy 0ns, mean poll time 3.278µs
worker 0: busy 0ns, mean poll time 3.278µs
worker 1: busy 5.722µs, mean poll time 5.949µs
worker 2: busy 0ns, mean poll time 3.278µs

@github-actions github-actions bot added the R-loom-multi-thread Run loom multi-thread tests on this PR label Nov 6, 2023
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-metrics Module: tokio/runtime/metrics labels Nov 6, 2023
Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks correct to me.

@Darksonn
Copy link
Contributor

Darksonn commented Nov 6, 2023

Please verify that this actually fixes the linked issue.

@aliu
Copy link
Contributor Author

aliu commented Nov 6, 2023

Please verify that this actually fixes the linked issue.

updated description

@Darksonn Darksonn merged commit 8ec3e0d into tokio-rs:master Nov 6, 2023
79 checks passed
@Darksonn
Copy link
Contributor

Darksonn commented Nov 6, 2023

Thank you.

@aliu aliu deleted the runtime-metrics branch November 6, 2023 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-metrics Module: tokio/runtime/metrics R-loom-multi-thread Run loom multi-thread tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

worker_total_busy_duration seems to be calculated incorrectly when using multi-threaded runtime
2 participants