Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paused time is slightly nondeterministic #3179

Closed
sfackler opened this issue Nov 26, 2020 · 7 comments · Fixed by #3492
Closed

Paused time is slightly nondeterministic #3179

sfackler opened this issue Nov 26, 2020 · 7 comments · Fixed by #3492
Assignees
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-time Module: tokio/time

Comments

@sfackler
Copy link
Contributor

Version

└── tokio v0.3.4
    └── tokio-macros v0.3.1

Platform
Linux DESKTOP-DHO88R7 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Description

The time control exposed under the test-util feature allows test code involving the advance of time to run without actually needing to wait around for wall-clock time to move. In theory it should also allow those tests to be completely deterministic, with time advancing exactly the same way each time the test runs (assuming no other sources of entropy in the test of course).

However, it appears that this is not actually the case right now. The first Sleep after time is paused will observe a slightly larger amount of time passing than requested, and that surplus is nondeterministic between runs:

use tokio::time::{self, Instant, Duration};

#[tokio::main]
async fn test() {
    time::pause();
    for _ in 0..5 {
        let start = Instant::now();
        time::sleep(Duration::from_secs(1) / 11).await;
        println!("woke after {:?}", start.elapsed());
    }
}

fn main() {
    test();
    println!("------");
    test();
}
woke after 91.0741ms
woke after 91ms
woke after 91ms
woke after 91ms
woke after 91ms
------
woke after 91.1666ms
woke after 91ms
woke after 91ms
woke after 91ms
woke after 91ms

I believe this is because the timer wheel has 1ms resolution, and when time pauses the instant is at some random point in the middle of a millisecond. That random interval ends up being added to the first sleep call.

Making the paused time infrastructure absolutely deterministic would be helpful for things like simulations of behavior of an async library in various scenarios. If the execution of the simulation is deterministic, you can know that any change must have happened due to whatever modification you were making to the library itself rather than being random noise.

@sfackler sfackler added C-bug Category: This is a bug. A-tokio Area: The main tokio crate labels Nov 26, 2020
@Darksonn Darksonn added the M-time Module: tokio/time label Nov 26, 2020
@sfackler
Copy link
Contributor Author

As a workaround, you can just stick a sleep immediately after pausing time to make everything after deterministic.

@kstrafe
Copy link

kstrafe commented Dec 7, 2020

@sfackler sleeping directly after the pause doesn't appear to help. Tried with 0 and 1 ms.

@carllerche carllerche added this to the v1.0 milestone Dec 8, 2020
@carllerche
Copy link
Member

I don't think fixing this requires a breaking change, so I will remove the 1.0 tag.

@carllerche carllerche removed this from the v1.0 milestone Dec 9, 2020
@sfackler
Copy link
Contributor Author

sfackler commented Jan 31, 2021

In 1.0, the new timer implementation is very slightly nondeterministic, and it doesn't seem like the 1ms sleep fixes things. If you run the program below twice, you often (but not always) can see the output diverge even though it should in theory be entirely deterministic.

use rand::SeedableRng;
use rand::{rngs::StdRng, Rng};
use tokio::runtime;
use tokio::time::{self, Duration, Instant};

fn main() {
    let runtime = runtime::Builder::new_current_thread()
        .enable_time()
        .build()
        .unwrap();

    runtime.block_on(async {
        time::pause();

        time::sleep(Duration::from_millis(1)).await;
        let mut rng = StdRng::seed_from_u64(1);

        let start = Instant::now();
        for _ in 0..1000 {
            let delay = rng.gen_range(Duration::from_secs(0)..Duration::from_secs(1));

            println!("{:?}: sleeping for {:?}", start.elapsed(), delay);
            time::sleep(delay).await;
            println!("{:?}: awoke", start.elapsed());
        }
    });
}

Here's diff -y output at the point at which they diverged for me locally:

...
49.92s: sleeping for 55.984549ms                                49.92s: sleeping for 55.984549ms
49.977s: awoke                                                  49.977s: awoke
49.977s: sleeping for 919.966044ms                              49.977s: sleeping for 919.966044ms
50.897s: awoke                                                | 50.898s: awoke
50.897s: sleeping for 165.606289ms                            | 50.898s: sleeping for 165.606289ms
51.063s: awoke                                                | 51.064s: awoke

The two runs eventually diverge when one wakes up 1ms later than the other, even though the sequence of sleep durations was identical in both cases.

@sfackler
Copy link
Contributor Author

I think the right approach here may be to force the test-util clock to operate at 1ms resolution, even when unpaused. That would avoid any time travel when pausing/unpausing but also ensure there's no "drift" of sub-millisecond values that accumulate in the wheel.

@carllerche
Copy link
Member

@bdonlan any thoughts?

@sfackler
Copy link
Contributor Author

sfackler commented Jan 31, 2021

I think the actual problem is that the time driver creates its ClockTime, which is used to convert Instants to integers immediately when the runtime is created, before it has a chance to pause time. This means that there is a nondeterministic delay between when that offset is created and time is paused. That small delay means that milliseconds will round just slightly differently when the Instants passed in are close enough to a millisecond boundary with respect to the delay.

As an example, we can shrink the example program above down to just this:

use tokio::runtime;
use tokio::time::{self, Duration};

fn main() {
    let runtime = runtime::Builder::new_current_thread()
        .enable_time()
        .build()
        .unwrap();

    runtime.block_on(async {
        time::pause();

        time::sleep(Duration::from_millis(1)).await;
    });
}

And adding println!("tick {:?}, {}", dur, ms) to ClockTime::instant_to_tick, we see this output:

$ cargo run
tick 2.031699ms, 2
tick 31.7µs, 0
parking from 0 to 2
tick 2.0317ms, 2
$ cargo run
tick 2.031199ms, 2
tick 31.2µs, 0
parking from 0 to 2
tick 2.0312ms, 2

Here you can see that while we're asking to sleep for just 1ms, which should be rounded up to 1.999999ms by deadline_to_tick, the calculcated duration is actually slightly larger, with the delta being 0.0001 different between the two runs. I think that extra time is the time between runtime creation and when time is paused.

sfackler added a commit to sfackler/tokio that referenced this issue Jan 31, 2021
The time driver stores an Instant internally used as a "base" for future
time calculations. Since this is generated as the Runtime is being
constructed, it previously always happened before the user had a chance
to pause time. The fractional-millisecond variations in the timing
around the runtime construction and time pause cause tests running
entirely in paused time to be very slightly deterministic, with the time
driver advancing time by 1 millisecond more or less depending on how the
sub-millisecond components of the `Instant`s involved compared.

To avoid this, there is now a new option on `runtime::Builder` which
will create a `Runtime` with time "instantly" paused. This, along with a
small change to have the time driver use the provided clock as the
source for its start time allow totally deterministic tests with paused
time.

Closes tokio-rs#3179
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-time Module: tokio/time
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants