Paused time is slightly nondeterministic #3179
Comments
As a workaround, you can just stick a sleep immediately after pausing time to make everything after it deterministic.
@sfackler sleeping directly after the pause doesn't appear to help. Tried with 0 and 1 ms.
I don't think fixing this requires a breaking change, so I will remove the 1.0 tag.
In 1.0, the new timer implementation is still very slightly nondeterministic, and the 1ms sleep doesn't seem to fix things. If you run the program below twice, you can often (but not always) see the output diverge, even though it should in theory be entirely deterministic.

    use rand::SeedableRng;
    use rand::{rngs::StdRng, Rng};
    use tokio::runtime;
    use tokio::time::{self, Duration, Instant};

    fn main() {
        let runtime = runtime::Builder::new_current_thread()
            .enable_time()
            .build()
            .unwrap();

        runtime.block_on(async {
            // Pause time, then apply the suggested 1ms sleep workaround.
            time::pause();
            time::sleep(Duration::from_millis(1)).await;

            // Fixed seed: the sequence of delays is identical on every run.
            let mut rng = StdRng::seed_from_u64(1);
            let start = Instant::now();
            for _ in 0..1000 {
                let delay = rng.gen_range(Duration::from_secs(0)..Duration::from_secs(1));
                println!("{:?}: sleeping for {:?}", start.elapsed(), delay);
                time::sleep(delay).await;
                println!("{:?}: awoke", start.elapsed());
            }
        });
    }

The two runs eventually diverge when one wakes up 1ms later than the other, even though the sequence of sleep durations was identical in both cases.
I think the right approach here may be to force the test-util clock to operate at 1ms resolution, even when unpaused. That would avoid any time travel when pausing/unpausing, and it would also ensure there's no "drift" of sub-millisecond values accumulating in the wheel.
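To make the 1ms-resolution idea concrete, here is a minimal sketch using only the standard library. `truncate_to_millis` is a hypothetical helper made up for illustration, not a tokio API; it just shows that once readings are truncated to whole milliseconds, two instants that differ only in their sub-millisecond part become indistinguishable.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of tokio): drop the sub-millisecond
/// component of a clock reading so that it has 1ms resolution.
fn truncate_to_millis(d: Duration) -> Duration {
    Duration::from_millis(d.as_millis() as u64)
}

fn main() {
    // Two readings that differ only within the same millisecond.
    let run_a = truncate_to_millis(Duration::from_micros(5_250));
    let run_b = truncate_to_millis(Duration::from_micros(5_900));

    // After truncation both runs see exactly 5ms.
    assert_eq!(run_a, run_b);
    println!("run A: {:?}, run B: {:?}", run_a, run_b);
}
```

Whether truncating or rounding up is the better choice for a real clock is a design question this sketch does not try to answer.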
@bdonlan any thoughts?
I think the actual problem is that the time driver creates its base `Instant` when the runtime is constructed, before time is paused.
As an example, we can shrink the example program above down to just this:
And adding
Here you can see that while we're asking to sleep for just 1ms, which should be rounded up to 1.999999ms by `deadline_to_tick`, the calculated duration is actually slightly larger, and the excess differs by about 0.0001 between the two runs. I think that extra time is the time between runtime creation and when time is paused.
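The rounding effect described above can be modeled with plain integer arithmetic. This is a simplified sketch, not tokio's actual code: only the name `deadline_to_tick` and the 999,999ns round-up come from the discussion, while `first_sleep_elapsed_ns`, the nanosecond representation, and the assumption that the paused clock jumps exactly to the wake tick boundary are mine.

```rust
const NANOS_PER_MS: u64 = 1_000_000;

/// Simplified model: convert a deadline (ns since the driver's base time)
/// to a millisecond tick, rounding up by adding 999,999ns first.
fn deadline_to_tick(deadline_ns: u64) -> u64 {
    (deadline_ns + 999_999) / NANOS_PER_MS
}

/// Hypothetical helper: how much time the first sleep appears to take when
/// time was paused `pause_offset_ns` after the driver's base time was taken.
/// Assumes the paused clock jumps exactly to the wake tick boundary.
fn first_sleep_elapsed_ns(pause_offset_ns: u64, sleep_ns: u64) -> u64 {
    let deadline_ns = pause_offset_ns + sleep_ns;
    let wake_tick = deadline_to_tick(deadline_ns);
    wake_tick * NANOS_PER_MS - pause_offset_ns
}

fn main() {
    // Same 1ms sleep, but the pause lands at different sub-millisecond
    // offsets after runtime construction in each "run".
    for offset_ns in [0, 100_000, 900_000] {
        let elapsed = first_sleep_elapsed_ns(offset_ns, NANOS_PER_MS);
        println!("pause offset {}ns -> first sleep observes {}ns", offset_ns, elapsed);
    }
}
```

In this model the first sleep only takes exactly 1ms when the offset is zero; any other offset changes the observed duration, which is the run-to-run noise being reported.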
The time driver stores an `Instant` internally, used as a "base" for future time calculations. Since this is generated as the `Runtime` is being constructed, it previously always happened before the user had a chance to pause time. The fractional-millisecond variations in timing between runtime construction and the time pause caused tests running entirely in paused time to be very slightly nondeterministic, with the time driver advancing time by 1 millisecond more or less depending on how the sub-millisecond components of the `Instant`s involved compared.

To avoid this, there is now a new option on `runtime::Builder` which will create a `Runtime` with time "instantly" paused. This, along with a small change to have the time driver use the provided clock as the source for its start time, allows totally deterministic tests with paused time.

Closes tokio-rs#3179
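As a rough illustration of why taking the driver's start time from the provided clock removes the noise, here is a std-only sketch. `PausedClock`, `Driver`, and `wake_tick` are hypothetical stand-ins invented for this example and do not mirror tokio's internals; the point is only that when the base time and the paused "now" are the same value, the sub-millisecond component cancels out.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a paused test clock: `now` is the frozen time
/// since some arbitrary epoch, including a sub-millisecond component.
struct PausedClock {
    now: Duration,
}

/// Hypothetical stand-in for the time driver.
struct Driver {
    start: Duration,
}

impl Driver {
    /// Take the base time from the provided clock rather than from the OS.
    fn new(clock: &PausedClock) -> Self {
        Driver { start: clock.now }
    }

    /// Round a deadline up to a whole-millisecond tick relative to `start`.
    fn deadline_to_tick(&self, deadline: Duration) -> u64 {
        let since_start = deadline - self.start + Duration::from_nanos(999_999);
        since_start.as_millis() as u64
    }
}

/// Tick at which a sleep of length `sleep`, started at the paused instant,
/// would fire.
fn wake_tick(paused_at: Duration, sleep: Duration) -> u64 {
    let clock = PausedClock { now: paused_at };
    let driver = Driver::new(&clock);
    driver.deadline_to_tick(clock.now + sleep)
}

fn main() {
    let sleep = Duration::from_millis(1);
    // Two "runs" whose clocks are frozen at unrelated sub-millisecond points.
    let a = wake_tick(Duration::from_nanos(123_456_789), sleep);
    let b = wake_tick(Duration::from_nanos(987_654_321), sleep);
    assert_eq!(a, b);
    println!("run A wakes at tick {}, run B wakes at tick {}", a, b);
}
```

In released tokio versions the builder option is, as far as I know, exposed as `runtime::Builder::start_paused(true)` behind the `test-util` feature; the sketch above deliberately avoids depending on tokio so it can be run on its own.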
Version
Platform
Linux DESKTOP-DHO88R7 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Description
The time control exposed under the `test-util` feature allows test code involving the advance of time to run without actually needing to wait around for wall-clock time to move. In theory it should also allow those tests to be completely deterministic, with time advancing exactly the same way each time the test runs (assuming no other sources of entropy in the test, of course).

However, it appears that this is not actually the case right now. The first `Sleep` after time is paused will observe a slightly larger amount of time passing than requested, and that surplus is nondeterministic between runs.

I believe this is because the timer wheel has 1ms resolution, and when time pauses the instant is at some random point in the middle of a millisecond. That random interval ends up being added to the first sleep call.
Making the paused time infrastructure absolutely deterministic would be helpful for things like simulations of behavior of an async library in various scenarios. If the execution of the simulation is deterministic, you can know that any change must have happened due to whatever modification you were making to the library itself rather than being random noise.