-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backtrack execution for missing digests to make eager_fetch=false
more resilient
#15850
Backtrack execution for missing digests to make eager_fetch=false
more resilient
#15850
Conversation
5efed6c
to
cb36f53
Compare
cb36f53
to
f4b77a8
Compare
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
/// | ||
pub fn maybe_start_backtracking(&self, node: &ExecuteProcess) -> usize { | ||
let mut backtrack_attempts = self.backtrack_attempts.lock(); | ||
let entry: Option<&mut usize> = backtrack_attempts.get_mut(node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust could not infer type ascription?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is defensive, and partially superstition: if the field is dereferenced too early, the increment can be lost (because you are incrementing a local variable rather than the value behind the pointer). The compiler still won't catch that IIRC.
f4b77a8
to
e53a231
Compare
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…an integration test. [ci skip-build-wheels]
[ci skip-build-wheels]
[ci skip-build-wheels]
e53a231
to
f9a4692
Compare
Rather than #15856, I've added one more commit here that merges the |
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! The implementation is a lot simpler than I was expecting!!
As described in #11331, in order to avoid having to deal with missing remote content later in the pipeline,
--remote-cache-eager-fetch
currently defaults to true. This means that before calling a cache hit a hit, we fully download the output of the cache entry.In warm-cache situations, this can mean downloading a lot more than is strictly necessary. In theory, you could imagine
eager_fetch=False
downloading only stdio and no file content at all for a 100% cache hit rate run of tests. In practice, high hitrate runs see about 80% fewer bytes downloaded, and 50% fewer RPCs than witheager_fetch=True
.To begin moving toward disabling
eager_fetch
by default (and eventually, ideally, removing the flag entirely), this change begins "backtracking" when missing digests are encountered. Backtracking is implemented by "catching"MissingDigest
errors (introduced in #15761), and invalidating their sourceNode
in the graph. When aNode
that produced a missing digest re-runs, it does so using progressively fewer caches (as introduced in #15854), in order to cache bust both local and remote partial cache entries.eager_fetch=False
was already experimental, in that anyMissingDigest
error encountered later in the run would kill the entire run. Backtracking makeseager_fetch=False
less experimental, in that we are now very likely to recover from aMissingDigest
error. But it is still the case witheager_fetch=False
that persistent remote infrastructure errors (those that last longer than our retry budget or timeout) could kill a run. Given that, we will likely want to gain more experience and further tune timeouts and retries before changing the default.Fixes #11331.