
Backtrack execution for missing digests to make eager_fetch=false more resilient #15850

Merged


stuhood (Member) commented Jun 16, 2022

As described in #11331, in order to avoid having to deal with missing remote content later in the pipeline, --remote-cache-eager-fetch currently defaults to true. This means that before calling a cache hit a hit, we fully download the output of the cache entry.

In warm-cache situations, this can mean downloading much more than is strictly necessary. In theory, for a test run with a 100% cache hit rate, eager_fetch=False could download only stdio and no file content at all. In practice, high-hit-rate runs download about 80% fewer bytes and make 50% fewer RPCs than with eager_fetch=True.
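To make the difference between the two modes concrete, here is a minimal sketch of a cache lookup under each setting. It is not Pants code. The `Digest` alias, `ActionResult` struct, `Lookup` enum and `lookup` function are all hypothetical, and treating missing eagerly-fetched content as a cache miss is an assumption made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical digest type; a real CAS digest is a hash plus a size in bytes.
type Digest = &'static str;

/// Hypothetical cached action result: a stdout digest plus output file digests.
struct ActionResult {
    stdout: Digest,
    output_files: Vec<Digest>,
}

#[derive(Debug, PartialEq)]
enum Lookup {
    Hit,
    Miss,
}

/// Decides whether a remote cache entry counts as a hit, recording every
/// digest that was "downloaded" into `local`.
fn lookup(
    entry: Option<&ActionResult>,
    remote_cas: &HashSet<Digest>,
    local: &mut HashSet<Digest>,
    eager_fetch: bool,
) -> Lookup {
    let result = match entry {
        Some(result) => result,
        None => return Lookup::Miss,
    };
    // Stdio is needed to report the result, so it is fetched in both modes.
    if !remote_cas.contains(&result.stdout) {
        return Lookup::Miss;
    }
    local.insert(result.stdout);
    if eager_fetch {
        // Eager: every output must be downloadable before this counts as a hit
        // (assumed here to turn a partial entry into a miss).
        if result.output_files.iter().any(|d| !remote_cas.contains(d)) {
            return Lookup::Miss;
        }
        local.extend(result.output_files.iter().copied());
    }
    // Lazy: output files stay remote. A missing one only surfaces later, when
    // a consumer tries to read it (the MissingDigest case).
    Lookup::Hit
}
```

With the same fully populated entry, both modes report a hit. The eager lookup copies all three digests locally, while the lazy one copies only stdout. For an entry whose output file has vanished from the CAS, the eager lookup reports a miss, while the lazy lookup still reports a hit and defers the failure.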

To begin moving toward disabling eager_fetch by default (and eventually, ideally, removing the flag entirely), this change begins "backtracking" when missing digests are encountered. Backtracking is implemented by "catching" MissingDigest errors (introduced in #15761) and invalidating their source Node in the graph. When a Node that produced a missing digest re-runs, it uses progressively fewer caches (as introduced in #15854), so that partial cache entries are busted in both the local and remote caches.
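The following is a minimal sketch of the backtracking loop described above. It is loosely modeled on the `maybe_start_backtracking` excerpt quoted later in the review, but it is not the PR's implementation, and every name in it is hypothetical: `ProcessNode`, `NodeError`, `CacheReads`, `Session`, `start_backtracking` and `handle_error`. It also makes two assumptions:

- The cache policy in `cache_reads_for_attempt` (skip the local cache first, then both) is invented. The real ordering from #15854 may differ.
- Graph invalidation is only indicated by a comment.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a process-running graph node.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
struct ProcessNode(String);

/// Hypothetical error carrying the node that produced the missing digest.
#[derive(Debug)]
enum NodeError {
    MissingDigest { source: ProcessNode },
}

/// Which caches a (re-)run may read from.
#[derive(Debug, PartialEq)]
struct CacheReads {
    local: bool,
    remote: bool,
}

/// Assumed policy: each backtrack disables one more cache layer.
fn cache_reads_for_attempt(attempt: usize) -> CacheReads {
    match attempt {
        0 => CacheReads { local: true, remote: true },
        1 => CacheReads { local: false, remote: true },
        _ => CacheReads { local: false, remote: false },
    }
}

#[derive(Default)]
struct Session {
    backtrack_attempts: Mutex<HashMap<ProcessNode, usize>>,
}

impl Session {
    /// Current attempt number for `node` (0 if it has never backtracked).
    fn attempt_for(&self, node: &ProcessNode) -> usize {
        let attempts = self.backtrack_attempts.lock().unwrap();
        attempts.get(node).copied().unwrap_or(0)
    }

    /// Records one more backtrack for `node` and returns the new attempt number.
    fn start_backtracking(&self, node: &ProcessNode) -> usize {
        let mut attempts = self.backtrack_attempts.lock().unwrap();
        // Hold a `&mut usize` so the increment lands in the map, not in a copy.
        let entry: &mut usize = attempts.entry(node.clone()).or_insert(0);
        *entry += 1;
        *entry
    }
}

/// "Catches" a MissingDigest error: bumps the producing node's attempt count
/// and returns the cache policy its re-run should use. A real engine would
/// also invalidate `source` in the graph here so that it actually re-runs.
fn handle_error(session: &Session, err: NodeError) -> CacheReads {
    match err {
        NodeError::MissingDigest { source } => {
            let attempt = session.start_backtracking(&source);
            cache_reads_for_attempt(attempt)
        }
    }
}
```

Keying the attempt counter by node keeps repeated failures of the same process escalating through the cache policy, while other nodes keep using every cache.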

eager_fetch=False was already experimental, in that any MissingDigest error encountered later in the run would kill the entire run. Backtracking makes eager_fetch=False less experimental, in that we are now very likely to recover from a MissingDigest error. But it is still the case with eager_fetch=False that persistent remote infrastructure errors (those that last longer than our retry budget or timeout) could kill a run. Given that, we will likely want to gain more experience and further tune timeouts and retries before changing the default.

Fixes #11331.


tdyas (Contributor) left a comment

lgtm

src/rust/engine/src/context.rs (2 resolved review threads)
    ///
    pub fn maybe_start_backtracking(&self, node: &ExecuteProcess) -> usize {
      let mut backtrack_attempts = self.backtrack_attempts.lock();
      let entry: Option<&mut usize> = backtrack_attempts.get_mut(node);
Contributor

Could Rust not infer this type without the explicit type ascription?

stuhood (Member Author) replied Jun 17, 2022

This is defensive, and partly superstition: if the reference is dereferenced too early, the increment can be lost, because you end up incrementing a local copy rather than the value behind the pointer. IIRC, the compiler still won't catch that.
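The pitfall described above can be reproduced in a few lines. This is a minimal, self-contained sketch, not the PR's code, and both function names (`bump_lost`, `bump_through_ref`) are hypothetical. Because `usize` is `Copy`, dereferencing the `&mut usize` returned by `HashMap::get_mut` too early yields an independent copy. Keeping the `Option<&mut usize>`, as the annotated line in the excerpt does, means the increment lands in the map.

```rust
use std::collections::HashMap;

/// Buggy: dereferences immediately, so `count` is a copy of the stored value.
fn bump_lost(counts: &mut HashMap<String, usize>, key: &str) -> Option<usize> {
    let mut count: usize = *counts.get_mut(key)?; // copies the usize out of the map
    count += 1; // only the local copy changes; the map still holds the old value
    Some(count)
}

/// Correct: keeps the `&mut usize` and increments the value behind it.
fn bump_through_ref(counts: &mut HashMap<String, usize>, key: &str) -> Option<usize> {
    let entry: Option<&mut usize> = counts.get_mut(key);
    let entry = entry?;
    *entry += 1; // writes through the reference into the map
    Some(*entry)
}
```

Both functions return the same value, so a caller that only checks the return value cannot tell them apart. Only the map contents reveal that `bump_lost` dropped the update.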

stuhood force-pushed the stuhood/backtrack-for-missing-digests branch from f4b77a8 to e53a231 on June 17, 2022 19:49
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
stuhood force-pushed the stuhood/backtrack-for-missing-digests branch from e53a231 to f9a4692 on June 17, 2022 19:49
stuhood (Member Author) commented Jun 17, 2022

Rather than #15856, I've added one more commit here that merges the StubCAS and StubActionCache.

stuhood enabled auto-merge (squash) June 17, 2022 20:38
illicitonion (Contributor) left a comment

Nice! The implementation is a lot simpler than I was expecting!!

stuhood merged commit 7dd8605 into pantsbuild:main Jun 17, 2022
stuhood deleted the stuhood/backtrack-for-missing-digests branch June 17, 2022 22:03

Successfully merging this pull request may close these issues.

Backtracking for missing Digests (aka: remove "eager fetch")
3 participants