-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to run-graph #234
Conversation
8491c47
to
0f6fa6e
Compare
4e10c30
to
4e69008
Compare
e457ab5
to
b396573
Compare
b396573
to
f8e90f0
Compare
Ping. What is this waiting on? |
This is waiting on @aidanhs if he want to take a look. |
src/run_graph.rs
Outdated
|
||
pub enum Node { | ||
Task(Task), | ||
RunningTask, | ||
RunningTask(Option<Arc<Task>>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a comment on this as to the meaning of the Option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the Option
away.
src/run_graph.rs
Outdated
if let Node::Task(task) = content { | ||
let task = Arc::new(task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking this looks like an odd thing to me -- should we be constructing the Arc somewhere more at the initial place tasks get created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
f8e90f0
to
7edcf71
Compare
This has the nice side-effect of skipping the prepare step of crates already fully tested.
02e058a
to
b7ef296
Compare
config.toml
Outdated
# - skip-tests (bool): don't run tests in this crate/repo | ||
# - quiet (bool): don't kill after two minutes without output | ||
# - update-lockfile (bool): update the lockfile even if the crate has one | ||
# - broken (bool): skip the crate only if there is an crater error during a stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rephrase this a bit? Maybe "Treat a crater error on this crate as a build failure (typically the crate is broken in an unusual way and we want to indicate the failure is 'permissible')" - you might need to word wrap.
Or something shorter if it seems appropriate. It was just initially confusing to puzzle out what we're trying to achieve by transforming an 'Error' into a 'BuildFail' until I remembered your (reasonable) position on making unexpected errors loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to:
Treat a Crater error on this crate/repo as a build failure (typically the crate is broken in an unusual way and we want to indicate the failure is 'permissible', while still building it if the failure is resolved in the future)
capture_lockfile(ex, c, path, toolchain) | ||
}).chain_err(|| format!("failed to generate lockfile for {}", c)); | ||
if let Err(e) = r { | ||
if let Err(e) = capture_lockfile(config, ex, c, toolchain) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice cleanup 👍
src/run_graph.rs
Outdated
let mut dependencies = 0; | ||
|
||
// Try to check for the dependencies of this node | ||
// The list is collected to make the borrowchecker happy | ||
let mut neighbors = self.graph.neighbors(node).collect::<Vec<_>>(); | ||
for neighbor in neighbors.drain(..) { | ||
match self.walk_graph(neighbor) { | ||
match self.walk_graph(neighbor, ex, db) { | ||
WalkResult::Task(id, task) => return WalkResult::Task(id, task), | ||
WalkResult::Blocked => dependencies += 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could just early exit with ::Blocked
here I think? And then below we can just assume that there are no dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, actually no, if we early return there only one task is executed at the time, because the graph walker has no chance to look for tasks in the other dependent nodes. Fixed in #265
.mark_as_failed(id, ex, db, &e, result)?; | ||
} else { | ||
graph.lock().unwrap().mark_as_completed(id); | ||
} | ||
} else { | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment here to explicitly state that "Threads terminate as soon as there's no available work, which is subideal in general but fine for crater today because the dependency chains are linear, so the completion of one task will never permit more than one more to be executed."
(assuming I'm remembering correctly that the dependency chains are linear)
src/tasks.rs
Outdated
@@ -58,18 +59,63 @@ impl fmt::Debug for Task { | |||
} | |||
|
|||
impl Task { | |||
pub fn run<DB: WriteResults>(&self, ex: &Experiment, db: &DB) -> Result<()> { | |||
pub fn needs_exec<DB: WriteResults>(&self, ex: &Experiment, db: &DB) -> bool { | |||
// A prepare step should already be executed, and other steps only if were not executed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should 'already' say 'always' here? Otherwise I'm a bit confused about how to reconcile the comment with code that seems to suggest a prepare step is always ready to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory the prepare step is always executed, but in practice it won't be executed if all the dependent tasks will not be executed, since the runner won't reach the prepare node.
Reworded the comment to better explain that.
src/tasks.rs
Outdated
| TaskStep::BuildOnly { ref tc, .. } | ||
| TaskStep::CheckOnly { ref tc, .. } | ||
| TaskStep::UnstableFeatures { ref tc } => db | ||
.already_executed(ex, tc, &self.krate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this method something more obvious like get_result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/run_graph.rs
Outdated
{ | ||
if !task.needs_exec(ex, db) { | ||
return WalkResult::NotBlocked; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So !needs_exec
will only return true if a crate result is already completed according to the database, but for some reason the graph doesn't reflect it (perhaps when resuming an experiment?). In this case, shouldn't we take the opportunity to remove the node from the graph so we don't keep traversing over it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that might be a good idea, implemented that.
let mut dependencies = 0; | ||
|
||
// Try to check for the dependencies of this node | ||
// The list is collected to make the borrowchecker happy | ||
let mut neighbors = self.graph.neighbors(node).collect::<Vec<_>>(); | ||
for neighbor in neighbors.drain(..) { | ||
match self.walk_graph(neighbor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment here to say something like "Should recurse a maximum of one time, since completed nodes should have been removed from the graph".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/run_graph.rs
Outdated
let root = self.root; | ||
if let WalkResult::Task(id, task) = self.walk_graph(root) { | ||
if let WalkResult::Task(id, task) = self.walk_graph(root, ex, db) { | ||
Some((id, task)) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that (today) the only two permissible variants here are Task
and NotBlocked
- Blocked
indicates that graph dependencies have become nonlinear (possible at some point in the future) and something needs to change at the higher level to ensure we don't lose multithreading due to thread early exit.
What if we explicitly matched Blocked
here to panic and say "If this panics, you've changed dependencies to become nonlinear and you need to change the top level threading behaviour to make sure threads don't exit early if they find themselves blocked". Just don't want to be tracking down strange slowdowns down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocked
already happens today (for example when a prepare task is running, all the dependent tasks will be marked as blocked), so panicking on it doesn't seem like a good idea :P
Addressed all of the comments except threads exiting early. I'll open another PR for that. Merging this as soon as Travis goes green. |
Finally 🎉 🎉 🎉 |
There are still more improvements I want to make, but we can merge this PR anyway (I can always do them in another PR).update-lockfile
config option for crates, which forces the lockfile update (doh). This is needed because some GitHub repos have outdated lockfiles in them.path = "foo"
from target-specific dependencies inCargo.toml
. The old frobber only removed them from global dependencies.broken
config option for crates. This option is like a "soft-skip": if there is a crater error during the build (like a failure to extract the source) the build is skipped, otherwise it continues as normal. This allows us to add the option without worrying about removing it in the future if the crate is fixed and there is no need to skip it anymore.Fixes #222
Fixes #129
Fixes #89
The new errors section