Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to run-graph #234

Merged
merged 18 commits into from
Jul 9, 2018

Conversation

pietroalbini
Copy link
Member

@pietroalbini pietroalbini commented May 18, 2018

There are still more improvements I want to make, but we can merge this PR anyway (I can always do them in another PR).

  • Switched to a stable graph instead of using a bad hack to replicate it (didn't know it existed when I implemented run-graph the first time)
  • Changed the graph walker to ignore nodes already executed. This makes resuming a failed run instantaneous (since both the tasks already have a result they're skipped, avoiding the prepare step)
  • Avoided crashing if an error occurs, and record the failure in a new "errors" section of the report, with the error message in the log. This allows us to quickly spot errors without looking at the crater log
  • Properly returned errors (only in run-graph) in the prepare task
  • Used different target directories for different threads. This properly parallelizes the build (while consuming more disk space and having less cache)
  • Ignored missing shas in the report generation. Before, shas were also collected for skipped repos, but that's not the case anymore. Now repos with missing shas will have the name/url without them.
  • Added a new update-lockfile config option for crates, which forces the lockfile update (doh). This is needed because some GitHub repos have outdated lockfiles in them.
  • Removed path = "foo" from target-specific dependencies in Cargo.toml. The old frobber only removed them from global dependencies.
  • Added a new broken config option for crates. This option is like a "soft-skip": if there is a crater error during the build (like a failure to extract the source) the build is skipped, otherwise it continues as normal. This allows us to add the option without worrying about removing it in the future if the crate is fixed and there is no need to skip it anymore.
  • Added a bunch of broken crates to the config file

Fixes #222
Fixes #129
Fixes #89

The new errors section

screenshot from 2018-05-18 16-48-55

@Eh2406
Copy link
Contributor

Eh2406 commented Jun 28, 2018

Ping. What is this waiting on?

@pietroalbini
Copy link
Member Author

This is waiting on @aidanhs if he want to take a look.

src/run_graph.rs Outdated

pub enum Node {
Task(Task),
RunningTask,
RunningTask(Option<Arc<Task>>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a comment on this as to the meaning of the Option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the Option away.

src/run_graph.rs Outdated
if let Node::Task(task) = content {
let task = Arc::new(task);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking this looks like an odd thing to me -- should we be constructing the Arc somewhere more at the initial place tasks get created?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

config.toml Outdated
# - skip-tests (bool): don't run tests in this crate/repo
# - quiet (bool): don't kill after two minutes without output
# - update-lockfile (bool): update the lockfile even if the crate has one
# - broken (bool): skip the crate only if there is an crater error during a stage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rephrase this a bit? Maybe "Treat a crater error on this crate as a build failure (typically the crate is broken in an unusual way and we want to indicate the failure is 'permissible')" - you might need to word wrap.

Or something shorter if it seems appropriate. It was just initially confusing to puzzle out what we're trying to achieve by transforming an 'Error' into a 'BuildFail' until I remembered your (reasonable) position on making unexpected errors loud.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

Treat a Crater error on this crate/repo as a build failure (typically the crate is broken in an unusual way and we want to indicate the failure is 'permissible', while still building it if the failure is resolved in the future)

capture_lockfile(ex, c, path, toolchain)
}).chain_err(|| format!("failed to generate lockfile for {}", c));
if let Err(e) = r {
if let Err(e) = capture_lockfile(config, ex, c, toolchain) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice cleanup 👍

src/run_graph.rs Outdated
let mut dependencies = 0;

// Try to check for the dependencies of this node
// The list is collected to make the borrowchecker happy
let mut neighbors = self.graph.neighbors(node).collect::<Vec<_>>();
for neighbor in neighbors.drain(..) {
match self.walk_graph(neighbor) {
match self.walk_graph(neighbor, ex, db) {
WalkResult::Task(id, task) => return WalkResult::Task(id, task),
WalkResult::Blocked => dependencies += 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just early exit with ::Blocked here I think? And then below we can just assume that there are no dependencies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually no, if we early return there only one task is executed at the time, because the graph walker has no chance to look for tasks in the other dependent nodes. Fixed in #265

.mark_as_failed(id, ex, db, &e, result)?;
} else {
graph.lock().unwrap().mark_as_completed(id);
}
} else {
break;
Copy link
Member

@aidanhs aidanhs Jul 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment here to explicitly state that "Threads terminate as soon as there's no available work, which is subideal in general but fine for crater today because the dependency chains are linear, so the completion of one task will never permit more than one more to be executed."

(assuming I'm remembering correctly that the dependency chains are linear)

src/tasks.rs Outdated
@@ -58,18 +59,63 @@ impl fmt::Debug for Task {
}

impl Task {
pub fn run<DB: WriteResults>(&self, ex: &Experiment, db: &DB) -> Result<()> {
pub fn needs_exec<DB: WriteResults>(&self, ex: &Experiment, db: &DB) -> bool {
// A prepare step should already be executed, and other steps only if were not executed
Copy link
Member

@aidanhs aidanhs Jul 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should 'already' say 'always' here? Otherwise I'm a bit confused about how to reconcile the comment with code that seems to suggest a prepare step is always ready to go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory the prepare step is always executed, but in practice it won't be executed if all the dependent tasks will not be executed, since the runner won't reach the prepare node.

Reworded the comment to better explain that.

src/tasks.rs Outdated
| TaskStep::BuildOnly { ref tc, .. }
| TaskStep::CheckOnly { ref tc, .. }
| TaskStep::UnstableFeatures { ref tc } => db
.already_executed(ex, tc, &self.krate)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this method something more obvious like get_result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/run_graph.rs Outdated
{
if !task.needs_exec(ex, db) {
return WalkResult::NotBlocked;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So !needs_exec will only return true if a crate result is already completed according to the database, but for some reason the graph doesn't reflect it (perhaps when resuming an experiment?). In this case, shouldn't we take the opportunity to remove the node from the graph so we don't keep traversing over it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that might be a good idea, implemented that.

let mut dependencies = 0;

// Try to check for the dependencies of this node
// The list is collected to make the borrowchecker happy
let mut neighbors = self.graph.neighbors(node).collect::<Vec<_>>();
for neighbor in neighbors.drain(..) {
match self.walk_graph(neighbor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment here to say something like "Should recurse a maximum of one time, since completed nodes should have been removed from the graph".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/run_graph.rs Outdated
let root = self.root;
if let WalkResult::Task(id, task) = self.walk_graph(root) {
if let WalkResult::Task(id, task) = self.walk_graph(root, ex, db) {
Some((id, task))
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that (today) the only two permissible variants here are Task and NotBlocked - Blocked indicates that graph dependencies have become nonlinear (possible at some point in the future) and something needs to change at the higher level to ensure we don't lose multithreading due to thread early exit.

What if we explicitly matched Blocked here to panic and say "If this panics, you've changed dependencies to become nonlinear and you need to change the top level threading behaviour to make sure threads don't exit early if they find themselves blocked". Just don't want to be tracking down strange slowdowns down the line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocked already happens today (for example when a prepare task is running, all the dependent tasks will be marked as blocked), so panicking on it doesn't seem like a good idea :P

@pietroalbini
Copy link
Member Author

Addressed all of the comments except threads exiting early. I'll open another PR for that. Merging this as soon as Travis goes green.

@pietroalbini
Copy link
Member Author

Finally 🎉 🎉 🎉

@pietroalbini pietroalbini merged commit 83d2693 into rust-lang:master Jul 9, 2018
@pietroalbini pietroalbini deleted the better-run-graph branch July 9, 2018 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants