Improvements to run-graph #234

pietroalbini · 2018-05-18T09:25:44Z

~~There are still more improvements I want to make, but we can merge this PR anyway (I can always do them in another PR).~~

- Switched to a stable graph instead of the bad hack I used to replicate one (I didn't know it existed when I first implemented run-graph).
- Changed the graph walker to ignore nodes that were already executed. This makes resuming a failed run instantaneous: when both tasks already have a result they're skipped, which also avoids the prepare step.
- Avoided crashing when an error occurs. The failure is now recorded in a new "errors" section of the report, with the error message in the log. This lets us spot errors quickly without looking at the Crater log.
- Properly returned errors from the prepare task (only in run-graph).
- Used different target directories for different threads. This properly parallelizes the build, at the cost of more disk space and less cache reuse.
- Ignored missing shas during report generation. Before, shas were also collected for skipped repos, but that's no longer the case. Repos with a missing sha now show only the name/url.
- Added a new `update-lockfile` config option for crates, which forces a lockfile update. This is needed because some GitHub repos contain outdated lockfiles.
- Removed `path = "foo"` from target-specific dependencies in `Cargo.toml`. The old frobber only removed it from global dependencies.
- Added a new `broken` config option for crates. It works like a "soft-skip": if a Crater error occurs during the build (such as a failure to extract the source) the build is skipped, otherwise it continues as normal. This means we can add the option without worrying about removing it later if the crate gets fixed and no longer needs to be skipped.
- Added a bunch of broken crates to the config file.
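To illustrate the per-thread target directories mentioned above, here is a minimal sketch. The `target_dir_for_thread` helper and the `thread-N` directory naming are assumptions made for illustration, not the code in this PR. The idea is only that builds running on different threads never share (and so never wait on) the same Cargo target directory.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not Crater's actual API): derive a separate Cargo
/// target directory for each worker thread from a common base directory.
fn target_dir_for_thread(base: &Path, thread_index: usize) -> PathBuf {
    // Each thread gets its own subdirectory, so two concurrent builds
    // never contend for the same target directory.
    base.join(format!("thread-{}", thread_index))
}
```

The trade-off is the one stated in the description: every thread rebuilds dependencies into its own directory, which costs disk space and loses cache sharing between threads.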

Fixes #222
Fixes #129
Fixes #89

[Image: the new errors section]

Eh2406 · 2018-06-28T21:25:31Z

Ping. What is this waiting on?

pietroalbini · 2018-06-28T21:39:36Z

This is waiting on @aidanhs, if he wants to take a look.

Mark-Simulacrum · 2018-07-03T22:36:42Z

src/run_graph.rs


 pub enum Node {
    Task(Task),
-    RunningTask,
+    RunningTask(Option<Arc<Task>>),


Could we have a comment on this as to the meaning of the Option?

Refactored the Option away.

Mark-Simulacrum · 2018-07-03T22:37:29Z

src/run_graph.rs

                    if let Node::Task(task) = content {
+                        let task = Arc::new(task);


Generally speaking this looks like an odd thing to me -- should we be constructing the Arc earlier, at the place where tasks are first created?

Sure, done.

aidanhs · 2018-07-08T22:19:19Z

config.toml

+#  - skip-tests      (bool): don't run tests in this crate/repo
+#  - quiet           (bool): don't kill after two minutes without output
+#  - update-lockfile (bool): update the lockfile even if the crate has one
+#  - broken          (bool): skip the crate only if there is an crater error during a stage


Can we rephrase this a bit? Maybe "Treat a crater error on this crate as a build failure (typically the crate is broken in an unusual way and we want to indicate the failure is 'permissible')" - you might need to word wrap.

Or something shorter if it seems appropriate. It was just initially confusing to puzzle out what we're trying to achieve by transforming an 'Error' into a 'BuildFail' until I remembered your (reasonable) position on making unexpected errors loud.

Changed to:

Treat a Crater error on this crate/repo as a build failure (typically the crate is broken in an unusual way and we want to indicate the failure is 'permissible', while still building it if the failure is resolved in the future)
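As a rough sketch of what that description means in practice: the `Outcome` enum and the `apply_broken` function below are simplified stand-ins invented for this example, not Crater's real types. Only the Error-to-BuildFail transformation comes from the discussion above.

```rust
/// Simplified, hypothetical outcome of testing one crate.
#[derive(Debug, PartialEq)]
enum Outcome {
    TestPass,
    BuildFail,
    /// An internal error, e.g. a failure to extract the source.
    Error(String),
}

/// If the crate is marked `broken`, report an internal error as an ordinary
/// build failure. Every other outcome is passed through unchanged, so the
/// crate is built normally again once the underlying problem goes away.
fn apply_broken(outcome: Outcome, broken: bool) -> Outcome {
    match outcome {
        Outcome::Error(_) if broken => Outcome::BuildFail,
        other => other,
    }
}
```

Crates without the flag keep their `Error` outcome, so unexpected errors stay loud.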

aidanhs · 2018-07-08T23:24:47Z

src/ex.rs

-            capture_lockfile(ex, c, path, toolchain)
-        }).chain_err(|| format!("failed to generate lockfile for {}", c));
-        if let Err(e) = r {
+        if let Err(e) = capture_lockfile(config, ex, c, toolchain) {


This is a nice cleanup 👍

aidanhs · 2018-07-08T23:37:04Z

src/run_graph.rs

        let mut dependencies = 0;

        // Try to check for the dependencies of this node
        // The list is collected to make the borrowchecker happy
        let mut neighbors = self.graph.neighbors(node).collect::<Vec<_>>();
        for neighbor in neighbors.drain(..) {
-            match self.walk_graph(neighbor) {
+            match self.walk_graph(neighbor, ex, db) {
                WalkResult::Task(id, task) => return WalkResult::Task(id, task),
                WalkResult::Blocked => dependencies += 1,


We could just early exit with ::Blocked here I think? And then below we can just assume that there are no dependencies.

Well, actually no: if we return early there, only one task is executed at a time, because the graph walker has no chance to look for tasks in the other dependent nodes. Fixed in #265.
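A toy model of why the walker counts blocked dependencies instead of returning early. The `Graph`, `State`, and node numbering here are assumptions for illustration only; the real walker also takes `ex` and `db` and works on a stable graph.

```rust
#[derive(Debug, PartialEq)]
enum WalkResult {
    Task(usize),
    Blocked,
    NotBlocked,
}

#[derive(Clone, Copy)]
enum State {
    Pending,
    Running,
    Done,
}

/// Hypothetical graph: `deps[n]` lists the nodes that node `n` depends on.
struct Graph {
    deps: Vec<Vec<usize>>,
    state: Vec<State>,
}

impl Graph {
    fn walk(&self, node: usize) -> WalkResult {
        let mut blocked = 0;
        for &dep in &self.deps[node] {
            match self.walk(dep) {
                WalkResult::Task(id) => return WalkResult::Task(id),
                // Do not return here: a sibling branch may still hold a
                // task that is ready to run.
                WalkResult::Blocked => blocked += 1,
                WalkResult::NotBlocked => {}
            }
        }
        if blocked > 0 {
            return WalkResult::Blocked;
        }
        match self.state[node] {
            State::Pending => WalkResult::Task(node),
            State::Running => WalkResult::Blocked,
            State::Done => WalkResult::NotBlocked,
        }
    }
}
```

With a root (node 0) depending on two test nodes (1 and 2), whose prepare nodes are 3 (running) and 4 (pending), the walk from the root still finds node 4 even though the first branch is blocked. Returning `Blocked` on the first blocked dependency would hide it.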

aidanhs · 2018-07-08T23:45:19Z

src/run_graph.rs

+                                .mark_as_failed(id, ex, db, &e, result)?;
+                        } else {
+                            graph.lock().unwrap().mark_as_completed(id);
+                        }
                    } else {
                        break;


I'd add a comment here to explicitly state that "Threads terminate as soon as there's no available work, which is subideal in general but fine for crater today because the dependency chains are linear, so the completion of one task will never permit more than one more to be executed."

(assuming I'm remembering correctly that the dependency chains are linear)
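The thread behaviour described in that comment can be sketched as follows. This is a simplified model under stated assumptions: a flat queue stands in for the run graph, and `run_workers` is a hypothetical name, not a function in this PR.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical worker pool: each thread pulls tasks until none are
/// available, then exits immediately instead of waiting for more work.
fn run_workers(tasks: Vec<u32>, threads: usize) -> Vec<u32> {
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let done = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let done = Arc::clone(&done);
            thread::spawn(move || loop {
                // Hold the lock only long enough to take one task.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some(task) => done.lock().unwrap().push(task),
                    // No available work: the thread terminates.
                    None => break,
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All worker threads have finished, so this is the last reference.
    let mut done = Arc::try_unwrap(done).unwrap().into_inner().unwrap();
    done.sort();
    done
}
```

In a graph where finishing one task can unblock several others, a thread that exits this way is not there to pick them up, which is the concern raised in the comment.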

aidanhs · 2018-07-08T23:47:45Z

src/tasks.rs

@@ -58,18 +59,63 @@ impl fmt::Debug for Task {
 }

 impl Task {
-    pub fn run<DB: WriteResults>(&self, ex: &Experiment, db: &DB) -> Result<()> {
+    pub fn needs_exec<DB: WriteResults>(&self, ex: &Experiment, db: &DB) -> bool {
+        // A prepare step should already be executed, and other steps only if were not executed


Should 'already' say 'always' here? Otherwise I'm a bit confused about how to reconcile the comment with code that seems to suggest a prepare step is always ready to go.

In theory the prepare step is always executed, but in practice it is skipped when none of the tasks that depend on it need to run, since the runner never reaches the prepare node.

Reworded the comment to better explain that.
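A minimal sketch of the rule being discussed, with simplified stand-in types. `Step`, `Results`, and their fields are assumptions made for this example; the real code matches on `TaskStep` variants and queries the results database through `WriteResults`.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced set of steps.
enum Step {
    Prepare,
    BuildAndTest { toolchain: String },
}

struct Task {
    krate: String,
    step: Step,
}

/// Stand-in for the results database: (toolchain, crate) pairs with a result.
#[derive(Default)]
struct Results {
    done: HashSet<(String, String)>,
}

impl Results {
    fn record(&mut self, toolchain: &str, krate: &str) {
        self.done.insert((toolchain.to_string(), krate.to_string()));
    }

    fn has_result(&self, toolchain: &str, krate: &str) -> bool {
        self.done
            .contains(&(toolchain.to_string(), krate.to_string()))
    }
}

impl Task {
    fn needs_exec(&self, results: &Results) -> bool {
        match &self.step {
            // A prepare step always reports that it needs to run; it is only
            // skipped because the walker never reaches it when every task
            // depending on it is already done.
            Step::Prepare => true,
            // Other steps run only if no result has been recorded yet.
            Step::BuildAndTest { toolchain } => !results.has_result(toolchain, &self.krate),
        }
    }
}
```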

aidanhs · 2018-07-08T23:51:35Z

src/tasks.rs

+            | TaskStep::BuildOnly { ref tc, .. }
+            | TaskStep::CheckOnly { ref tc, .. }
+            | TaskStep::UnstableFeatures { ref tc } => db
+                .already_executed(ex, tc, &self.krate)


Can we call this method something more obvious like get_result?

aidanhs · 2018-07-08T23:58:33Z

src/run_graph.rs

+        {
+            if !task.needs_exec(ex, db) {
+                return WalkResult::NotBlocked;
+            }


So `!needs_exec` will only be true if the crate's result is already recorded in the database but, for some reason, the graph doesn't reflect it (perhaps when resuming an experiment?). In this case, shouldn't we take the opportunity to remove the node from the graph so we don't keep traversing over it?

Yeah, that might be a good idea, implemented that.
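Roughly, the pruning suggested above could look like the sketch below. The `Graph` type, the `alive` flags, and the convention that node 0 is the root are all assumptions for illustration; they are not the data structures used in this PR.

```rust
/// Hypothetical graph: `children[n]` lists the nodes that node `n` depends
/// on, and node 0 is a root that carries no work of its own.
struct Graph {
    children: Vec<Vec<usize>>,
    alive: Vec<bool>,
}

impl Graph {
    fn new(children: Vec<Vec<usize>>) -> Self {
        let alive = vec![true; children.len()];
        Graph { children, alive }
    }

    /// Return the next node to execute, or `None` if nothing is left.
    /// `finished` lists the nodes that already have a recorded result.
    fn next_task(&mut self, finished: &[usize]) -> Option<usize> {
        self.walk(0, finished).filter(|&id| id != 0)
    }

    fn walk(&mut self, node: usize, finished: &[usize]) -> Option<usize> {
        if !self.alive[node] {
            return None;
        }
        if finished.contains(&node) {
            // The result already exists: drop the node so later walks do
            // not visit it again, and do not descend into its dependencies.
            self.alive[node] = false;
            return None;
        }
        let children = self.children[node].clone();
        for child in children {
            if let Some(id) = self.walk(child, finished) {
                return Some(id);
            }
        }
        Some(node)
    }
}
```

Because a finished node is pruned before its dependencies are visited, a prepare node shared by two finished tasks is never reached, which matches the resume behaviour described in the PR description.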

aidanhs · 2018-07-09T00:00:58Z

src/run_graph.rs

        let mut dependencies = 0;

        // Try to check for the dependencies of this node
        // The list is collected to make the borrowchecker happy
        let mut neighbors = self.graph.neighbors(node).collect::<Vec<_>>();
        for neighbor in neighbors.drain(..) {
-            match self.walk_graph(neighbor) {


I'd add a comment here to say something like "Should recurse a maximum of one time, since completed nodes should have been removed from the graph".

aidanhs · 2018-07-09T00:05:38Z

src/run_graph.rs

        let root = self.root;
-        if let WalkResult::Task(id, task) = self.walk_graph(root) {
+        if let WalkResult::Task(id, task) = self.walk_graph(root, ex, db) {
            Some((id, task))
        } else {


It seems to me that (today) the only two permissible variants here are Task and NotBlocked - Blocked indicates that graph dependencies have become nonlinear (possible at some point in the future) and something needs to change at the higher level to ensure we don't lose multithreading due to thread early exit.

What if we explicitly matched Blocked here to panic and say "If this panics, you've changed dependencies to become nonlinear and you need to change the top level threading behaviour to make sure threads don't exit early if they find themselves blocked". Just don't want to be tracking down strange slowdowns down the line.

Blocked already happens today (for example when a prepare task is running, all the dependent tasks will be marked as blocked), so panicking on it doesn't seem like a good idea :P

pietroalbini · 2018-07-09T06:27:58Z

Addressed all of the comments except threads exiting early. I'll open another PR for that. Merging this as soon as Travis goes green.

pietroalbini · 2018-07-09T06:54:17Z

Finally 🎉 🎉 🎉

pietroalbini force-pushed the better-run-graph branch from 8491c47 to 0f6fa6e Compare May 18, 2018 10:00

pietroalbini mentioned this pull request May 20, 2018

capture_lockfile offline if available #178

Merged

pietroalbini force-pushed the better-run-graph branch 4 times, most recently from 4e10c30 to 4e69008 Compare May 20, 2018 20:07

pietroalbini mentioned this pull request May 21, 2018

Add libbreakpad-client-sys to the blacklist #223

Closed

pietroalbini force-pushed the better-run-graph branch 2 times, most recently from e457ab5 to b396573 Compare May 21, 2018 21:50

pietroalbini force-pushed the better-run-graph branch from b396573 to f8e90f0 Compare June 22, 2018 22:36

Mark-Simulacrum reviewed Jul 3, 2018

pietroalbini force-pushed the better-run-graph branch from f8e90f0 to 7edcf71 Compare July 4, 2018 12:45

pietroalbini added 16 commits July 6, 2018 23:51

run_graph: use a stable graph

6c144d5

run_graph: skip already executed tasks while walking the graph

3b2e4d0

This has the nice side-effect of skipping the prepare step of crates already fully tested.

run_graph: properly report failures during task execution

c7453a3

report: add a new "errors" section

15bc7af

run_graph: return errors when there are errors in the prepare task

809580f

toolchain: use different target directories for different threads

a1cfac5

report: don't output the sha if it's not available

fc8b394

ex: add a config option to force updating the lockfile

c2a95ea

toml_frobber: refactor the code and add tests

169fe5e

toml_frobber: frob target-specific dependencies

06205ca

toml_frobber: remove parent workspace from Cargo.toml

d5097cf

config: add a new "broken" crate config option

e1c64a9

config.toml: add broken crates

0fe64cf

gh-apps: remove invalid repos

39f3368

docs: update legacy workflow to use run-graph

3e02e14

docs: update CLI usage to use run-graph

b7ef296

pietroalbini force-pushed the better-run-graph branch from 02e058a to b7ef296 Compare July 6, 2018 21:56

aidanhs reviewed Jul 8, 2018

config: improve broken description

0a582ca

aidanhs reviewed Jul 8, 2018

aidanhs reviewed Jul 9, 2018

*: random improvements

9eb154c

pietroalbini merged commit 83d2693 into rust-lang:master Jul 9, 2018

pietroalbini deleted the better-run-graph branch July 9, 2018 06:54
