Improve incremental build: make ninja handle dynamic outputs #1953

Dragnalith · 2021-04-15T14:26:39Z

This PR makes ninja aware of dynamic outputs. It is related to this discussion on Google Groups.

The Problem

The general problem I am trying to solve is the correctness of the incremental build. I think we can say the incremental build is correct when the result of running ninja is the same as the result of a full build (forcing every rule command to be executed again). Today there are at least two scenarios where this is not the case:

If you modify an output (manually or by mistake), ninja will not rebuild it. (This problem is addressed in the PR Improve incremental build: Make outputs modified outside of the build system considered dirty #1951.)
If your rule command creates files that cannot be predicted, i.e. dynamic outputs, ninja is not aware of those files and won't re-run the rule command if they are modified or deleted. (This is the problem addressed by this PR.)

Example In Practice

In my project, we have a C++ code generator that generates several header files from the same data source (one header per class). If someone deletes or modifies the generated headers locally, ninja is not aware of it and won't re-run the generator. This can break the build or, worse, result in unexpected application behavior. When people report issues that could be related to the incremental build, the first question is often "have you rebuilt from scratch?". I would like to avoid situations where people have to rebuild from scratch to be sure their build state is correct.

My Solution

One way to make ninja aware of dynamic outputs is a mechanism similar to depfile that informs ninja about dynamic outputs during the build. My implementation introduces the dynout attribute, which gives the path to a file, generated by the rule command, that contains the list of outputs. The current syntax is simply one file path per line.

Example:

rule cpp_gen
    command = codegen.exe --depfile $out.d --dynout $out.dynout --stamp $out $in
    depfile = $out.d
    dynout = $out.dynout

build mydata.stamp: cpp_gen mydata.json
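For illustration, here is a minimal sketch of the generator side, assuming the format described above (one path per line, no escaping). WriteDynoutFile is a hypothetical helper, not part of the PR or of ninja; a generator such as codegen.exe would call it with the path it received via --dynout. Note that this format cannot represent a path that contains a newline.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper a code generator could call after writing its headers.
// It records every file it produced in the dynout file named on its command
// line (e.g. `--dynout $out.dynout`), one path per line.
bool WriteDynoutFile(const std::string& dynout_path,
                     const std::vector<std::string>& outputs) {
  std::ofstream out(dynout_path, std::ios::binary | std::ios::trunc);
  if (!out)
    return false;
  for (const std::string& path : outputs)
    out << path << '\n';  // no escaping: embedded newlines are not supported
  return static_cast<bool>(out);
}
```

Rewriting the whole file on every run (trunc) matters: a header the generator stopped producing must disappear from the list.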

The dynamic outputs are stored in the deps log. Implementation-wise this is very straightforward, and I think it is the way to go. Regarding naming, however, "deps" no longer fits the concept, because the deps log will now contain dynamic outputs as well as dynamic dependencies.

I have also added a tool, ninja -t outputs, to help diagnose the build. It lists all the outputs generated by the graph, including dynamic outputs if they have been built.
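A rough sketch of what such a tool computes. It uses hypothetical, simplified Edge types rather than ninja's real State/Edge/Node classes: static outputs always appear, while dynamic outputs appear only once a previous build has recorded them.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified graph type; ninja's real Edge/Node differ.
struct Edge {
  std::vector<std::string> outputs;          // declared in build.ninja
  std::vector<std::string> dynamic_outputs;  // loaded from the deps log, if any
};

// Collects every output in the graph, including dynamic outputs that have
// been recorded by a previous build. std::set sorts and de-duplicates.
std::set<std::string> CollectOutputs(const std::vector<Edge>& edges) {
  std::set<std::string> all;
  for (const Edge& edge : edges) {
    all.insert(edge.outputs.begin(), edge.outputs.end());
    all.insert(edge.dynamic_outputs.begin(), edge.dynamic_outputs.end());
  }
  return all;
}

// Formats the outputs one per line, as a listing tool might print them.
std::string FormatOutputs(const std::vector<Edge>& edges) {
  std::ostringstream os;
  for (const std::string& path : CollectOutputs(edges))
    os << path << '\n';
  return os.str();
}
```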

mathstuf

Documentation is missing: at least a mention of the dynout attribute and the grammar of its expected format. Test cases would also be really nice to see.

Other than that, it looks sensible.

mathstuf · 2021-04-15T15:32:58Z

src/graph.cc

@@ -614,6 +628,113 @@ bool ImplicitDepLoader::ProcessDepfileDeps(
  return true;
 }

+
+bool ImplicitDepLoader::LoadDynOutFile(State* state, DiskInterface* disk_interface, Edge* edge, const string& path,
+                                       vector<Node*>* nodes, int* outputs_count, string* err) {


I think this would be better written as a more straightforward parser (see lexer.in.cc and its re2c comments).

I have moved the parser code into dynout_parser.cc: it did not make sense to keep it in graph.cc when it is only used in build.cc. I am not using re2c because, for now, the parsing is very straightforward: it just splits the file on newlines.
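A minimal sketch of such a newline-splitting parser, assuming the format is one path per line with no escaping. Two behaviors here are assumptions of the sketch and are not necessarily what dynout_parser.cc does: a trailing '\r' is stripped so files written with CRLF line endings still parse, and empty lines are skipped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits dynout file content into paths, one per line.
std::vector<std::string> ParseDynout(const std::string& content) {
  std::vector<std::string> paths;
  std::string::size_type start = 0;
  while (start < content.size()) {
    std::string::size_type end = content.find('\n', start);
    if (end == std::string::npos)
      end = content.size();  // last line without a trailing newline
    std::string line = content.substr(start, end - start);
    if (!line.empty() && line.back() == '\r')
      line.pop_back();  // tolerate CRLF line endings
    if (!line.empty())
      paths.push_back(line);  // skip blank lines
    start = end + 1;
  }
  return paths;
}
```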

mathstuf · 2021-04-15T15:33:33Z

src/ninja.cc

@@ -1009,6 +1047,8 @@ const Tool* ChooseTool(const string& tool_name) {
      Tool::RUN_AFTER_LOGS, &NinjaMain::ToolQuery },
    { "targets",  "list targets by their rule or depth in the DAG",
      Tool::RUN_AFTER_LOAD, &NinjaMain::ToolTargets },
+    { "outputs", "list all outputs of the build graph, include dynamic outputs if there are and they have been built.",


if there are → if there are any

mathstuf · 2021-04-15T15:35:39Z

src/deps_log.cc

@@ -116,6 +116,8 @@ bool DepsLog::RecordDeps(Node* node, TimeStamp mtime,
  size |= 0x80000000;  // Deps record: set high bit.
  if (fwrite(&size, 4, 1, file_) < 1)
    return false;
+  if (fwrite(&outputs_count, 4, 1, file_) < 1)
+    return false;


Doesn't this change the deps log format version?

I have bumped the deps log version from 4 to 5.
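Why the bump matters: once each record carries an extra outputs_count field, a reader that accepted a version 4 log would misread every record. Below is a minimal sketch of a version-checked header. The signature and layout are modeled on ninja's deps log header ("# ninjadeps\n" followed by a 4-byte version), but the exact code is an assumption; ninja itself discards a log with a mismatched version and starts over.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

const char kSignature[] = "# ninjadeps\n";
const int32_t kCurrentVersion = 5;  // 4 -> 5: records now carry outputs_count

// Writes the signature and version (native byte order, as fwrite would).
void WriteHeader(std::ostream& out, int32_t version) {
  out.write(kSignature, sizeof(kSignature) - 1);
  out.write(reinterpret_cast<const char*>(&version), sizeof(version));
}

// Returns true only for a log this reader can parse; older versions are
// rejected because their records lack the outputs_count field.
bool ReadHeader(std::istream& in) {
  char sig[sizeof(kSignature) - 1];
  int32_t version = 0;
  if (!in.read(sig, sizeof(sig)) ||
      std::memcmp(sig, kSignature, sizeof(sig)) != 0)
    return false;
  if (!in.read(reinterpret_cast<char*>(&version), sizeof(version)))
    return false;
  return version == kCurrentVersion;
}
```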

Dragnalith · 2021-04-16T03:27:15Z

@mathstuf thank you, I will revise my implementation.

Dragnalith · 2021-04-16T09:07:44Z

@mathstuf I have added documentation for the dynout attribute, plus one test in build_test.cc and one in clean_test.cc. I had actually forgotten to make -t clean remove dynamic outputs correctly.

Dragnalith · 2021-04-16T10:29:13Z

I have added a debug option, -d keepdynout, for diagnosis.

Dragnalith · 2021-04-16T10:51:08Z

I think the implementation can be considered feature complete. I have added tests and a debug option, handled the -t clean case, written documentation, fixed whitespace issues, and made sure the GitHub checks pass.

I will keep refining it as necessary until it reaches the quality required to be merged into master.

mathstuf · 2021-04-19T11:12:08Z

src/graph.cc

@@ -641,6 +655,68 @@ bool ImplicitDepLoader::LoadDepsFromLog(Edge* edge, string* err) {
  return true;
 }

+bool ImplicitDepLoader::LoadOutputsFromLog(Edge* edge, string* err) {
+  // NOTE: deps are only supported for single-target edges.


This was fixed in #1534.

mathstuf · 2021-04-19T11:13:03Z

src/dynout_parser.cc

+  case DiskInterface::Okay:
+    break;
+  case DiskInterface::NotFound:
+    err->clear();


err can be nullptr; this must be guarded. The same applies to all other uses of err.
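A minimal sketch of the guarding pattern. ReadStatus is a stand-in for DiskInterface::Status (whose values include Okay, NotFound and OtherError), and HandleDynoutRead is a hypothetical function. Clearing err on NotFound mirrors the quoted fragment. Treating a missing dynout file as "no dynamic outputs" (returning true) is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Stand-in for DiskInterface::Status; this sketch does not use ninja headers.
enum class ReadStatus { Okay, NotFound, OtherError };

// Writes an error message only when the caller supplied somewhere to put it.
static void SetError(std::string* err, const std::string& message) {
  if (err)
    *err = message;
}

// Hypothetical handling of the result of reading a dynout file.
bool HandleDynoutRead(ReadStatus status, const std::string& path,
                      std::string* err) {
  switch (status) {
    case ReadStatus::Okay:
      return true;
    case ReadStatus::NotFound:
      if (err)  // guarded: err may be nullptr
        err->clear();
      return true;
    case ReadStatus::OtherError:
      SetError(err, "loading '" + path + "': read failed");
      return false;
  }
  return false;
}
```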

mathstuf · 2021-04-19T11:17:12Z

doc/manual.asciidoc

@@ -855,6 +855,12 @@ keys.
   stored as `.ninja_deps` in the `builddir`, see <<ref_toplevel,the
   discussion of `builddir`>>.

+`dynout`:: path to an optional _dynout file_ that contains the list
+  of outputs generated by the rule. The dynout file syntax except one
+  path per line. This is to make Ninja aware of dynamic outputs, so 


I assume this means it expects one path per line? Are there any escaping mechanisms? I guess this means that paths with embedded newlines are not supported (not that I expect them to be supported anywhere else either, but it would be good to know before reviewing the parsing code)?

mathstuf

My most recent comments still do not seem to have been addressed. Just formalizing this into a review.

Dragnalith · 2021-07-26T10:33:32Z

Yes, I am not focusing on that project for now. I do not know when I will get back to it.

On Tue, Jul 20, 2021 at 21:42, Ben Boeckel ***@***.***> wrote:

> ***@***.**** requested changes on this pull request. My most recent comments still seem to have not been addressed. Just formalizing into a review.

HampusAdolfsson · 2023-12-21T10:29:00Z

Hi!
I am interested in helping get this merged. @Dragnalith would you mind if I create a new pull request based on this one to address the remaining comments and finalize this?

Dragnalith · 2023-12-21T10:34:43Z

@HampusAdolfsson I don't mind. I don't have the energy to finish it, and I will be happy to see it merged!
Thank you for your help.

johanneslerch · 2024-05-17T12:56:59Z

We have been using this solution successfully for several years, but we recently uncovered a hidden problem that it can cause.

Assume I have a code generator that creates the file a.h. Ninja becomes aware of this via dynout, so it stores in .ninja_deps that this code generator creates a.h.
Now we change the build graph so that a.h is created by some other build statement and no longer by the code generator. Running ninja on this changed build graph with the old .ninja_deps file present produces an error: ninja complains that two build statements create the same output, a.h.

This example is simplified for easier understanding. In reality, we switch back and forth between two code generators, only one of which is active in any given build, but both create a.h and write to .ninja_deps. Since we switch back and forth, simply deleting .ninja_deps is not really a solution for us.

Dropping this here to warn others, but I am also happy to take suggestions on how this could be solved.
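The failure can be reduced to a small model. This is a hypothetical simplification, not ninja's data structures: each output path maps to the build statement that produces it. Stale dynamic outputs from the log are then applied after the manifest has already assigned a.h to another statement. The check follows the same logic as the "multiple rules generate" error in the quoted hunk.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// output path -> name of the build statement producing it (hypothetical model)
using Producers = std::map<std::string, std::string>;

// Applies the dynamic outputs recorded for `edge` in the deps log, failing if
// another statement already produces one of the paths.
bool AddDynamicOutputs(Producers* producers, const std::string& edge,
                       const std::vector<std::string>& logged_outputs,
                       std::string* err) {
  for (const std::string& path : logged_outputs) {
    auto it = producers->find(path);
    if (it != producers->end() && it->second != edge) {
      *err = "multiple rules generate " + path;
      return false;
    }
    (*producers)[path] = edge;
  }
  return true;
}
```

In the scenario above, the manifest now declares a.h as an output of gen_b, while the old log still says gen_a produced it dynamically.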

HampusAdolfsson · 2024-05-17T14:13:03Z

@johanneslerch This PR is outdated; please use #2366, where discussion of this feature is currently ongoing.

To address your point:
In response to some of the other feedback on this implementation, I am working on changing things so that the build graph is not updated during the build when parsing new dynout files. Instead, dynamic outputs are only added to the build graph at the start of the build (from the depslog), and dynout files parsed during the build are only used to update the depslog. This behaviour is the same as for dynamic dependencies, and reflects the fact that dynamic outputs are not meant to change decisions during a build (for that, you would have to use dyndeps).

This would likely also solve your issue. If for some build, node X stops producing a.h and node Y starts producing it, they will each produce a new dynout, which is then stored in the depslog. At the start of the next build, the build graph produced from the depslog will be both valid and up-to-date.
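A toy model of the approach described here, under stated assumptions. Records are keyed by a hypothetical build statement name; the real deps log is organized differently. During the build, a fresh dynout list replaces that statement's record without touching the graph. Only at the next startup is the output-to-producer mapping rebuilt from the log. It illustrates only the scenario described in this comment, where both X and Y re-run.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// statement name -> latest dynamic outputs recorded for it (hypothetical)
using DynoutLog = std::map<std::string, std::vector<std::string>>;

// During the build: replace (not merge) the statement's record with what its
// dynout file just listed. The in-memory graph is not modified here.
void RecordDynout(DynoutLog* log, const std::string& edge,
                  const std::vector<std::string>& outputs) {
  (*log)[edge] = outputs;
}

// At startup: derive output -> producer from the log. Returns false if two
// statements claim the same path.
bool LoadGraphFromLog(const DynoutLog& log,
                      std::map<std::string, std::string>* producers) {
  producers->clear();
  for (const auto& entry : log) {
    for (const std::string& path : entry.second) {
      if (!producers->emplace(path, entry.first).second)
        return false;
    }
  }
  return true;
}
```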

johanneslerch · 2024-08-14T08:44:33Z

src/graph.cc

+  // Add the dyndep-discovered outputs to the edge.
+  edge->outputs_.insert(edge->outputs_.end(), implicit_outputs.begin(),
+                        implicit_outputs.end());
+  edge->implicit_outs_ += implicit_outputs.size();
+
+  // Add this edge as incoming to each new output.
+  for (std::vector<Node*>::const_iterator i = implicit_outputs.begin();
+       i != implicit_outputs.end(); ++i) {
+    if (Edge* old_in_edge = (*i)->in_edge()) {
+      // This node already has an edge producing it.  Fail with an error
+      // unless the edge was generated by ImplicitDepLoader, in which
+      // case we can replace it with the now-known real producer.
+      if (!old_in_edge->generated_by_dep_loader_) {
+        *err = "multiple rules generate " + (*i)->path();
+        return false;
+      }
+      old_in_edge->outputs_.clear();
+    }
+    (*i)->set_in_edge(edge);
+  }
+
+  return true;


Hi again. I'm aware that this PR is outdated, but it is what we based our version of Ninja on, so we have an interest in fixing it here. We also want to contribute our insights back in case they help others, and we are of course interested in feedback if we have missed something.

Suggested change

-  // Add the dyndep-discovered outputs to the edge.
-  edge->outputs_.insert(edge->outputs_.end(), implicit_outputs.begin(),
-                        implicit_outputs.end());
-  edge->implicit_outs_ += implicit_outputs.size();
-
-  // Add this edge as incoming to each new output.
-  for (std::vector<Node*>::const_iterator i = implicit_outputs.begin();
-       i != implicit_outputs.end(); ++i) {
-    if (Edge* old_in_edge = (*i)->in_edge()) {
-      // This node already has an edge producing it.  Fail with an error
-      // unless the edge was generated by ImplicitDepLoader, in which
-      // case we can replace it with the now-known real producer.
-      if (!old_in_edge->generated_by_dep_loader_) {
-        *err = "multiple rules generate " + (*i)->path();
-        return false;
-      }
-      old_in_edge->outputs_.clear();
-    }
-    (*i)->set_in_edge(edge);
-  }
-
-  return true;
+  bool hasConflict = false;
+  for (std::vector<Node*>::const_iterator i = implicit_outputs.begin();
+       i != implicit_outputs.end(); ++i) {
+    if (Edge* old_in_edge = (*i)->in_edge()) {
+      // This node already has an edge producing it.
+      // This can mean that there is a conflict of two build statements producing the same output,
+      // but it can also mean that the dynout information within .ninja_deps file is outdated and
+      // the dynamic output is now an explicit output of another build statement.
+      // We can not be sure here if the conflict is real or not, so we return false to mark the output as dirty.
+      // The conflict detection needs to happen when processing the dynout file from within Builder::FinishCommand.
+      EXPLAIN("Dynamic output '%s' of a previous execution can now be created via '%s', so we consider '%s' as dirty.",
+              (*i)->path().c_str(), old_in_edge->rule_->name().c_str(), output->path().c_str());
+      hasConflict = true;
+    } else {
+      edge->outputs_.push_back(*i);
+      edge->implicit_outs_++;
+      (*i)->set_in_edge(edge);
+    }
+  }
+
+  return !hasConflict;
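The idea behind the suggested change, in a hypothetical standalone model rather than ninja's Node/Edge classes, is as follows. A logged dynamic output that already has a producer may just be stale, so the statement is reported dirty instead of failing, and the contested path is not attached to it. Real conflicts are left to be detected when the fresh dynout file is processed after the command runs. As the follow-up comment below notes, this approach was later found not to work in all situations.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns true if every logged output could be attached to `edge` (so the
// statement may stay clean); false means "treat as dirty and re-run".
bool AttachLoggedOutputs(std::map<std::string, std::string>* producers,
                         const std::string& edge,
                         const std::vector<std::string>& logged_outputs,
                         std::vector<std::string>* attached) {
  bool has_conflict = false;
  for (const std::string& path : logged_outputs) {
    if (producers->count(path) != 0) {
      has_conflict = true;  // stale record or real conflict: cannot tell yet
      continue;             // leave the existing producer in place
    }
    (*producers)[path] = edge;
    attached->push_back(path);
  }
  return !has_conflict;
}
```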

We've put some examples here that we used to reproduce the problem mentioned earlier, and then also to confirm it's gone: https://github.com/johanneslerch/ninja-dynouts-example

ninja-1.11.1.conti.2.exe is based on Ninja 1.11.1 plus some contributions, such as this PR.
ninja-1.11.1.conti.3.exe includes the fix suggested in this review comment. This version works as we expect.
ninja-pr2366-Jun12.exe is built from #2366. It does not behave according to our expectations: it basically always re-executes build statements, and it does not detect conflicts where multiple build statements produce the same output file.

johanneslerch · 2024-08-20

Coming back to this one: in the meantime we discovered that this does not work in all situations either. For now, we have discontinued work on it.

Dragnalith added 3 commits April 15, 2021 22:17

Implement 'dynout' feature to inform ninja about dynamic outputs

41e0dbd

Dynamic dependencies are stored in .ninja_deps file

d851320

Add ToolOutputs: 'ninja -t outputs'

48ae185

This tool list all the output generated by the graph, including dynamic output if they have been built.

Dragnalith mentioned this pull request Apr 15, 2021

Improve incremental build: Make outputs modified outside of the build system considered dirty #1951

Open

mathstuf suggested changes Apr 15, 2021

Dragnalith added 6 commits April 16, 2021 13:26

Fix whitespace and 'outputs' tool description

2776511

Bump deps log version

883f479

Add documentation for dynout attribute

38bceda

Add unit test for dynamic outputs

b4130f7

Fix tests

db45e14

Move dynout parser in its own file

49147e2

Dragnalith added 2 commits April 16, 2021 19:36

Add -d keepdynout option

db1e1f1

Fix CMakeLists.txt

94a5f65

Dragnalith force-pushed the dynamic_outputs branch from 94dbb74 to 94a5f65 April 16, 2021 10:37

mathstuf suggested changes Apr 19, 2021

Fix dynout parser and restat scenario

671ca94

jhasse added the feature label Apr 25, 2021

Fix dynamic outputs not correctly recorded in deps log

090e623

mathstuf suggested changes Jul 20, 2021

johanneslerch mentioned this pull request Nov 26, 2021

Implement 'dynout' feature to inform ninja about dynamic outputs DanielWeber/ninja#3

Closed

HampusAdolfsson mentioned this pull request Dec 22, 2023

Improve incremental build: make ninja handle dynamic outputs (continuation) #2366

Open

johanneslerch reviewed Aug 14, 2024

johanneslerch mentioned this pull request Aug 22, 2024

Support single statement usage of dyndep #2481

Draft
