Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve incremental build: make ninja handle dynamic outputs #1953

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

Dragnalith
Copy link

@Dragnalith Dragnalith commented Apr 15, 2021

This PR is about making ninja being aware of dynamic outputs, it is related to this discussion on google groups.

The Problem

The general problem I am trying to solve is improving the correctness of the incremental build. I think we can say the incremental build is correct when the result of running ninja is the same as doing a full build (enforcing all rule command to be executed again). Today there are at least two scenarios where this is not the case:

Example In Practice

In my project, we are currently having a C++ code generatorwhich generates several header files out of the same source of data (one header per class). If someone delete or modified the header files in local, ninja will not be aware of that and won't re-run the generator. This issue can break the build or worse result in unexpected behavior of the application. When people report issues which could be related to incremental build issues, the first step is often "have you rebuild from scratch?". I would like to avoid the situation where people have to rebuilt from scratch to be sure their build state is correct.

My Solution

One solution to make ninja aware about dynamic outputs, is to have a mechanism similar to depfile to inform ninja about dynamic outputs during the build. My implementation introduce the dynout attribute which indicate the path to a file generated by the rule command containing the list of the outputs. The current syntax is simply having file path per line.

Example:

rule cpp_gen: codegen.exe --depfile $out.d --dynout $out.dynout --stamp $out $in
    depfile = $out.d
    dynout = $oud.dynout

build mydata.stamp: cpp_gen mydata.json

The dynamic outputs are stored in the deps log. As a matter of implementation it is very straightforward, I think it is the way to go. But it regarding naming, "deps" would not fit the concept anymore, because deps log will now contain dynamic dependencies as well as dynamic outputs.

I have also added a tool to list all the output including dynamic outputs in order to diagnosis the build.

Copy link
Contributor

@mathstuf mathstuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is missing. At least the mention of the dynout stuff and the grammar for the expected format of it. Test cases would also be really nice to see.

Other than that, it looks sensible.

src/graph.cc Outdated
@@ -614,6 +628,113 @@ bool ImplicitDepLoader::ProcessDepfileDeps(
return true;
}


bool ImplicitDepLoader::LoadDynOutFile(State* state, DiskInterface* disk_interface, Edge* edge, const string& path,
vector<Node*>* nodes, int* outputs_count, string* err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is best written in a more straight-forward parser way (see lexer.in.cc and its re2c comments).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the parser code into dynout_parser.cpp, it did not make sense to have it in graph.cc but only used in build.cc. I am not using re2c because the parsing is for now very straightforward. It is just looking for new line.

src/ninja.cc Outdated
@@ -1009,6 +1047,8 @@ const Tool* ChooseTool(const string& tool_name) {
Tool::RUN_AFTER_LOGS, &NinjaMain::ToolQuery },
{ "targets", "list targets by their rule or depth in the DAG",
Tool::RUN_AFTER_LOAD, &NinjaMain::ToolTargets },
{ "outputs", "list all outputs of the build graph, include dynamic outputs if there are and they have been built.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there areif there are any

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

src/ninja.cc Outdated Show resolved Hide resolved
src/graph.h Outdated Show resolved Hide resolved
@@ -116,6 +116,8 @@ bool DepsLog::RecordDeps(Node* node, TimeStamp mtime,
size |= 0x80000000; // Deps record: set high bit.
if (fwrite(&size, 4, 1, file_) < 1)
return false;
if (fwrite(&outputs_count, 4, 1, file_) < 1)
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this change the deps log format version?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have bumped to deps log version from 4 to 5.

@Dragnalith
Copy link
Author

@mathstuf thank you, I will revise my implementation.

@Dragnalith
Copy link
Author

@mathstuf I have added the documentation for the dynout attribute. And added one test in build_test.cc and one in clean_test.cc. Actually I had forgotten to make clean correctly clean dynamic outputs.

@Dragnalith
Copy link
Author

I have added a debug option -d keepdynout for diagnosis

@Dragnalith
Copy link
Author

I think the implementation can be considered feature complete. I have added test, debug option, consider the -t clean case, write documentation, fix whitespace issues, make sure github checks have passed.

I will continue refining it if necessary to make it reach the quality to be integrated to master.

@@ -641,6 +655,68 @@ bool ImplicitDepLoader::LoadDepsFromLog(Edge* edge, string* err) {
return true;
}

bool ImplicitDepLoader::LoadOutputsFromLog(Edge* edge, string* err) {
// NOTE: deps are only supported for single-target edges.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fixed in #1534.

case DiskInterface::Okay:
break;
case DiskInterface::NotFound:
err->clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err can by nullptr; this must be guarded. Same with all other uses of err.

@@ -855,6 +855,12 @@ keys.
stored as `.ninja_deps` in the `builddir`, see <<ref_toplevel,the
discussion of `builddir`>>.

`dynout`:: path to an optional _dynout file_ that contains the list
of outputs generated by the rule. The dynout file syntax except one
path per line. This is to make Ninja aware of dynamic outputs, so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is expeccts one path per line? Are there any escaping mechanisms? I guess this means that paths with embedded newlines are not supported (not that I expect they are anywhere else either, but it'd be good to know before reviewing the parsing code)?

@jhasse jhasse added the feature label Apr 25, 2021
Copy link
Contributor

@mathstuf mathstuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My most recent comments still seem to have not been addressed. Just formalizing into a review.

@Dragnalith
Copy link
Author

Dragnalith commented Jul 26, 2021 via email

@HampusAdolfsson
Copy link
Contributor

Hi!
I am interested in helping get this merged. @Dragnalith would you mind if I create a new pull request based on this one to address the remaining comments and finalize this?

@Dragnalith
Copy link
Author

@HampusAdolfsson I don't mind. I don't have the energy to finish it and I will be happy to see it merged!
thank you for your help.

@johanneslerch
Copy link

We've been using this solution for quite some years, successfully. Until now, when we uncovered a hidden problem that it can cause.

Assume I have one code generator creating file a.h. Ninja becomes aware of that via the use of dynout. Hence, it stores in .ninja_deps that this one code generator creates a.h.
Now, we changed our build graph and a.h is created by some other build statement and no longer by the code generator. Means, we effectively change our build graph. Running ninja on this changed build graph with an old .ninja_deps file present is creating an error: ninja will complain that two build statements are creating the same output a.h.

This example was simplified for easier understanding, in fact, we switch back and forth between two code generators whereas only one of them will be active per build. But both create a.h and write to .ninja_deps. As we switch back and forth, simply deleting .ninja_deps is not really a solution for us.

Dropping this here to warn others, but also happy to take suggestions how this could be solved.

@HampusAdolfsson
Copy link
Contributor

@johanneslerch This PR is outdated, please use #2366 where discussion of this feature is currently ongoing.

To address your point:
In response to some of the other feedback on this implementation, I am working on changing things so that the build graph is not updated during the build when parsing new dynout files. Instead, dynamic outputs are only added to the build graph at the start of the build (from the depslog), and dynout files parsed during the build are only used to update the depslog. This behaviour is the same as for dynamic dependencies, and reflects the fact that dynamic outputs are not meant to change decisions during a build (for that, you would have to use dyndeps).

This would likely also solve your issue. If for some build, node X stops producing a.h and node Y starts producing it, they will each produce a new dynout, which is then stored in the depslog. At the start of the next build, the build graph produced from the depslog will be both valid and up-to-date.

Comment on lines +696 to +717
// Add the dyndep-discovered outputs to the edge.
edge->outputs_.insert(edge->outputs_.end(), implicit_outputs.begin(),
implicit_outputs.end());
edge->implicit_outs_ += implicit_outputs.size();

// Add this edge as incoming to each new output.
for (std::vector<Node*>::const_iterator i = implicit_outputs.begin();
i != implicit_outputs.end(); ++i) {
if (Edge* old_in_edge = (*i)->in_edge()) {
// This node already has an edge producing it. Fail with an error
// unless the edge was generated by ImplicitDepLoader, in which
// case we can replace it with the now-known real producer.
if (!old_in_edge->generated_by_dep_loader_) {
*err = "multiple rules generate " + (*i)->path();
return false;
}
old_in_edge->outputs_.clear();
}
(*i)->set_in_edge(edge);
}

return true;
Copy link

@johanneslerch johanneslerch Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi again, I'm aware that this PR is outdated, but that's what we based our version of Ninja on, so we have some interest to fix this one here. However, we do want to contribute back our insights that may help others and are of course interested in receiving feedback if we miss something.

Suggested change
// Add the dyndep-discovered outputs to the edge.
edge->outputs_.insert(edge->outputs_.end(), implicit_outputs.begin(),
implicit_outputs.end());
edge->implicit_outs_ += implicit_outputs.size();
// Add this edge as incoming to each new output.
for (std::vector<Node*>::const_iterator i = implicit_outputs.begin();
i != implicit_outputs.end(); ++i) {
if (Edge* old_in_edge = (*i)->in_edge()) {
// This node already has an edge producing it. Fail with an error
// unless the edge was generated by ImplicitDepLoader, in which
// case we can replace it with the now-known real producer.
if (!old_in_edge->generated_by_dep_loader_) {
*err = "multiple rules generate " + (*i)->path();
return false;
}
old_in_edge->outputs_.clear();
}
(*i)->set_in_edge(edge);
}
return true;
bool hasConflict = false;
for (std::vector<Node*>::const_iterator i = implicit_outputs.begin();
i != implicit_outputs.end(); ++i) {
if (Edge* old_in_edge = (*i)->in_edge()) {
// This node already has an edge producing it.
// This can mean that there is a conflict of two build statements producing the same output,
// but it can also mean that the dynout information within .ninja_deps file is outdated and
// the dynamic output is now an explicit output of another build statement.
// We can not be sure here if the conflict is real or not, so we return false to mark the output as dirty.
// The conflict detection needs to happen when processing the dynout file from within Builder::FinishCommand.
EXPLAIN("Dynamic output '%s' of a previous execution can now be created via '%s', so we consider '%s' as dirty.",
(*i)->path().c_str(), old_in_edge->rule_->name().c_str(), output->path().c_str());
hasConflict = true;
} else {
edge->outputs_.push_back(*i);
edge->implicit_outs_++;
(*i)->set_in_edge(edge);
}
}
return !hasConflict;

We've put some examples here that we used to reproduce the problem mentioned earlier, and then also to confirm it's gone: https://github.com/johanneslerch/ninja-dynouts-example

ninja-1.11.1.conti.2.exe is based on Ninja 1.11.1 plus some contributions like this PR.
ninja-1.11.1.conti.3.exe is having the fix suggested in this review comment. This version works like we expect it to.
ninja-pr2366-Jun12.exe is built from #2366. It is not behaving according to our expectations. It's basically re-executing build statements always and also not detecting conflicts of multiple build statements producing the same output file.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this one. In the meantime we discovered this does not work either in all situations. For now, we discontinued working on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants