Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long running compile commands can cause incorrect dirty status computation #1162

Closed
jdrouhard opened this issue Jun 6, 2016 · 8 comments · Fixed by #1753 or #1943
Closed

Long running compile commands can cause incorrect dirty status computation #1162

jdrouhard opened this issue Jun 6, 2016 · 8 comments · Fixed by #1753 or #1943
Labels
Milestone

Comments

@jdrouhard
Copy link
Contributor

This was a bit of a head scratcher for us and took us awhile to nail down. It seems that ninja is looking purely at the mtimes of inputs and outputs to determine if an edge is dirty. This doesn't seem to be quite sufficient, because a long running compile command might have an input that is changed during the compilation, which would still result in an output file with a newer mtime than the most recent input file, even though the output file is actually dirty since it was started with an older version of one of its inputs.

This is difficult to reproduce, but I did the following:

  1. Create a gcc wrapper that splits compile commands into two: a preprocess step and a compile step.

  2. Add a sleep between the preprocess and compile to simulate a long running compile command.

  3. Create a simple build.ninja file that just compiles a single test file into an executable.

    # build.ninja
    rule cxx
      command = /path/to/gcc-wrapper -o $out -c $in
    rule linker
      command = /path/to/gcc-wrapper $in -o $out
    build test.c.o: cxx /path/to/test.c
    build repro: linker test.c.o
    
    // test.c
    int main() { return 0; }
  4. Run ninja using the wrapper. While it's sleeping, change the source file (eg, return a different value).

After it finishes linking, running the executable will return the original value (as expected), but subsequent ninja runs will not detect that the output is dirty and will report nothing to do, even though it is stale.

This is an issue for us here as we have a lot of large translation units and tend to have a rapid modify/build/test cycle.

The following code is where it checks the mtimes of the most recent input against the output:

ninja/src/graph.cc

Lines 161 to 184 in b231274

// Dirty if the output is older than the input.
if (most_recent_input && output->mtime() < most_recent_input->mtime()) {
TimeStamp output_mtime = output->mtime();
// If this is a restat rule, we may have cleaned the output with a restat
// rule in a previous run and stored the most recent input mtime in the
// build log. Use that mtime instead, so that the file will only be
// considered dirty if an input was modified since the previous run.
bool used_restat = false;
if (edge->GetBindingBool("restat") && build_log() &&
(entry = build_log()->LookupByOutput(output->path()))) {
output_mtime = entry->restat_mtime;
used_restat = true;
}
if (output_mtime < most_recent_input->mtime()) {
EXPLAIN("%soutput %s older than most recent input %s "
"(%d vs %d)",
used_restat ? "restat of " : "", output->path().c_str(),
most_recent_input->path().c_str(),
output_mtime, most_recent_input->mtime());
return true;
}
}

Though I'm not sure what the best (fastest) way to actually solve this would be, I'm willing to take a crack at it if we come up with a decent idea for how to do so.

@jimon
Copy link

jimon commented Jun 6, 2016

I guess it's fine for any build system (ninja, make, etc) to make an assumption "inputs are not changing while command is running", so build system should write down to cache mtimes of inputs before it started executing a command, and than build system should gracefully pickup new changes in second run. At least this is what I expect from any build system on the market.

@evmar
Copy link
Collaborator

evmar commented Jun 6, 2016

Two thoughts:

  1. Editing files while you compile is a recipe for confusing build outputs, even if the build system is correct. I think the Google build system (bazel) actually aborts your build if it detects that you modified a file while it was building. We could make Ninja do that by re-stat()ing all inputs after a command completes and deleting the output if they changed.
  2. You probably don't like that answer. Making this just work is probably more involved. I have come to understand that using mtimes at all for time comparisons is a mistake. A better approach is to compute some sort of hash of "all the input stuff" -- which includes the the set of input files, their mtimes, and the command line -- and then rebuild if that hash ever changes. (That model also would make it trivial to extend that into a system which completely ignores mtimes and just uses the hash of the inputs' contents as well, which is a feature people have asked for.) We in fact have something sorta like this already in place for handling command line changes -- see the HashCommand right below and imagine also feeding it all the input mtimes. However, implementing this would be a pretty involved change to Ninja, and I just thought of the idea right now so I haven't yet spotted the flaws in it.

@dpranke
Copy link

dpranke commented Jun 6, 2016

On Mon, Jun 6, 2016 at 4:23 PM, Evan Martin notifications@github.com
wrote:

Two thoughts:

Editing files while you compile is a recipe for confusing build
outputs, even if the build system is correct. I think the Google build
system (bazel) actually aborts your build if it detects that you modified a
file while it was building. We could make Ninja do that by re-stat()ing all
inputs after a command completes and deleting the output if they changed.
2.

You probably don't like that answer. Making this just work is probably
more involved. I have come to understand that using mtimes at all for time
comparisons is a mistake. A better approach is to compute some sort of hash
of "all the input stuff" -- which includes the the set of input files,
their mtimes, and the command line -- and then rebuild if that hash ever
changes. (That model also would make it trivial to extend that into a
system which completely ignores mtimes and just uses the hash of the
inputs' contents as well.) We in fact have something sorta like this
already in place for handling command line changes. However, implementing
this would be a pretty involved change to Ninja, and I just thought of the
idea right now so I haven't yet spotted the flaws in it.

I think it is very difficult if not impossible to do (2) and have things
still be fast, unless you keep the build tool running and watch for file
changes. My vague understanding is that this is one of the reasons chrome
(or NaCl?) moved away from SCons in the first place (because SCons used MD5
hashes by default), but that was before my time ...

-- Dirk

@evmar
Copy link
Collaborator

evmar commented Jun 6, 2016

Sorry, I mixed two ideas there: there's 2a (mix file mtimes into the hash of "all inputs"), which should be about as fast as the current code and will solve this bug.

And then I mentioned content hashing as idea 2b that this would enable, but it's something to consider another time. (It definitely has speed challenges but they are surmountable.)

@jdrouhard
Copy link
Contributor Author

@martine: What about a simpler solution (for now at least), where the log file keeps track of the mtime of the most recent input for each output at the time the build was started? This would require a log version bump, but I've got this working locally. I can clean it up and submit the pull request if you are interested in seeing it.

Basically, in Builder::FinishCommand(), I loop through the edge inputs (similar to the loop when restat is in use), and store the most recent input's timestamp on the LogEntry. Then in DependencyScan::RecomputeOutputDirty(), I use that stored timestamp instead of the output's actual mtime if it's available when comparing it against the new most recent input's mtime. This fixes the reproduceable described above.

@jdrouhard
Copy link
Contributor Author

Or instead of storing the most recent input's time, we could just store the time that the edge started building and use that as the output's mtime on subsequent runs. The most recent input's time doesn't quite work with generated inputs.

@maximuska
Copy link
Contributor

maximuska commented Jun 8, 2016

Rather than changing ninja, wouldn't it be sufficient to wrap your build commands in a script which would record mtime (or md5) of the input files before running the tool, verify these are still the same after the tool has written the output, and touching the changed inputs to make these newer than the output (or deleting the output and failing the command if you consider this an error)?

This won't be quite trivial with the implicit dependencies (I mean header files which are only discovered in process of building the target, using 'dep files'), but none of your proposals seems to address that either.

@jdrouhard
Copy link
Contributor Author

Just to update this issue, I did resolve the bug a few months back with my pull request referenced above. This could technically be solved with a wrapper script, but is unnecessarily complex and error-prone. The build tool should philosophically be able to handle cases such as this.

My fix resolves all the issues I listed, plus the concern @maximuska brought up. On subsequent builds, it will always use the time the output began building as its "mtime" so any implicit dependency discovery or input file differences will be compared against the time the output began building instead of comparing against its actual filesystem mtime.

It makes ninja's dirty output checking actually correct by removing the race condition that occurs by changing an input between the output starting its build and the output being written to disk.

If there are any qualms about my pull request, we can start a discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment