Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restat rules that generate build.ninja can crash ninja #874

Closed
colincross opened this issue Dec 3, 2014 · 7 comments
Closed

Restat rules that generate build.ninja can crash ninja #874

colincross opened this issue Dec 3, 2014 · 7 comments

Comments

@colincross
Copy link
Contributor

I'm seeing a crash in ninja (master and v1.5.3):
(gdb) bt
#0 0x00007f3b268dbbb9 in __GI_raise (sig=sig@entry=6)

at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

#1 0x00007f3b268defc8 in __GI_abort () at abort.c:89
#2 0x00007f3b26918e14 in __libc_message (do_abort=do_abort@entry=1,

fmt=fmt@entry=0x7f3b26a27668 "*** Error in `%s': %s: 0x%s ***\n")
at ../sysdeps/posix/libc_fatal.c:175

#3 0x00007f3b269250ee in malloc_printerr (ptr=,

str=0x7f3b26a23759 "free(): invalid pointer", action=1) at malloc.c:4996

#4 _int_free (av=, p=, have_lock=0)

at malloc.c:3840

#5 0x0000000000409d65 in deallocate (this=,

__p=<optimized out>) at /usr/include/c++/4.8/ext/new_allocator.h:110

#6 _M_put_node (this=, __p=)

at /usr/include/c++/4.8/bits/stl_tree.h:374

#7 _M_destroy_node (this=, __p=)

at /usr/include/c++/4.8/bits/stl_tree.h:396

#8 _M_erase_aux (__position=..., this=0x2c14a40)

at /usr/include/c++/4.8/bits/stl_tree.h:1746

#9 erase (__position=..., this=0x2c14a40)

at /usr/include/c++/4.8/bits/stl_tree.h:826

#10 erase (__position=..., this=0x2c14a40)

at /usr/include/c++/4.8/bits/stl_map.h:711

#11 BuildStatus::BuildEdgeFinished (this=0x2c14a20, edge=edge@entry=0x2391220,

success=<optimized out>, output=..., 
start_time=start_time@entry=0x7fff9a8bfeb8, 
end_time=end_time@entry=0x7fff9a8bfebc) at src/build.cc:116

#12 0x000000000040a14c in Builder::FinishCommand (

this=this@entry=0x7fff9a8c00f0, result=result@entry=0x7fff9a8bffa0, 
err=err@entry=0x7fff9a8c0050) at src/build.cc:744

#13 0x000000000040ab9f in Builder::Build (this=this@entry=0x7fff9a8c00f0,

err=err@entry=0x7fff9a8c0050) at src/build.cc:650

#14 0x0000000000405b1f in RunBuild (argv=,

argc=<optimized out>, this=0x7fff9a8c01b0) at src/ninja.cc:917

#15 (anonymous namespace)::real_main (argc=,

argv=<optimized out>) at src/ninja.cc:1102

#16 0x00007f3b268c6ec5 in __libc_start_main (

main=0x4028c0 <main(int, char**)>, argc=1, argv=0x7fff9a8c0468, 
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
stack_end=0x7fff9a8c0458) at libc-start.c:287

#17 0x0000000000402ac0 in _start ()

I've tracked this down to a file being built twice, which results in downstream dependencies being built twice. If the downstream dependency is slow to build it can run concurrently, and the second one to finish crashes when running_edges_.end() returned fromo running_edges_.find(edge) is passed into running_edges_.erase(i).

This vastly simplified build.ninja repros the spurious second build (but not the crash, even if I try to add a slow to build file that depends on out1.file, so it may not be triggering the exact same problem):
rule notouch
command = # touch ${out}
restat = true

rule touch
command = touch ${out}

build out1.file : touch in.file
default out1.file

build out2.file: notouch out1.file

build build.ninja : notouch
depfile = build.ninja.d

And build.ninja.d:
build.ninja:
out2.file

Repro steps:
touch in.file && ninja -d explain -v
ninja explain: output out1.file older than most recent input in.file (1417576914 vs 1417576918)
ninja explain: out1.file is dirty
ninja explain: out2.file is dirty
[1/3] touch out1.file
[2/3] # touch out2.file
ninja explain: output out1.file older than most recent input in.file (1417576914 vs 1417576918)
[1/1] touch out1.file

@nico
Copy link
Collaborator

nico commented Dec 3, 2014

The stack looks similar to the stack I'm seeing in the first comments in issue #867 – are you getting a ninja: warning: multiple rules generate X. builds involving this target will not be correct; continuing anyway warning in your non-reduced case, by chance?

@colincross
Copy link
Contributor Author

I don't get a multiple rules warning, so might not be exactly the same issue as #867. Here is a reduced build.ninja that crashes for me:

rule sleep
     command = sleep .1; touch ${out}

rule touch
     command = touch ${out}

rule notouch
    command = # touch ${out}
    restat = true

rule write_depfile
     command = echo -e "build.ninja:\n dep" > ${out};

build 3 : touch in.file 2
default 3

build dep : notouch 2
default dep

build 4 : touch 3
default 4

build 5 : sleep 4
default 5

build 6 : sleep 2
default 6

build 1 : touch in.file

build 2 : touch 1
default 2

build build.ninja: notouch | depfile
    depfile = depfile
default build.ninja

build depfile : write_depfile

Then:

$ touch in.file && ninja -v -d explain
ninja explain: output 1 older than most recent input in.file (1417593893 vs 1417593942)
ninja explain: 1 is dirty
ninja explain: 2 is dirty
ninja explain: dep is dirty
[1/4] touch 1
[2/4] touch 2
[3/4] # touch dep
ninja explain: 2 is dirty
ninja explain: 2 is dirty
ninja explain: 3 is dirty
ninja explain: 4 is dirty
ninja explain: 2 is dirty
ninja explain: 1 is dirty
ninja explain: dep is dirty
ninja explain: dep is dirty
[1/7] touch 3
[2/7] # touch dep
[3/7] sleep .1; touch 6
[4/7] touch 2
[5/7] touch 3
[6/7] # touch dep
[7/7] sleep .1; touch 6
[8/7] touch 4
*** Error in `/usr/bin/ninja': munmap_chunk(): invalid pointer: 0x00000000009c85c8 ***
Aborted (core dumped)

It's very sensitive to tiny modifications of the build.ninja file. Reordering the build statements, removing the redundant looking build statements, or remove the default statements will all cause the crash to stop happening.

@colincross
Copy link
Contributor Author

I think I've found the problem. Here's a slightly simplified build.ninja that triggers it (the depfile was a red herring):

rule sleep
     command = sleep .1; touch ${out}

rule touch
     command = touch ${out}

rule notouch
    command = # touch ${out}
    restat = true

build 2 : touch in.file 1
default 2

build dep : touch 1
default dep

build 3 : touch 2
default 3

build 4 : sleep 3
default 4

build 5 : sleep 1
default 5

build 1 : touch in.file
default 1

build build.ninja: notouch dep

The important files are 1, dep, and 5. 2, 3 and 4 are necessary to get 5 building twice concurrently in order to show the crash - without them 5 builds twice sequentially which doesn't cause the running_edges_ crash - but are not involved in the dependency graph that causes the double building.

When ninja is run, it first attempts to rebuild build.ninja. This causes the dependency chain 1 -> dep -> build.ninja to rebuild, but the final rule to rebuild build.ninja is a restat rule that doesn't update the output file, so ninja.RebuildManifest returns false and ninja continues to ninja.RunBuild. At this point the output edges for 1 are marked outputs_ready_ == true, but mtime_ is still set to the original mtime before ninja rebuilt 1. The mismatch in state is what triggers the double building and then crashing later.

ninja.RunBuild calls AddTarget on the default targets in file order; 2, 3, 4, and 5, and then finally 1. When it calls AddTarget on 5 it sees that all its inputs are ready, because 1 is has set outputs_ready_ == true, and it starts building 5. Then it calls AddTarget on 1, which checks the old cached mtime_ of 1 against the mtime of in.file, sees that its older, and builds 1 again. When 1 finishes it iterates over its out_edges, sees 5 has all its inputs ready, and schedules 5 again. If 5 builds twice concurrently (helped in the build.ninja file above by the sleep .1), running_edges_.find() returns map::end which crashes running_edges_.erase().

I'm not sure what the right fix is here. Update the cached mtime to be the mtime of the most recent input when a node finishes? Always restat after the RebuildManifest step by restarting the cycle loop?

@nico nico changed the title Ninja builds the same file twice, which can crash ninja Restat rules that generate build.ninja can crash ninja Dec 3, 2014
@nico
Copy link
Collaborator

nico commented Dec 3, 2014

Great analysis!

Maybe we can forbid restat on edges generating build.ninja? Is that ever useful? (It seems like a somewhat arbitrary restriction, though.) Always restarting the loop sounds fine to me too.

@nico
Copy link
Collaborator

nico commented Dec 3, 2014

…i.e. revert 6cf2f5a from #223. @pcc, do you remember why you added that – would you be opposed to reverting it? If so, other thoughts on how to fix this crash?

@pcc
Copy link
Contributor

pcc commented Dec 3, 2014

That change is only an optimization so I think it would be fine to revert it for now.

We should eventually fix running multiple builds over the same manifest. This will eventually be necessary for continuous builds to work, and I have occasionally observed incorrect builds on my continuous build branch which may be related to this issue. That can all come later though.

@nico nico closed this as completed in 8605b3d Dec 4, 2014
nico added a commit that referenced this issue Dec 4, 2014
@nico
Copy link
Collaborator

nico commented Dec 4, 2014

Should be better at head.

danw added a commit to danw/ninja that referenced this issue Feb 8, 2017
As a fix for ninja-build#874, we started reloading the entire manifest even if the
manifest was never rebuilt due to a restat rule. But this can be slow,
so call State::Reset instead, which also fixes the original crash.

Fixes ninja-build#987
mydongistiny pushed a commit to mydongistiny/ninja that referenced this issue Feb 8, 2017
As a fix for ninja-build#874, we started reloading the entire manifest even if the
manifest was never rebuilt due to a restat rule. But this can be slow,
so call State::Reset instead, which also fixes the original crash.

Fixes ninja-build#987

Test: replicate original crash, test with this change
Test: do a build that rebuilds manifest, ensure it reloads
Test: do a build that uses restat, verify that it's faster
Change-Id: Ifeada4afa1717a3691f2e787d1135c0489864629
Signed-off-by: mydongistiny <jaysonedson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants