Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method for detecting missing dependency graph edges #1660

Open
Timmmm opened this issue Oct 18, 2019 · 13 comments
Open

Method for detecting missing dependency graph edges #1660

Timmmm opened this issue Oct 18, 2019 · 13 comments

Comments

@Timmmm
Copy link

Timmmm commented Oct 18, 2019

In large CMake projects it can often be the case that you forget a dependency between two targets. But because you are lucky Ninja happens to build those two targets in the right order anyway. Maybe you're entire CI system is lucky too, but some unlucky person gets the wrong order and their build fails.

Nix (and maybe Bazel?) completely solves this by putting each target in a hashed directory whose name is only known to explicit dependants. But with Ninja you basically have to rely on luck, which sucks.

I can think of two solutions for CMake/Ninja:

  1. Use strace or LD_PRELOAD or something like that to trace file accesses. Then you can tell if one step accesses a file that was modified by another step that it doesn't explicitly depend on. This may be tricky.

  2. Allow running the build with different build orders (that are allowed by the DAG). It's probably infeasible to enumerate every possible combination for most build graphs, but running it with a few different random ones is better than nothing. Something like --build_order_seed=45.

@nicolas17
Copy link

I'm interested in random order too.

@sgtatham
Copy link

sgtatham commented Jul 6, 2020

I recently ran into one of those missing-dependency bugs in a CMake project, and I came up with a different way to diagnose it.

The problem, in my case, was that a header file was being generated by build step A, and used by build step B, but there was no explicit dependency written into build.ninja that forced step A to happen before B. So the build failed, the day ninja happened to schedule those steps in the wrong order in a clean build directory.

But the nice thing about this kind of makefile bug is that you can detect it after the fact, because the header dependencies that ninja collected during the build show up the problem. Even if your build happens not to fail, you can look at ninja -t deps after it completes and know that build step B required file foo.h, and you can look at ninja -t graph and know that foo.h was generated by build step A and also that there's no path of graph edges between those two build steps. That's enough to reliably diagnose a makefile bug without having to test-build in random orders at all.

I actually wrote a script to do this analysis, by running both ninja -t deps and ninja -t graph and looking for that kind of inconsistency between them. But I wonder if it would be nicer to have that functionality built into ninja itself? Either as a new tool (ninja -t find-missing-deps?), or as a warning printed automatically at the end of the build?

@Timmmm
Copy link
Author

Timmmm commented Jul 22, 2020

@sgtatham Great idea! I didn't know that Ninja records header dependencies during the build. Do you mind sharing your script?

Also if I'm understanding you, that only works for headers, right?

@sgtatham
Copy link

Sharing my script: sure. I developed it in order to locate a bug of this kind in the LLVM project, so once I'd found the bug, I checked in the script to the LLVM repo. It's here: https://github.com/llvm/llvm-project/blob/master/llvm/utils/check_ninja_deps.py

This should work for any dependencies discovered at build time and recorded via ninja's deps / depfile mechanism. That's usually C/C++ header files (in LLVM, the point is that some header files are autogenerated in a prior build step by their 'Tablegen' tool), but you can reuse the same mechanism for other dynamic dependencies if you're prepared to manually write out data in one of the same formats that Ninja can consume.

@stellaraccident
Copy link

Just found this script that @sgtatham reported and it pinpointed issues that had been non-deterministically plaguing us for a while. FYI, it is now here: https://github.com/llvm/llvm-project/blob/main/llvm/utils/check_ninja_deps.py (branch renamed)

It would be really a nice value add if ninja grew such a feature natively, imo.

@ilor
Copy link
Contributor

ilor commented Oct 21, 2023

Hi all, I'm a bit late in spotting this issue, but ninja 1.11+ now has a -t missingdeps tool that pretty much does what's described by @sgtatham to detect this class of dependency issues -- it looks like we came up with the same general idea :). The main advantage over check_ninja_deps.py (other than availability as a built-in tool) is that the python script seems to not scale well on large builds: I was unable to run it on a Chromium build (ran out of memory after eating 90G), whereas the ninja tool completes in around a minute.

I think this can be closed as resolved by #1331 and released in 1.11.

@Timmmm Timmmm closed this as completed Feb 1, 2024
@Timmmm
Copy link
Author

Timmmm commented Feb 1, 2024

Actually I don't think this is really solved. Looks like missingdeps depends on depfiles which you will only have in certain cases (e.g. header dependencies for C/C++). There are a gazillion other kinds of dependencies that don't come in a handy list. That's kind of the point of this issue.

I previously suggested:

Something like --build_order_seed=45.

It turns out GNU Make actually has this option (--shuffle=45). I think it would be great to have in Ninja too (or do you want to admit that Make is better? :-D)

@Timmmm Timmmm reopened this Feb 1, 2024
@ilor
Copy link
Contributor

ilor commented Feb 1, 2024

Yes, missingdeps relies on depfiles (more generally on input information on build steps/edges). I would argue that if build steps don't have correct input information, the build is already broken and will never be reliable, incrementally in particular. No depfiles and insuficient input info means build steps aren't re-run when they should. Personally I don't think it's useful to optimize tooling for cases where this is not getting fixed. But I'm just a drive-by contributor here ;)

This is somewhat skewed from a Chromium/GN perspective, where all custom actions (roughly: go through Python wrappers and either declare all their inputs statically, or generate depfiles based on what they did. It turns out it's usually easy to get all the used files listed.

@Timmmm
Copy link
Author

Timmmm commented Feb 1, 2024

I would argue that if build steps don't have correct input information, the build is already broken and will never be reliable, incrementally in particular.

Of course. The whole point of this issue is that it's difficult to get dependencies correct and it would be nice to have some kind of tool assistance for doing that. Make's --shuffle is one such tool.

Obviously the proper fix is to use Bazel (or Buck, Pants, Please, Nix, etc) which was created to solve this exact problem (cache invalidation essentially). But "everyone must use Bazel" isn't a very practical answer, so it would be good if people using Ninja had something to help rather than nothing (or something that only works for C headers).

@ilor
Copy link
Contributor

ilor commented Feb 1, 2024

It's not only for C headers. Arbitrary build steps can generate depfiles too, and arguably should if they can't list all inputs separately. And I think that if a build has dep issues, fixing input info is the first step.

@Timmmm
Copy link
Author

Timmmm commented Feb 2, 2024

They can generate a depfile, but not a reliable depfile. You said some of your build steps declare all their inputs statically. How do you know you didn't make a mistake and miss a dependency when you wrote that list? That's what this issue is about preventing.

@digit-google
Copy link
Contributor

digit-google commented Feb 7, 2024

I certainly admit that Make is much better than Ninja at certain things. For example, Make supports user-defined functions, while Ninja does not, very intentionally :-)

Apart from that, I would prefer to make Ninja's behavior as deterministic as possible, as we already suffer from enough cases where its incremental builds are flakily broken (which is a risk for any non-trivial project after a regeneration step). Introducing even more randomness to help fix broken build plan generators does not seem in line with Ninja's design goals.

@Timmmm
Copy link
Author

Timmmm commented Feb 7, 2024

we already suffer from enough cases where its incremental builds are flakily broken

Right! That's exactly what this kind of option is meant to help diagnose. There's also no need for it to introduce indeterminacy. You can use a seed (the Make option supports this). Though commands aren't run in a deterministic order in the first place are they?

I think there might be some miscommunication about what this feature does - I can't imagine you would want to deny debugging tools to your users (you already have the -d flag which goes a lot further than this).

I can't imagine it would be very hard to implement, but tbh I've moved companies since I had to deal with the particular dependency graph bug that drove me to open this issue....

Also given that Make has the equivalent feature and it was a CMake project I'm not sure why I didn't just switch generator. Maybe that Make option didn't exist at the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants