Add support for unified builds to drastically speed up compilation time #177
Conversation
Not trying to block this pull request per se, as I don't contribute to this repo, but are you all sure a unity build is a desired thing?
Thanks for the feedback, Antonin. I read through the blog article you linked, and while there are some insights there, I think that as a commentary on unified compilation it's a bit of a straw man. The criticisms are focused on a naive implementation of unified compilation. If you simply combine every .cpp file in the entire project into a single translation unit, he's correct that you'll encounter a lot of downsides. However, that's not what is generally done in practice, and that's not what the code in this PR does.

One of the primary differences between the system that blog article discusses and this PR is that this PR only combines source files which are in the same module. What "module" means is determined by the automake rules - i.e., the source files listed for a single automake target form a module. As far as the tradeoffs with parallelism and incremental builds, there is more detailed discussion in the original comment, but the short version is that individual modules are automatically split into several smaller translation units if they become too large. This is another difference between what this PR does and the approach discussed in the blog post. As mentioned in the original comment, the threshold for "too large" is a value that has to be tuned, but making a change is as easy as changing one number in bootstrap.sh.

Regarding precompiled headers, in some ways you can think of them as behaving like unified compilation, but for headers rather than source files. You run into some of the same issues; for example, they can massively worsen incremental build performance if you tune them wrong, because they cause source files to depend on headers that they don't actually include. They're also a bit wonky and compiler-specific. In practice my experience has been that you get the same or better benefits from unified compilation with a lot less manual tuning and pain. YMMV, of course.

You also mentioned that Mozilla uses unified compilation heavily and at scale, and my experience has been that there are few downsides in practice, while the productivity gains offered by faster builds are massive. As the numbers in the original comment show, I saw a 68% reduction in build times with this PR applied. An improvement of that magnitude in the edit-compile-test cycle for every member of the team is nothing to sniff at, and I think that should be kept in mind when weighing the arguments here. One reason I wanted to get this done immediately, before I started writing a lot of p4c code, is that I felt this would have more long-term impact than almost any other single improvement I could make.

One thing I agree with in the blog article is that there is no substitute for fixing issues in the source code itself. Changes like improving the code's modularity and splitting headers up so that files include no more than what they need are really important. They can make a night-and-day difference in compilation times. That's something we should definitely keep in mind as we continue to work on p4c.
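To make the module-scoped scheme concrete, here's an illustrative sketch of the kind of unified source files the build generates. All file and module names here are hypothetical; only the structure reflects the approach described above (files are combined per module, never across modules).

```cpp
// unified_frontend_1.cpp -- generated chunk containing only files from
// the (hypothetical) "frontend" module.
#include "frontend/typeChecker.cpp"
#include "frontend/constantFolding.cpp"

// unified_midend_1.cpp -- a different module gets its own chunk; files
// from different modules are never merged into one translation unit,
// and an oversized module is split into unified_<module>_2.cpp, etc.
#include "midend/inlining.cpp"
```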
```python
for file in file_set:
    # We can't handle expansions ('$FOO or $(foo)') in 'foo_UNIFIED',
    # because we're not actually evaluating the makefile, so report an error
    # if we find anyway.
```
s/anyway/any. I'll fix that before merging.
I guess we'll have to agree to disagree. Changing the language semantics by merging multiple translation units is a big deal for me. Whether these files are in the same automake "module" is irrelevant to me. I do agree that productivity gain is important, though :)
My main concern here is that as written it will break the LOG code, which depends on a single static _file_log_level variable per source file that holds the logging detail level for that source file. This could easily be fixed (as I note below).
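For context, here is a minimal sketch of the pattern at issue, assuming a simplified form of the logging code (this is not the actual lib/log.h):

```cpp
// Each source file that includes the logging header gets its own
// file-level static holding that file's logging detail level.
static int _file_log_level = -1;  // -1: level not yet looked up

// Logging checks key off the current translation unit's variable:
#define LOGGING(n) (_file_log_level >= (n))

// Under unified compilation, several .cpp files share one translation
// unit -- and therefore one variable -- so per-file log levels break.
```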
Also required are corresponding changes to Makefile.am in the p4c-extension-tofino repo.
I'm also a little concerned with this slowing down incremental builds of 1 or 2 source files (the common case for development) in order to speed up the rarer full rebuilds. And while faster builds are nice, the test suite takes longer than a rebuild, so this is attacking only a small part of downtime. Overall I expect time savings to be a wash.
```python
# generates on stdout:
#   #include "foo.cpp"
#   #include "bar.cpp"
#   #include "baz.cpp"
```
In order to not break the per-source-file logging code, we could instead generate this as:
```cpp
#define LOG_CONTROL_VAR _file_log_level_foo
static int _file_log_level_foo __attribute__((unused)) = -1;
#include "foo.cpp"
#undef LOG_CONTROL_VAR
```
for each source file, and change lib/log.h slightly to make use of LOG_CONTROL_VAR. A sketch of what that change could look like follows.
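This is an assumption about the shape of the change, not the actual lib/log.h code, and the LOGGING macro name is illustrative:

```cpp
// If a unified wrapper defined LOG_CONTROL_VAR before including this
// header, use it; otherwise fall back to one static per translation unit.
#ifndef LOG_CONTROL_VAR
#define LOG_CONTROL_VAR _file_log_level
static int _file_log_level __attribute__((unused)) = -1;
#endif

// Routing every check through LOG_CONTROL_VAR lets each original source
// file keep its own log level even inside a unified chunk.
#define LOGGING(n) (LOG_CONTROL_VAR >= (n))
```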
As we discussed IRL, this is being handled in PR #180, which we'll land before this one.
Typically if you're touching header files, and not just source files, I'd expect the time savings to be quite significant. It's certainly something that needs to be tuned, though. In the limit, N=1 will reproduce the current situation, but as Antonin suggested we can use a very large N for builds on the continuous integration infrastructure, which would all be full builds, so we'll definitely get a win there at a minimum.

That's a good point about the log level. I'll get that fixed.
I totally agree about the test suite, by the way. I'll investigate improving the test suite speed as well.
OK, I've added a new commit that should fix the build with p4c-extension-tofino. The problem is that p4c-extension-tofino adds new source files to p4c's build, so the new commit adds a new variable to Makefile.am to support that. That's all that's needed to fix the build.

After seeing how this is being used, I think we're good to go now. We'll have to merge the corresponding PR in the other repo at the same time as this one.
FYI: I've written the necessary patches to make p4c-extension-tofino build with unified compilation as well. (It's really easy once this PR is in place.) Measuring a build of p4c-extension-tofino and p4c together, as you'd do during a normal development workflow, unified compilation improves wall clock time by 72%. Nice.
I'm just going to give a high-level description in this PR, because Chris and I have talked about this IRL and there is pretty good documentation in the code.
These patches modify p4c's build system to support unified compilation. bootstrap.sh now runs gen-unified-makefile.py, which walks over the source tree, reads the existing automake files, and generates a new automake file which is included in Makefile.am. The new automake file has rules that generate unified .cpp files as part of the build process and hook up the dependencies correctly. Please take a look at the comments at the top of gen-unified-makefile.py for documentation.

To make this work, some changes are needed to the existing automake files. The high-level idea is that gen-unified-makefile.py takes care of generating foo_SOURCES variables. Instead of adding files to foo_SOURCES, automake files should add them to either foo_UNIFIED or foo_NONUNIFIED. Files in foo_UNIFIED are subject to unified compilation; files in foo_NONUNIFIED will be compiled independently, as before. Yacc files, for example, have to go in foo_NONUNIFIED. gen-unified-makefile.py collects all of the foo_UNIFIED and foo_NONUNIFIED variables it encounters and generates foo_SOURCES definitions which are fed into automake to produce the final makefile. Again, there's more documentation in the tool. There's a fair amount of error checking to make it hard to mess things up.

The generated .cpp files are split into chunks, each containing up to N files. N is a tunable parameter; there's a tradeoff between single-threaded compilation time (for which you want a very large N) and better parallelism / incremental builds (for which you want a smaller N). If N gets too small, you start to lose the benefits of unified compilation, so balancing these factors is tricky and is best done experimentally. The default value of 10 that I've chosen is a good compromise based on the experiments I've conducted; over time we can re-tune it if necessary. (The sketch below shows how the chunking plays out.)
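To illustrate the chunking concretely (chunk and file names here are hypothetical, not necessarily what the tool emits): with N = 10, a module listing 25 files in foo_UNIFIED ends up as three generated translation units.

```cpp
// foo_unified_1.cpp: the module's first 10 files.
#include "foo_scanner.cpp"
#include "foo_parser.cpp"
// ... 8 more files ...

// foo_unified_2.cpp covers files 11-20 and foo_unified_3.cpp covers
// files 21-25. Editing foo_parser.cpp recompiles only foo_unified_1.cpp,
// which is how a smaller N preserves incremental builds.
```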
Because unified builds effectively merge multiple independent source files into one, you can sometimes run into problems with name clashes when multiple files use the same names for static declarations or declarations in anonymous namespaces, since those declarations are normally restricted to a single translation unit. p4c only hit this issue in one case: parserUnroll.cpp has an anonymous namespace that contains some declarations that clash with other declarations elsewhere in the project. I've fixed this by converting the anonymous namespace into a named namespace. (The sketch below shows the failure mode.)
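A minimal illustration of the clash, with hypothetical names rather than the actual parserUnroll.cpp declarations:

```cpp
// --- a.cpp ---
// namespace { struct State { int depth; }; }
// --- b.cpp ---
// namespace { struct State { bool done; }; }
//
// A unified chunk is effectively one translation unit:
//   #include "a.cpp"
//   #include "b.cpp"
// All anonymous namespaces within a translation unit are the *same*
// namespace, so the compiler sees a redefinition of 'State'.
// The fix: give the namespace a unique name instead.
namespace parserUnroll { struct State { int depth; }; }
```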
One other aside: there's no benefit to including the headers in foo_SOURCES. Since automake infers dependencies automatically, that was never needed for dependency reasons; it's recommended only so that if you use e.g. make dist to create a distribution tarball, the headers will be included. Taking this into account, I removed the headers when I replaced foo_SOURCES with foo_UNIFIED, because including them just means I have to filter them out immediately in gen-unified-makefile.py to avoid producing spurious dependencies in the generated automake file. I could do that filtering (automake, after all, does so when building), but I think it's better to just not include the headers in the first place, so as not to mislead people into thinking that including them serves any function. This also brings us in line with most other build systems I've used.

So after my efforts to improve compilation performance this week, where does this leave us? Going through my compilation performance patch stack:
```
Original state, -j8:              142.03 real   951.23 user   26.34 sys
-O0 patch, -j8:                   113.92 real   787.10 user   26.57 sys
IR generator patch, -j8:          108.66 real   773.05 user   26.30 sys
Unified compilation patch, -j8:    35.11 real   203.78 user    7.83 sys
```
So we go from 142 to 35 seconds. Pretty good!
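(As a sanity check on the numbers: 1 - 35.11/142.03 is roughly a 75% overall reduction, while the unified compilation patch alone accounts for 1 - 35.11/108.66, roughly 68%, which is the reduction quoted earlier in this thread.)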
I think we can actually substantially improve on this when we have time to circle around and look at compilation performance again. Much of the remaining overhead seems to come from template instantiations, and some judicious code rearrangement is likely to eliminate a lot of that cost. Even right now, by setting the chunk size very large in bootstrap.sh, it's very easy to compile the entire project in 30 seconds. Merging every file in the project into a single unified .cpp file got me closer to 20 seconds, so consider that a limit study of sorts.

If you want to test locally, remember that you need to destroy your build directory and rerun bootstrap.sh.