Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ninja -t deps performance regression - now 10x slower on Windows, due to ec8de9c247 #2018

Closed
randomascii opened this issue Aug 24, 2021 · 9 comments
Labels
Milestone

Comments

@randomascii
Copy link
Contributor

While working on Chromium dependency reductions I have been using "ninja -t deps > deps.txt" on Windows and I have also been testing newer versions of ninja. I noticed that these two went together badly - newer versions of ninja run this command unusably slowly.

Profiling showed that newer versions of ninja call WriteFile for every character, thus explaining the slowdown. Roughly 90% of the -t deps CPU time ends up being in the OS WriteFile function. The call stack (with some CRT frames omitted) is:

ninja.exe!main
ninja.exe!anonymous namespace'::real_main ninja.exe!anonymous namespace'::NinjaMain::ToolDeps
ninja.exe!printf
ninja.exe!__stdio_common_vfprintf
...
ninja.exe!__acrt_stdio_flush_and_write_narrow_nolock
ninja.exe!write_buffer_nolock
ninja.exe!_write
ninja.exe!_write_nolock
ninja.exe!write_text_ansi_nolock
KernelBase.dll!WriteFile

The call to printf presumably hasn't changed, but the caching behavior in the CRT has. Bisecting shows conclusively that the problem is this change:

commit ec8de9c (HEAD)
Merge: c83f00c 00459e2
Author: Jan Niklas Hasse jhasse@bixense.com
Date: Tue Feb 23 10:00:23 2021 +0100

Merge pull request #1915 from jhasse/windows-utf8

Use UTF-8 on Windows 10 Version 1903, fix #1195
@randomascii
Copy link
Contributor Author

@jhasse - FYI.

@randomascii
Copy link
Contributor Author

I did some investigating to understand the bug. The problem is that StatusPrinter contains a LinePrinter and LinePrinter's constructor always calls setvbuf(stdout, NULL, _IONBF, 0);, which disables buffering. This means that a WriteFile call is needed for every character, instead of every ~512 characters.

So, either "Status* status = new StatusPrinter(config);" should only be executed when a StatusPrinter/LinePrinter is needed and will be used, and when the single-character WriteFile calls are acceptable, or else LinePrinter needs to only disable buffering when it is asked to, instead of when it is constructed. I don't know which solution is better, but "-t deps" and probably some other features are basically unusable until this is fixed.

@jhasse
Copy link
Collaborator

jhasse commented Aug 25, 2021

Thanks for investigating! I knew Microsoft manages to mess this up somehow.

Why does setvbuf(stdout, NULL, _IONBF, 0); not tank performance without UTF-8?

@jhasse jhasse added the windows label Aug 25, 2021
@jhasse jhasse added this to the 1.11.0 milestone Aug 25, 2021
@randomascii
Copy link
Contributor Author

Why does setvbuf(stdout, NULL, _IONBF, 0); not tank performance without UTF-8?

It does. UTF-8 is unrelated to this issue. The only thing that matters is calling setvbuf(stdout, NULL, _IONBF, 0);, which happens when a StatusPrinter object is constructed, because the LinePrinter constructor calls it. Thus, the LinePrinter object has a significant side effect, and that is what triggered this latent performance bug.

I confirmed this by checking out 5392e0e (before the UTF-8) change and adding a call to setvbuf(stdout, NULL, _IONBF, 0); in main. This tanked performance.

I also checked out the head of the tree (slow -t deps performance) and then deleted the setvbuf(stdout, NULL, _IONBF, 0); call. This gave me back excellent performance.

The setvbuf(stdout, NULL, _IONBF, 0); call tells the CRT to call WriteFile for every character which is unusably slow for any non-trivial number of characters. We need to not call it when emitting significant text with printf. I'm not sure we need to call it at all, but if we do need to call it we should probably skip it anytime stdout is a file.

The normal status output from the build looks fine with the buffering left enabled, I think because that doesn't go through printf on Windows.

I think we can just delete that line from the LinePrinter constructor, probably.

@randomascii
Copy link
Contributor Author

BTW, I looked to see when the call to disable printf buffering was added. It first showed up in December 2011 (!!!) in change 0742527. It got moved around a few times (10d1c93, then a8bc2e1), but managed to avoid causing -t deps problems until this year, presumably because it wasn't executed in the -t deps case.

I'm still not clear on what it actually does. It looks like the printing of the status updates during a build is done doing Win32 console functions rather than printf so disabling of buffering seems unnecessary. I think it is possible that the problematic setvbuf call can just be removed.

@jhasse
Copy link
Collaborator

jhasse commented Aug 27, 2021

Why did the UTF-8 commit turn up in bisecting then?

@randomascii
Copy link
Contributor Author

The UTF-8 merge altered the code flow so that the existing setvbuf() call that disabled buffering was run always, instead of just in some cases.

My bisect was a bit misleading (branch merges in git history confuse me - in Chromium we use them rarely so I have managed to maintain ignorance) and therefore my bisect was, I guess, off by a couple of commits. Redoing my analysis, here's the relevant history:

ec8de9c Merge pull request #1915 from jhasse/windows-utf8
c83f00c (HEAD) Merge pull request #1911 from eszlari/cmake-cleanup
e77c561 Merge pull request #1899 from colincross/status_for_serialize - slow
00459e2 Use UTF-8 on Windows 10 Version 1903, fix #1195 - fast

I think the ec8de9c commit includes the three subsequent commits in some contexts and that is what threw me off. Anyway, e77c561 is the real culprit. Specifically, it creates a StatusPrinter object in main, always, and this turns off buffering, and that is what kills the performance of "-t deps"

StatusPrinter needs (apparently? Maybe not?) buffering disabled, but it disables it globally for its own private needs. That is bad, and that is why this latent bug appeared from a seemingly unrelated change.

I hope it's all clear now.

@randomascii
Copy link
Contributor Author

This issue needs to be fixed (IMHO) before Chromium can move to the latest ninja, which we would like to do, so I am investigating.

I went back to 9c66e69, the change just before the regression. to see when the LinePrinter constructor, which disables buffering is run. If I just run ninja when there is nothing to be done then the constructor is hit twice. Once here:

ninja.exe!LinePrinter::LinePrinter() Line 47	C++
[Inline Frame] ninja.exe!BuildStatus::{ctor}(const BuildConfig &) Line 84	C++
ninja.exe!Builder::Builder(State * state, const BuildConfig & config, BuildLog * build_log, DepsLog * deps_log, DiskInterface * disk_interface) Line 737	C++

ninja.exe!anonymous namespace'::NinjaMain::RebuildManifest(const char * input_file, std::string * err) Line 253 C++ ninja.exe!anonymous namespace'::real_main(int argc, char * * argv) Line 1414 C++

and once here:

ninja.exe!LinePrinter::LinePrinter() Line 47	C++
[Inline Frame] ninja.exe!BuildStatus::{ctor}(const BuildConfig &) Line 84	C++
ninja.exe!Builder::Builder(State * state, const BuildConfig & config, BuildLog * build_log, DepsLog * deps_log, DiskInterface * disk_interface) Line 737	C++
ninja.exe!`anonymous namespace'::NinjaMain::RunBuild(int argc, char * * argv) Line 1203	C++
ninja.exe!`anonymous namespace'::real_main(int argc, char * * argv) Line 1427	C++

In the -t deps case (in the old code) the LinePrinter constructor is never run. The "if (options.tool && options.tool->when == Tool::RUN_AFTER_FLAGS)" check causes it to run the tool and then exit quite early on in real_main.

In the current code the StatusPrinter object is allocated early on in real_main. It is then used in status->Info("Entering directory..., which used to just be a call to Info.

So ultimately that is the problem. Using status->Info requires allocating the StatusPrinter object and that causes buffering to be disabled.

The real issue is that the LinePrinter constructor has a significant side effect that was not well understood. I have a fix which I will upload soon.

randomascii added a commit to randomascii/ninja that referenced this issue Oct 1, 2021
Previously stdout buffering was disabled in the LinePrinter constructor.
This worked for a long time but ultimately this side-effect caused a
performance bug (issue ninja-build#2018) in tools such as -t deps. Moving the
disabling of buffering into real_main and only disabling buffering
when a tool is not used makes the desired semantics clearer and restores
the lost performance.

This fixes issue ninja-build#2018. It has been tested and a 10x speedup was seen
relative to the tip-of-tree version.
randomascii added a commit to randomascii/ninja that referenced this issue Oct 1, 2021
Previously stdout buffering was disabled in the LinePrinter constructor.
This worked for a long time but ultimately this side-effect caused a
performance bug (issue ninja-build#2018) in tools such as -t deps. Moving the
disabling of buffering into real_main and only disabling buffering
when a tool is not used makes the desired semantics clearer and restores
the lost performance.

This fixes issue ninja-build#2018. It has been tested and a 10x speedup was seen
relative to the tip-of-tree version.
@randomascii
Copy link
Contributor Author

#2031 should be a reasonable fix to this issue, if someone can land it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants
@jhasse @randomascii and others