Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cc-wrapper: improve response file parsing speed #26974

Merged

Conversation

ryantrinkle
Copy link
Contributor

@ryantrinkle ryantrinkle commented Jun 29, 2017

Motivation for this change

Although #25205 improved the correctness of response file handling in cc-wrapper, it also devastated performance for large response files, e.g. building Haskell packages with many dependencies. This commit replaces the bash-based parser with one written in C. Although bootstrapping is somewhat more complex with this approach, the new code is ~75,000x faster for my test case.

Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

CC @orivej

@mention-bot
Copy link

@ryantrinkle, thanks for your PR! By analyzing the history of the files in this pull request, we identified @orivej, @edolstra, @pikajude, @LnL7 and @copumpkin to be potential reviewers.

@Ericson2314
Copy link
Member

Ericson2314 commented Jun 29, 2017

Ah, the merge conflict is because @orivej also pushed a performance improvement to staging. This should still make it quite a bit faster than that too, though.

@pbogdan
Copy link
Member

pbogdan commented Jun 29, 2017

FYI I believe this might be the PR @Ericson2314 was talking about #26554

@Ericson2314 Ericson2314 force-pushed the response-file-parsing-speed branch from d2b3480 to 2dd4760 Compare June 29, 2017 22:06
@Ericson2314 Ericson2314 requested a review from domenkozar June 29, 2017 22:25
@Ericson2314
Copy link
Member

A question to decide on is whether we want to keep around the bash version as a fall back, rather than breaking. The error message says "not allowed when bootstrapping", but then again it's actually quite early in the bootstrapping process that this is allowed.

@orivej
Copy link
Contributor

orivej commented Jun 30, 2017

If bash parser in staging needs to be replaced, I published a simpler C++ one at https://github.com/orivej/expand-compiler-args

@ryantrinkle ryantrinkle force-pushed the response-file-parsing-speed branch from 2dd4760 to d07f30f Compare June 30, 2017 19:21
@ryantrinkle
Copy link
Contributor Author

@orivej I have no complaint if you'd rather use your code. My goal was just to solve the issue while keeping the code, dependencies, and bootstrapping logic as simple as possible. I'm not familiar enough with C++'s toolchain to know whether it would present any difficulties. I did have to restrain myself from trying to do it in Haskell, though—attoparsec would've made this quite a bit nicer!

BTW, the file I've been testing with is responsefile.txt.

@orivej
Copy link
Contributor

orivej commented Jun 30, 2017

I had not considered that C++ toolchain is not available as soon as C toolchain is. I appreciate the work you did to integrate compiled rsp parser into bootstrap process, thank you! Yet parseResponseFile.c and expandResponseParams() could be much simpler, so I suggest that you replace them with my implementation, which is now in C.

@0xABAB
Copy link
Contributor

0xABAB commented Jun 30, 2017

I preferred the C++ version, assuming it had the same semantics.

@orivej
Copy link
Contributor

orivej commented Jul 1, 2017

If C++ is an option for early Nix bootstrap, I made updated version available at the cpp branch. The only semantical change was about handling incorrect .rsp files as gcc, e.g. normal arg and 'unclosed arg with trailing\ is now parsed as normal arg and 'unclosed arg with trailing'.

Current timings of parsing responsefile.txt with C, staging, and master versions of expandResponseParams are: 0m0.013s, 0m2.110s, and 3m25.268s. (Actual C parser completes under 2ms, the rest must be Bash overhead.)

@orivej
Copy link
Contributor

orivej commented Jul 1, 2017

Alternatively, bash version could be optimized somewhat further:
orivej/expand-compiler-args@d77495a reduces responsefile.txt parsing time from 2s to 1.2s, ghc_7.rsp from 1s to 0.6s.

@0xABAB
Copy link
Contributor

0xABAB commented Jul 1, 2017

Can we get anything which improves the situation now as opposed to what might be nice to have some later date in the future? The initial 75,000 factor of improvement seems to be interesting enough to include regardless of whether it could be made faster in another PR.

At least that's the process I would like to see.

@orivej
Copy link
Contributor

orivej commented Jul 1, 2017

Can we get anything which improves the situation now as opposed to what might be nice to have some later date in the future?

Yes, merge nixpkgs staging branch into master.

The initial 75,000 factor of improvement seems to be interesting enough to include regardless of whether it could be made faster in another PR.

This PR is for the staging branch, as it should be because it causes everything to be rebuilt. Merging it will reduce parsing time of the provided sample from 2s to 0.02s.

@0xABAB
Copy link
Contributor

0xABAB commented Jul 1, 2017

OK, so we are now just waiting for someone to push the merge button then, unless there are more comments from others.

@orivej
Copy link
Contributor

orivej commented Jul 1, 2017

No, merging this PR has nothing to do with merging staging into master. This PR is about speeding up staging.

@0xABAB
Copy link
Contributor

0xABAB commented Jul 1, 2017

@orivej I did not say there was, since the top says ryantrinkle wants to merge 1 commit into NixOS:staging from obsidiansystems:response-file-parsing-speed. I am well aware as to what the role is of staging.

@orivej
Copy link
Contributor

orivej commented Jul 1, 2017

I'd wait for @ryantrinkle on
obsidiansystems#3
and for people here to say if they want this additional speedup or if they do not want this additional complexity. For my purposes current staging is good enough.

@peti peti requested a review from vcunat July 1, 2017 19:22
@peti peti added 0.kind: enhancement Add something new 1.severity: mass-rebuild This PR causes a large number of packages to rebuild labels Jul 1, 2017
Copy link
Member

@peti peti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of merging this change. I don't fully understand the implications of adding a C program into the cc wrapper, but intuitively I'd assume that this is no problem. It would be nice to hear what @vcunat thinks about that issue.

name = "parse-response-file";
src = ./parseResponseFile.c;
buildCommand = ''
# Make sure the output file doesn't refer to the input nix path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Why not "$CC" -o "$out" "$src"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We got a runtime dependency on the source file, probably because __FILE__ in the assert expansion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's plausible, but why is it undesirable?

@Ericson2314
Copy link
Member

I think C++ should be fine. Also, if I recall correctly, there is some langCC attribute or similar on compilers that can be used to tell whether the compiler is supposed to support it.

@orivej
Copy link
Contributor

orivej commented Jul 4, 2017

This PR is ready for merge.

@domenkozar
Copy link
Member

Let's wait on approval from @Ericson2314 and @ryantrinkle and merge :)

@ryantrinkle
Copy link
Contributor Author

@domenkozar This is good to go.

@ryantrinkle ryantrinkle merged commit 7004641 into NixOS:staging Jul 5, 2017
void append(String *s, const char *data, size_t len) {
resize(s, s->len + len);
memcpy(s->data + s->len - len, data, len);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this custom string type (above) + append for? Why can't this use std::string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I got confused that there's a proposed C++ implementation, but this one is not that, this is a C implementation. So nevermind.

@nh2
Copy link
Contributor

nh2 commented Jul 24, 2017

I found that even after this PR merged (nixpkgs commit aad2a21 for me), ld-wrapper invocations can still take 10 seconds of CPU time in bash for each executable for large Haskell projects.

@nh2
Copy link
Contributor

nh2 commented Aug 17, 2017

invocations can still take 10 seconds of CPU time in bash for each executable for large Haskell projects

Fixed in #27657.

@Ericson2314 Ericson2314 deleted the response-file-parsing-speed branch January 23, 2018 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: enhancement Add something new 1.severity: mass-rebuild This PR causes a large number of packages to rebuild
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants