Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvm: build with PGO #79454

Closed
wants to merge 1 commit into from
Closed

llvm: build with PGO #79454

wants to merge 1 commit into from

Conversation

carlocab
Copy link
Member

  • Have you followed the guidelines for contributing?
  • Have you ensured that your commits follow the commit style guide?
  • Have you checked that there aren't other open pull requests for the same formula update/change?
  • Have you built your formula locally with brew install --build-from-source <formula>, where <formula> is the name of the formula you're submitting?
  • Is your test running fine brew test <formula>, where <formula> is the name of the formula you're submitting?
  • Does your build pass brew audit --strict <formula> (after doing brew install --build-from-source <formula>)? If this is a new formula, does it pass brew audit --new <formula>?

This builds LLVM with profile-guided optimisations. I've adapted the
build from upstream documentation. [1, 2]

This does mean that builds will now take 2-3x longer than before. LLVM
currently takes about four hours to build at most (including recursive
dependents), so the additional time isn't as bad as some other formulae
that require long CI runs. Moreover, the collective time this saves
users of the LLVM formula should make the additional build time worth
it.

Enabling LTO is another potential optimisation that I haven't enabled
here. This appears to be enabled in Apple's default build. [3] One
reason enabling this is complicated is that enabling LTO generates
bitcode instead of object files, which libtool will refuse to archive
into a static library. It might be possible to build all the static
libraries first and then build optimised binaries later with LTO, but
I'll leave that for another PR (if at all).

See #77975.

[1] https://llvm.org/docs/HowToBuildWithPGO.html#building-clang-with-pgo
[2] https://github.com/llvm/llvm-project/blob/33ba8bd2/llvm/utils/collect_and_build_with_pgo.py
[3] https://github.com/apple/llvm-project/blob/swift-5.4-RELEASE/clang/cmake/caches/Apple-stage2.cmake#L30

@BrewTestBot BrewTestBot added the python Python use is a significant feature of the PR or issue label Jun 16, 2021
@carlocab
Copy link
Member Author

So it looks like the build fails everywhere except where I tested it. Lovely. 😅

Formula/llvm.rb Outdated
# Our just-built Clang needs a little help finding C++ headers
on_macos do
if MacOS.version <= :catalina && sdk
cxxflags << "-isystem#{MacOS::CLT::PKG_PATH}/usr/include/c++/v1"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because, on Catalina or earlier, the atomic and type_traits headers are not in the SDK, but at /Library/Developer/CommandLineTools/usr/include/c++/v1. We see this on Mojave:

Mojave:CommandLineTools brew$ pwd
/Library/Developer/CommandLineTools
Mojave:CommandLineTools brew$ find . -name 'atomic' -print
./usr/include/c++/v1/atomic
Mojave:CommandLineTools brew$ find . -name 'type_traits' -print
./usr/include/c++/v1/experimental/type_traits
./usr/include/c++/v1/type_traits

On the other hand, on Big Sur:

❯ pwd
/Library/Developer/CommandLineTools
❯ fd '^atomic$'
usr/include/c++/v1/atomic
SDKs/MacOSX11.3.sdk/usr/include/c++/v1/atomic
❯ fd '^type_traits$'
SDKs/MacOSX11.3.sdk/usr/include/c++/v1/experimental/type_traits
SDKs/MacOSX11.3.sdk/usr/include/c++/v1/type_traits
usr/include/c++/v1/type_traits
usr/include/c++/v1/experimental/type_traits

Formula/llvm.rb Outdated
]
runtimes = %w[
compiler-rt
libcxx
libcxxabi
libunwind
openmp
Copy link
Member Author

@carlocab carlocab Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't get openmp to build on ARM while in the projects array. It doesn't seem like this should have negative consequences on macOS... We may want to keep it in the projects array on Linux.

Copy link
Member Author

@carlocab carlocab Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, never mind. It shouldn't matter on Linux either, as it gets built with Clang whether it's in projects or runtimes. It's just a difference of which Clang you use to build it: the throwaway one, or the one that eventually gets installed into the prefix.

Copy link
Member Author

@carlocab carlocab Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the change is due to my limited choice of LLVM_TARGETS_TO_BUILD for bootstrapping since the build fails with a complaint about missing symbols for arm64 on ARM.

In particular, it looks like the openmp build needs to generate non-native code, so it doesn't quite work when you have LLVM_TARGETS_TO_BUILD=Native. This is why building it with the compiler we plan to install works.

Comment on lines +213 to +235
instrumented_cflags = cflags + ["-Xclang -mllvm -Xclang -vp-counters-per-site=6"]
instrumented_cxxflags = cxxflags + ["-Xclang -mllvm -Xclang -vp-counters-per-site=6"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, the build produces many warnings of the form

LLVM Profile Warning: Unable to track new values: Running out of static counters.  Consider using option -mllvm -vp-counters-per-site=<n> to allocate more value profile counters at compile time.

It is my understanding that when this occurs there is no benefit to generating additional profile data.

@carlocab
Copy link
Member Author

carlocab commented Jun 26, 2021

An additional benchmark at building clang from

❯ git rev-parse HEAD
4506f614cb6983a16d117cf77a968608e66d7a5c

Using Apple Clang:

❯ cmake ../llvm -DCMAKE_C_COMPILER=/usr/bin/clang -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DLLVM_TARGETS_TO_BUILD=Native -DLLVM_ENABLE_PROJECTS=clang
[snip]
❯ time ( cmake --build . --target clang --parallel 5; )
[snip]
( cmake --build . --target clang --parallel 5; )  15590.55s user 922.92s system 301% cpu 1:31:18.66 total

Using bottled LLVM Clang:

❯ cmake ../llvm -DCMAKE_C_COMPILER=/Users/carlocab/homebrew/bin/clang -DCMAKE_CXX_COMPILER=/Users/carlocab/homebrew/bin/clang++ -DLLVM_TARGETS_TO_BUILD=Native -DLLVM_ENABLE_PROJECTS=clang
[snip]
❯ time ( cmake --build . --target clang --parallel 5; )
[snip]
( cmake --build . --target clang --parallel 5; )  19035.14s user 921.20s system 295% cpu 1:52:31.98 total

Using LLVM with PGO:

❯ cmake ../llvm -DCMAKE_C_COMPILER=$(brew --prefix llvm)/bin/clang -DCMAKE_CXX_COMPILER=$(brew --prefix llvm)/bin/clang++ -DLLVM_TARGETS_TO_BUILD=Native -DLLVM_ENABLE_PROJECTS=clang
[snip]
❯ time ( cmake --build . --target clang --parallel 5; )
[snip]
( cmake --build . --target clang --parallel 5; )  15366.74s user 886.66s system 292% cpu 1:32:37.65 total

The PGO build is still a bit slower than Apple's build, but I think that's because Apple enables LTO, which is a bit more challenging to do for our build.

Or, for Python3:

❯ git rev-parse HEAD
521ba8892ef367c45bf1647b04a726d3f553637c

❯ time ( CC=/usr/bin/clang ../configure && make ) # Apple Clang
( CC=/usr/bin/clang ../configure && make; )  177.68s user 41.80s system 89% cpu 4:06.42 total

❯ time ( CC=$(brew --prefix llvm)/bin/clang ../configure && make ) # PGOed LLVM
( CC=$(brew --prefix llvm)/bin/clang ../configure && make ; )  166.23s user 51.13s system 88% cpu 4:06.00 total

❯ time ( CC=/Users/carlocab/homebrew/bin/clang ../configure && make ) # LLVM bottle
( CC=/Users/carlocab/homebrew/bin/clang ../configure && make; )  195.33s user 47.49s system 90% cpu 4:27.00 total

@carlocab carlocab force-pushed the llvm-pgo branch 2 times, most recently from 4b43252 to abb94f7 Compare June 28, 2021 00:56
@carlocab carlocab mentioned this pull request Jun 30, 2021
@carlocab carlocab added the in progress Stale bot should stay away label Jul 4, 2021
@carlocab carlocab force-pushed the llvm-pgo branch 2 times, most recently from 49e71bb to e422ed8 Compare July 4, 2021 18:35
This builds LLVM with profile-guided optimisations. I've adapted the
build from upstream documentation. [1, 2]

This does mean that builds will now take 2-3x longer than before. LLVM
currently takes about four hours to build at most (including recursive
dependents), so the additional time isn't as bad as some other formulae
that require long CI runs. Moreover, the collective time this saves
users of the LLVM formula should make the additional build time worth
it.

LTO is another potential optimisation that I haven't enabled here.
This appears to be enabled in Apple's default build [3], but is a little
complicated to implement for an LLVM distribution that includes static
archives [4].

Closes Homebrew#77975

[1] https://llvm.org/docs/HowToBuildWithPGO.html#building-clang-with-pgo
[2] https://github.com/llvm/llvm-project/blob/33ba8bd2/llvm/utils/collect_and_build_with_pgo.py
[3] https://github.com/apple/llvm-project/blob/swift-5.4-RELEASE/clang/cmake/caches/Apple-stage2.cmake#L30
[4] https://llvm.org/docs/BuildingADistribution.html#options-for-optimizing-llvm
@carlocab
Copy link
Member Author

carlocab commented Jul 5, 2021

All my testing shows that this leads to modest to significant performance gains, and should cause no regressions for any users. Merging this -- it should be easy to revert if we find that the added CI time is just too painful.

@BrewTestBot
Copy link
Member

:shipit: @carlocab has triggered a merge.

@github-actions github-actions bot added the outdated PR was locked due to age label Aug 5, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 5, 2021
@carlocab carlocab deleted the llvm-pgo branch September 17, 2022 15:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
in progress Stale bot should stay away outdated PR was locked due to age python Python use is a significant feature of the PR or issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants