llvm: build with PGO #79454
Conversation
So it looks like the build fails everywhere except where I tested it. Lovely. 😅
Formula/llvm.rb
Outdated
# Our just-built Clang needs a little help finding C++ headers
on_macos do
  if MacOS.version <= :catalina && sdk
    cxxflags << "-isystem#{MacOS::CLT::PKG_PATH}/usr/include/c++/v1"
This is needed because, on Catalina or earlier, the atomic and type_traits headers are not in the SDK, but at /Library/Developer/CommandLineTools/usr/include/c++/v1. We see this on Mojave:
Mojave:CommandLineTools brew$ pwd
/Library/Developer/CommandLineTools
Mojave:CommandLineTools brew$ find . -name 'atomic' -print
./usr/include/c++/v1/atomic
Mojave:CommandLineTools brew$ find . -name 'type_traits' -print
./usr/include/c++/v1/experimental/type_traits
./usr/include/c++/v1/type_traits
On the other hand, on Big Sur:
❯ pwd
/Library/Developer/CommandLineTools
❯ fd '^atomic$'
usr/include/c++/v1/atomic
SDKs/MacOSX11.3.sdk/usr/include/c++/v1/atomic
❯ fd '^type_traits$'
SDKs/MacOSX11.3.sdk/usr/include/c++/v1/experimental/type_traits
SDKs/MacOSX11.3.sdk/usr/include/c++/v1/type_traits
usr/include/c++/v1/type_traits
usr/include/c++/v1/experimental/type_traits
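For reference, here is the workaround above as a complete block, plus one line showing how the flags might be handed to CMake. The closing ends and the -DCMAKE_CXX_FLAGS line are a sketch of the assumed surrounding code, not quoted from the formula:

# Our just-built Clang needs a little help finding C++ headers
on_macos do
  if MacOS.version <= :catalina && sdk
    # Catalina and earlier: the libc++ headers live under the CLT, not the SDK
    cxxflags << "-isystem#{MacOS::CLT::PKG_PATH}/usr/include/c++/v1"
  end
end

# Assumed hand-off: the collected flags end up on the cmake command line
args << "-DCMAKE_CXX_FLAGS=#{cxxflags.join(" ")}"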
Formula/llvm.rb
Outdated
]
runtimes = %w[
  compiler-rt
  libcxx
  libcxxabi
  libunwind
  openmp
Couldn't get openmp to build on ARM while in the projects array. It doesn't seem like this should have negative consequences on macOS... We may want to keep it in the projects array on Linux.
Actually, never mind. It shouldn't matter on Linux either, as it gets built with Clang whether it's in projects or runtimes. It's just a difference of which Clang you use to build it: the throwaway one, or the one that eventually gets installed into the prefix.
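A minimal sketch of how the two arrays reach CMake (assumed shape; the variable names and the project subset are illustrative): moving openmp between projects and runtimes only changes which list it is joined into.

# Illustrative subset of the projects list; runtimes as in the diff above
projects = %w[clang lld]
runtimes = %w[compiler-rt libcxx libcxxabi libunwind openmp]

# Both lists end up on the same cmake command line, just under different options
args = %W[
  -DLLVM_ENABLE_PROJECTS=#{projects.join(";")}
  -DLLVM_ENABLE_RUNTIMES=#{runtimes.join(";")}
]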
I suspect the change is due to my limited choice of LLVM_TARGETS_TO_BUILD for bootstrapping, since the build fails with a complaint about missing symbols for arm64 on ARM.
In particular, it looks like the openmp build needs to generate non-native code, so it doesn't quite work when you have LLVM_TARGETS_TO_BUILD=Native. This is why building it with the compiler we plan to install works.
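A sketch of the distinction being described (hypothetical variable names, not the formula's actual code): the throwaway bootstrap stage only targets the host, while the final compiler enables all back ends, which is what the openmp build needs.

# Bootstrap (throwaway) stage: host-only back end keeps the build quick
bootstrap_args = %w[-DLLVM_TARGETS_TO_BUILD=Native]

# Final stage: all back ends, so code generation for arm64 etc. is available
final_args = %w[-DLLVM_TARGETS_TO_BUILD=all]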
instrumented_cflags = cflags + ["-Xclang -mllvm -Xclang -vp-counters-per-site=6"]
instrumented_cxxflags = cxxflags + ["-Xclang -mllvm -Xclang -vp-counters-per-site=6"]
Without this, the build produces many warnings of the form
LLVM Profile Warning: Unable to track new values: Running out of static counters. Consider using option -mllvm -vp-counters-per-site=<n> to allocate more value profile counters at compile time.
It is my understanding that when this occurs there is no benefit to generating additional profile data.
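A sketch of how the bumped counter limit might reach the instrumented stage; the exact wiring in the formula may differ, and the LLVM_BUILD_INSTRUMENTED option is assumed here rather than quoted from it.

# Assumed: the instrumented stage gets the extra value-profile counters
# via the usual CMake flag variables
args = [
  "-DLLVM_BUILD_INSTRUMENTED=IR",
  "-DCMAKE_C_FLAGS=#{instrumented_cflags.join(" ")}",
  "-DCMAKE_CXX_FLAGS=#{instrumented_cxxflags.join(" ")}",
]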
An additional benchmark at building
Using Apple Clang: [timings not preserved]
Using bottled LLVM Clang: [timings not preserved]
Using LLVM with PGO: [timings not preserved]
The PGO build is still a bit slower than Apple's build, but I think that's because Apple enables LTO, which is a bit more challenging to do for our build. Or, for Python3: [timings not preserved]
This builds LLVM with profile-guided optimisations. I've adapted the build from upstream documentation. [1, 2]

This does mean that builds will now take 2-3x longer than before. LLVM currently takes about four hours to build at most (including recursive dependents), so the additional time isn't as bad as some other formulae that require long CI runs. Moreover, the collective time this saves users of the LLVM formula should make the additional build time worth it.

LTO is another potential optimisation that I haven't enabled here. This appears to be enabled in Apple's default build [3], but is a little complicated to implement for an LLVM distribution that includes static archives [4].

Closes Homebrew#77975

[1] https://llvm.org/docs/HowToBuildWithPGO.html#building-clang-with-pgo
[2] https://github.com/llvm/llvm-project/blob/33ba8bd2/llvm/utils/collect_and_build_with_pgo.py
[3] https://github.com/apple/llvm-project/blob/swift-5.4-RELEASE/clang/cmake/caches/Apple-stage2.cmake#L30
[4] https://llvm.org/docs/BuildingADistribution.html#options-for-optimizing-llvm
All my testing shows that this leads to modest to significant performance gains, and should cause no regressions for any users. Merging this -- it should be easy to revert if we find that the added CI time is just too painful.
- Have you built your formula locally with brew install --build-from-source <formula>, where <formula> is the name of the formula you're submitting?
- Is your test passing with brew test <formula>, where <formula> is the name of the formula you're submitting?
- Does your build pass brew audit --strict <formula> (after doing brew install --build-from-source <formula>)? If this is a new formula, does it pass brew audit --new <formula>?

This builds LLVM with profile-guided optimisations. I've adapted the
build from upstream documentation. [1, 2]
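Roughly, the flow this implies looks like the sketch below, assuming it mirrors the upstream recipe [1, 2]; the profile file name and the LLVM_PROFDATA_FILE hand-off are illustrative rather than quoted from the formula.

# 1. Build a throwaway bootstrap Clang with the host toolchain.
# 2. Rebuild Clang with profile instrumentation enabled.
# 3. Exercise the instrumented Clang on representative inputs,
#    which writes *.profraw files.
# 4. Merge the raw profiles and feed them into the final, optimised build.
system "llvm-profdata", "merge", "-output=pgo.profdata", *Dir["*.profraw"]
args << "-DLLVM_PROFDATA_FILE=#{Dir.pwd}/pgo.profdata"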
This does mean that builds will now take 2-3x longer than before. LLVM
currently takes about four hours to build at most (including recursive
dependents), so the additional time isn't as bad as some other formulae
that require long CI runs. Moreover, the collective time this saves
users of the LLVM formula should make the additional build time worth
it.
LTO is another potential optimisation that I haven't enabled here. This appears to be enabled in Apple's default build. [3] One reason it is complicated is that LTO generates bitcode instead of object files, which libtool will refuse to archive into a static library. It might be possible to build all the static libraries first and then build the optimised binaries later with LTO, but I'll leave that for another PR (if at all).
See #77975.
[1] https://llvm.org/docs/HowToBuildWithPGO.html#building-clang-with-pgo
[2] https://github.com/llvm/llvm-project/blob/33ba8bd2/llvm/utils/collect_and_build_with_pgo.py
[3] https://github.com/apple/llvm-project/blob/swift-5.4-RELEASE/clang/cmake/caches/Apple-stage2.cmake#L30