Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work around custom section flag regression in LLVM #90326

Closed
nikic opened this issue Oct 26, 2021 · 10 comments
Closed

Work around custom section flag regression in LLVM #90326

nikic opened this issue Oct 26, 2021 · 10 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.

Comments

@nikic
Copy link
Contributor

nikic commented Oct 26, 2021

LLVM 13 has removed support for setting custom section flags in https://reviews.llvm.org/D100944. LLVM 13 carries a revert of the problematic change (https://reviews.llvm.org/D107216). However, LLVM 14 will no longer carry it, and we need to use alternative ways to set custom section flags.

There are currently two places affected by this. The .llvmbc bitcode section created in

unsafe fn embed_bitcode(
and the .rustc metadata section created in
pub fn write_compressed_metadata<'tcx>(
. In both cases, section flags are used to ensure that the section is not linked into the final binary or loaded into memory.

There are two ways to address this. The first one is to define the section entirely in module-level assembly. Rather than only emitting .section, we should also emit .ascii "ENCODED_SECTION_CONTENTS", as well as the symbol name and any other necessary directives (e.g. for visibility). I'm not sure exactly what is needed on different platforms here.

The second one is to use the object crate to create the object file containing the section. The cranelift backend already implements this for the .rustc section in

pub(crate) fn new_metadata_object(
. A nice side benefit of this is that the handling would be codegen-backend agnostic and could be shared between the llvm/cranelift/gcc backends.

The suggested course of action is to use the object crate for .rustc and inline assembly for .llvmbc. The reason for the latter is that the section is part of a larger object file emitted by LLVM, and the object crate doesn't seem great for modifying existing object files (as opposed to creating one from scratch). Though if we can use the object crate for everything, that would of course be great.

@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 26, 2021
@durin42
Copy link
Contributor

durin42 commented Oct 27, 2021

Quoting one of my LLVM experts: "the .llvmbc issue is an issue for Clang as well (e.g. using clang -fembed-bitcode on ELF has this problem). Clearly that ought to be solved in some manner other than inline-asm hacks or post-processing object files." So there's some appetite for a more systematic solution upstream in LLVM.

@jyknight
Copy link

(Jumping in from LLVM side -- I was pointed to this bug.)

clang -fembed-bitcode also causes the bitcode to end up in the binary on ELF platforms, just like Rust would without the section-flag workaround. That's perhaps not entirely useful, but the -fembed-bitcode flag is almost unused everywhere except on Apple platforms (where it's required in order to upload to appstore in some cases) -- so basically nobody really cares.

Rust seems to want to go to some trouble to avoid this outcome -- but it's not clear to me why you want the embedded bitcode in the first place -- other than on iOS/apple? Perhaps it'd be nice to fix this someday, but is it more important to Rust, than it is to Clang?

@nikic
Copy link
Contributor Author

nikic commented Oct 27, 2021

@jyknight Under some circumstances, Rust uses fat object files that contain both machine code and bitcode. The MC version is used without LTO, the BC version used with LTO.

Changing the flags that LLVM automatically applies to .llvmbc would solve (the more ugly) half of the problem, but I wasn't sure whether unconditionally using SHF_EXCLUDE/LNK_REMOVE would be fine for other use cases.

@jyknight
Copy link

LTO input wasn't really the intended use-case of the embedded bitcode feature. I don't think that even works at all with clang+lld, does it? The original purpose was an effectively apple-specific use-case: to allow Apple to rerun the codegen phase of compilation separately for each original object file, and then relink the binary. This would be used either in order to workaround CPU bugs with a new version of the toolchain, or to recompile an app to a new specially-designed-to-be-ABI-compatible architecture (bitcode for watchOS on ARM32 can be rebuilt for watchos on AArch64 (the later they call "arm64_32", since it uses a 32bit ABI)).

For LTO, it's better to just emit the correct object file for the linking mode/modes you're actually using. That's why it surprises me that this is something rust is running into.

The overhead of doing a fat object file (both size of the extraneous output and time to generate it) can be substantial, and additionally, the pass pipelines are different for generating pre-LTO bitcode vs pre-object-emission bitcode. (The semantics of llvmbc used by -fembed-bitcode is that it contains the pre-object-emission bitcode, not LTO bitcode).

Simply SHF_EXCLUDE'ing .llvmbc isn't a solution by itself. The primary purpose of -fembed-bitcode is intended to be that you can get the bitcode output in the final binary when you ask for it. On Mach-O platforms, this is done with special linker support: if the linker is passed "-bitcode_bundle", it takes the bitcode sections of the objects, puts them into a weird container format ("XAR"), then puts that into a section in the output binary. If that linker option isn't set, it ignores the bitcode sections in the input object files.

This doesn't really work on non-MachO platforms at the moment -- both because bitcode gets included into the output without a flag, and because when it is emitted, it's just concatenated, rather than put into a useful form.

We could potentially add proper support, but again, I don't think this is actually something Rust should be using at all...

@bjorn3
Copy link
Member

bjorn3 commented Oct 27, 2021

For LTO, it's better to just emit the correct object file for the linking mode/modes you're actually using. That's why it surprises me that this is something rust is running into.

We have to support libraries that are capable of LTO, but don't require it. The same standard library is used for LTO and non-LTO usage and as such needs to contain LTO bitcode, but also machine code at the same time.

@bjorn3
Copy link
Member

bjorn3 commented Oct 27, 2021

Simply SHF_EXCLUDE'ing .llvmbc isn't a solution by itself. The primary purpose of -fembed-bitcode is intended to be that you can get the bitcode output in the final binary when you ask for it.

Would you prefer the section to be called .rust_llvm_lto_bitcode or something like that? We don't use this section for the same purpose as -fembed-bitcode in clang it seems.

@nikic nikic self-assigned this Dec 3, 2021
nikic added a commit to nikic/rust that referenced this issue Dec 8, 2021
In LLVM 14, our current method of setting section flags to avoid
embedding the `.llvmbc` section into final compilation artifacts
will no longer work, see issue rust-lang#90326. The upstream recommendation
is to instead embed the entire bitcode using module-level inline
assembly, which is what this change does.

I've kept the existing code for platforms where we do not need to
set section flags, but possibly we should always be using the
inline asm approach.
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 8, 2021
Use object crate for .rustc metadata generation

We already use the object crate for generating uncompressed .rmeta
metadata object files. This switches the generation of compressed
.rustc object files to use the object crate as well. These have
slightly different requirements in that .rmeta should be completely
excluded from any final compilation artifacts, while .rustc should
be part of shared objects, but not loaded into memory.

The primary motivation for this change is rust-lang#90326: In LLVM 14, the
current way of setting section flags (and in particular, preventing
the setting of SHF_ALLOC) will no longer work. There are other ways
we could work around this, but switching to the object crate seems
like the most elegant, as we already use it for .rmeta, and as it
makes this independent of the codegen backend. In particular, we
don't need separate handling in codegen_llvm and codegen_gcc.
codegen_cranelift should be able to reuse the implementation as
well, though I have omitted that here, as it is not based on
codegen_ssa.

This change mostly extracts the existing code for .rmeta handling
to allow using it for .rustc as well, and adjusts the codegen
infrastructure to handle the metadata object file separately: We
no longer create a backend-specific module for it, and directly
produce the compiled module instead.

This does not `fix` rust-lang#90326 by itself yet, as .llvmbc will need to be
handled separately.

r? `@nagisa`
@bors bors closed this as completed in 9488cac Dec 8, 2021
@cuviper
Copy link
Member

cuviper commented Dec 8, 2021

This was auto-closed by commit text, but it's not actually done yet.

@cuviper cuviper reopened this Dec 8, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 13, 2021
Use module inline assembly to embed bitcode

In LLVM 14, our current method of setting section flags to avoid
embedding the `.llvmbc` section into final compilation artifacts
will no longer work, see issue rust-lang#90326. The upstream recommendation
is to instead embed the entire bitcode using module-level inline
assembly, which is what this change does.

I've kept the existing code for platforms where we do not need to
set section flags, but possibly we should always be using the
inline asm approach (which would have to look a bit different for MachO).

r? `@nagisa`
@nikic
Copy link
Contributor Author

nikic commented Dec 13, 2021

Fixed by #91604 and #91654.

@tstellar
Copy link

For LTO, it's better to just emit the correct object file for the linking mode/modes you're actually using. That's why it surprises me that this is something rust is running into.

We have to support libraries that are capable of LTO, but don't require it. The same standard library is used for LTO and non-LTO usage and as such needs to contain LTO bitcode, but also machine code at the same time.

Is the bitcode from libraries that are compiled with -fembed-bitcode even used for LTO? #74657 (comment) implies that gold and lld both ignore the bitcode, and that the presence of the bitcode triggers a bug in the llvm plugin when linking with bfd. That makes it sound like embedding the bitcode doesn't give rust the feature that it wants.

@nikic
Copy link
Contributor Author

nikic commented Dec 16, 2021

@tstellar Normally rust does LTO as part of the compiler (despite the name). In that case the bitcode is obtained using IRObjectFile::findBitcodeInMemBuffer(), which accepts both plain bitcode and various object file types. That's the kind of LTO I was referring to there.

When linker plugin LTO is enabled, then only bitcode is produced instead:

} else if sess.target.obj_is_bitcode
|| (sess.opts.cg.linker_plugin_lto.enabled() && !no_builtins)
{
// This case is selected if the target uses objects as bitcode, or
// if linker plugin LTO is enabled. In the linker plugin LTO case
// the assumption is that the final link-step will read the bitcode
// and convert it to object code. This may be done by either the
// native linker or rustc itself.
//
// Note, however, that the linker-plugin-lto requested here is
// explicitly ignored for `#![no_builtins]` crates. These crates are
// specifically ignored by rustc's LTO passes and wouldn't work if
// loaded into the linker. These crates define symbols that LLVM
// lowers intrinsics to, and these symbol dependencies aren't known
// until after codegen. As a result any crate marked
// `#![no_builtins]` is assumed to not participate in LTO and
// instead goes on to generate object code.
EmitObj::Bitcode
I think the case you are referring to is the !no_nobuiltins exception there, which seems like the only case that may have embedded bitcode in that case. (Possibly that case should be forcing a plain object file without embedded bitcode if the embedded bitcode causes issues? As the code is written, it's not obvious whether ObjectCode(BitcodeSection::Full) may be returned in that case.)

See also https://github.com/rust-lang/rust/blob/673d0db5e393e9c64897005b470bfeb6d5aec61b/src/test/run-make-fulldeps/cross-lang-lto/Makefile for a test that checks that the linker plugin LTO artifacts are bitcode files and https://github.com/rust-lang/rust/blob/673d0db5e393e9c64897005b470bfeb6d5aec61b/src/test/run-make-fulldeps/cross-lang-lto-clang/Makefile that does an end-to-end linker plugin LTO link and checks that LTO does happen.

bjorn3 pushed a commit to bjorn3/rust that referenced this issue Dec 31, 2021
We already use the object crate for generating uncompressed .rmeta
metadata object files. This switches the generation of compressed
.rustc object files to use the object crate as well. These have
slightly different requirements in that .rmeta should be completely
excluded from any final compilation artifacts, while .rustc should
be part of shared objects, but not loaded into memory.

The primary motivation for this change is rust-lang#90326: In LLVM 14, the
current way of setting section flags (and in particular, preventing
the setting of SHF_ALLOC) will no longer work. There are other ways
we could work around this, but switching to the object crate seems
like the most elegant, as we already use it for .rmeta, and as it
makes this independent of the codegen backend. In particular, we
don't need separate handling in codegen_llvm and codegen_gcc.
codegen_cranelift should be able to reuse the implementation as
well, though I have omitted that here, as it is not based on
codegen_ssa.

This change mostly extracts the existing code for .rmeta handling
to allow using it for .rustc as well, and adjust the codegen
infrastructure to handle the metadata object file separately: We
no longer create a backend-specific module for it, and directly
produce the compiled module instead.

This does not fix rust-lang#90326 by itself yet, as .llvmbc will need to be
handled separately.
bjorn3 pushed a commit to bjorn3/rust that referenced this issue Dec 31, 2021
Use object crate for .rustc metadata generation

We already use the object crate for generating uncompressed .rmeta
metadata object files. This switches the generation of compressed
.rustc object files to use the object crate as well. These have
slightly different requirements in that .rmeta should be completely
excluded from any final compilation artifacts, while .rustc should
be part of shared objects, but not loaded into memory.

The primary motivation for this change is rust-lang#90326: In LLVM 14, the
current way of setting section flags (and in particular, preventing
the setting of SHF_ALLOC) will no longer work. There are other ways
we could work around this, but switching to the object crate seems
like the most elegant, as we already use it for .rmeta, and as it
makes this independent of the codegen backend. In particular, we
don't need separate handling in codegen_llvm and codegen_gcc.
codegen_cranelift should be able to reuse the implementation as
well, though I have omitted that here, as it is not based on
codegen_ssa.

This change mostly extracts the existing code for .rmeta handling
to allow using it for .rustc as well, and adjusts the codegen
infrastructure to handle the metadata object file separately: We
no longer create a backend-specific module for it, and directly
produce the compiled module instead.

This does not `fix` rust-lang#90326 by itself yet, as .llvmbc will need to be
handled separately.

r? `@nagisa`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Projects
None yet
Development

No branches or pull requests

6 participants