Don't leak non-exported symbols from staticlibs #104707
cc @chorman0773
FTR, I'm not sure there is a way to limit the exported symbols of a staticlib without both limiting the number of CGUs within the crate to 1 and preventing any upstream crates from using the …
Maybe partial linking would work? If not, we will at least need to version all …
IDK whether all linkers support partial linking (in particular, I don't know about Microsoft's link.exe). Though personally, I don't particularly want to touch …
I plan to do it slightly differently when possible: for staticlibs, resolve the dependencies and produce a …
That is a backward-incompatible change for rustc.
That will share and thus leak the global state of libstd across the cdylib boundary. Among other things, this will break the mitigation from #102721 that prevents catching foreign Rust panics.
On Tue, Nov 22, 2022 at 07:28 bjorn3 wrote:
> for staticlibs resolve the dependencies and produce a links.o "object"
> that is just a linker script.
> That is a backward incompatible change for rustc

rustc currently just emits the library directly into the archive, right?
I'm not particularly sure what the difference here is, except in terms of
file size, and the fact that using libstd.so.0.1 is an option in addition
to libstd.rlib.0.1.

> For cdylibs, just link as normal and when dynamically linking, add
> DT_RPATH (to the stdlib directory)+DT_NEEDED as needed.
> That will share and thus leak the global state of libstd across the cdylib
> boundary.

That is true, though I'm unsure how it is avoidable: even when statically
linking, the symbols from libstd et al. need to have default visibility so
that when linking into a dylib, the symbols can be used via the dylib. It
would not make a difference on ELF.
The difference is that currently you can ship a staticlib as a standalone file and expect linking to succeed, but with your proposal you also need all dependencies to be available at exactly the same place.
When linking a cdylib, libstd is statically linked and none of its symbols are exported from the cdylib. When linking a Rust dylib, sharing state is just fine; in fact, you aren't allowed to duplicate crates in that case.
Fair enough, though the result is potentially shipping GiB for an API surface that should take MiB.
On ELF I'm unsure how this would be achieved. Symbol visibility is controlled when the symbol is defined (in the object file), and I just send everything on to ld (post processing? What is that? I only know "add head and tail libraries to link line"). ELF shared objects don't have an "Export List"; the dynamic symbol list is usually just built from the static symbol list, excluding internal and hidden symbols. Every GLOBAL or WEAK symbol in the list with PROTECTED or DEFAULT visibility can be imported from the cdylib, and every symbol with DEFAULT visibility can additionally be replaced. These are functions of the link editor producing the files and the dynamic linker-loader resolving runtime relocations, and are far from under the control of Rust as a language or any particular implementation.
Rustc passes a version script to the linker specifying exactly which symbols to export and telling it to hide everything else. This is but one of the reasons rustc is in charge of invoking the linker.
Ah. My problem is that I have to support link editors that are like "Version Script? What is this? Expected PHDRS, MEMORY, or SECTIONS".
Linker scripts and version scripts are different. Linker scripts describe what should be put where in the linked artifact. Version scripts only list which symbols are exported and which aren't, and optionally provide a version for the purpose of symbol versioning. The format of version scripts is trivial in comparison to linker scripts. See rust/compiler/rustc_codegen_ssa/src/back/linker.rs, lines 666 to 724 in b7463e8.
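To make the distinction concrete: a version script can be as small as a single anonymous version node. The sketch below is illustrative only and is not the code in `linker.rs`; the helper name `version_script` and the symbol names are hypothetical. It renders a GNU ld-style script that keeps the listed symbols global and makes every other symbol local. With GNU ld or lld, such a file would typically be passed via the `--version-script` option.

```rust
/// Hypothetical helper: render a GNU ld-style anonymous version script that
/// keeps only `exported` symbols global and makes every other symbol local.
fn version_script(exported: &[&str]) -> String {
    let mut script = String::from("{\n");
    if !exported.is_empty() {
        script.push_str("  global:\n");
        for sym in exported {
            script.push_str(&format!("    {sym};\n"));
        }
    }
    // `local: *;` hides every symbol that was not listed under `global:`.
    script.push_str("  local:\n    *;\n};\n");
    script
}

fn main() {
    // Hypothetical C API symbols of a cdylib.
    let script = version_script(&["my_c_api_init", "my_c_api_free"]);
    print!("{script}");
}
```

This only controls which symbols a *linked* output (a cdylib or executable) exports. As the thread discusses, it does nothing for the individual object files inside a staticlib archive, which is the crux of this issue.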
I am aware of version scripts. The simplicity is not the problem. The problem is being faced with a link editor that doesn't support them, and I cannot always assume that one does.
Which linker doesn't support it? AFAIK every platform targeted by rustc has a linker that supports them or some other way to hide symbols.
Hmm... I'm actually not sure. Some quick research on autoconf says only GNU ld and Solaris ld support them (lld does too, as will lcld). I guess older platforms may have none of the above, but IDK how old you have to get. I'm sure given enough time I could find a counterexample, but I don't want to look rn.
Wrt. staticlibs, with the versioned symbols, would it be permissible for the same or compatible versions of the Rust compiler to share things like the global_allocator between compiled staticlibs and final link targets (rather than rejecting them with multiple-definition errors)?
Won't that risk mixing alloc from one allocator with free from another allocator?
It shouldn't, since by the time any alloc calls have been made, the
allocator symbol would be resolved entirely (in my case, I'd have to
inhibit devirtualization of the global_allocator, but I'm privileged to be
able to create a way to do that, though marking the symbol as weak would be
sufficient).

On Tue, Nov 29, 2022 at 06:14 bjorn3 wrote:
> Won't that risk mixing alloc for one allocator with free for another
> allocator?
Symbol resolution may choose the __rust_alloc symbol from one allocator shim and the __rust_dealloc symbol from another allocator shim, right? Even if current linkers are likely to choose both from the same allocator shim, there is no guarantee that this will always be the case AFAIK.
In my case, the allocator provider is a single symbol that is just a
&'static dyn GlobalAlloc.
In rustc's case, linkonce COMDAT groups exist. Put __rust_alloc and
__rust_dealloc in the same linkonce group.

On Tue, Nov 29, 2022 at 10:31 bjorn3 wrote:
> Symbol resolution may choose the __rust_alloc symbol from one allocator
> shim and the __rust_dealloc symbol from another allocator shim, right? Even
> if current linkers are likely to choose both from the same allocator shim,
> there is no guarantee that this will always be the case AFAIK.
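The single-provider design mentioned above can be sketched in Rust as follows. This is an assumption-laden illustration, not code from either implementation in this thread: it assumes the provider is one static holding a `&'static dyn GlobalAlloc`, and `ALLOCATOR_PROVIDER`, `provider_alloc` and `provider_dealloc` are hypothetical names. Because allocation and deallocation both dispatch through the same reference, they cannot be served by two different allocators. The sketch only models the Rust-level dispatch; it does not address bjorn3's point about how a linker resolves separate `__rust_alloc`/`__rust_dealloc` symbols.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// Hypothetical single "allocator provider" symbol: one static trait object.
// Whichever definition of it ends up being used, alloc and dealloc both go
// through this one reference, so they always reach the same allocator.
static ALLOCATOR_PROVIDER: &(dyn GlobalAlloc + Sync) = &System;

/// # Safety
/// `layout` must have a non-zero size.
unsafe fn provider_alloc(layout: Layout) -> *mut u8 {
    unsafe { ALLOCATOR_PROVIDER.alloc(layout) }
}

/// # Safety
/// `ptr` must have been returned by `provider_alloc` with the same `layout`.
unsafe fn provider_dealloc(ptr: *mut u8, layout: Layout) {
    unsafe { ALLOCATOR_PROVIDER.dealloc(ptr, layout) }
}

fn main() {
    let layout = Layout::new::<u64>();
    unsafe {
        let p = provider_alloc(layout).cast::<u64>();
        assert!(!p.is_null());
        p.write(42);
        println!("value = {}", p.read());
        provider_dealloc(p.cast::<u8>(), layout);
    }
}
```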
I just checked if COMDAT is supported for Mach-O. It isn't: https://godbolt.org/z/f5o9Pj6rY
I looked a while ago and they should be? It should be possible the way C++ handles replacing …
I couldn't find any references to COMDAT + Mach-O other than that error in the LLVM source code. For …
Maybe, though vtables would also be found in COMDATs.
Mach-O only supports 252 sections (1 through 253, 0 is for the current image, 254 is for undefined symbols, 255 for the executable image). https://github.com/aidansteele/osx-abi-macho-file-format-reference
lld seems to ignore S_COALESCED except in a check for whether the section is a code section. If S_COALESCED behaved like COMDAT, lld ignoring it would be incorrect. What I think is the case is that all weak symbols are put together in a single S_COALESCED section, and the linker then coalesces every function individually rather than as a group like with COMDAT. I can't test my theory. Even if S_COALESCED is treated as COMDAT, I still don't think it is very realistic to convince LLVM to support it, as it would mean you can't have much more than 200 COMDAT groups in a single object file.
Isn't this per-segment?
In that case, that would mean that the definition of C++ vtables without a key function (all virtual functions are inline or inherited) would be invalid, at least under the Itanium C++ ABI, which AFAIK Apple clang (and clang on Darwin) follows (RTTI and vtable definitions provided by the same TU).
I thought Mach-O object files (before linking) only allowed a single segment.
Given that text and data are separate segments pre-link, I'd doubt that.
…bol, r=<try> Mangle rustc_std_internal_symbols functions This reduces the risk of issues when using a staticlib or rust dylib compiled with a different rustc version in a rust program. Currently this will either (in the case of staticlib) cause a linker error due to duplicate symbol definitions, or (in the case of rust dylibs) cause rustc_std_internal_symbols functions to be silently overridden. As rust gets more commonly used inside the implementation of libraries consumed with a C interface (like Spidermonkey, Ruby YJIT (currently has to do partial linking of all rust code to hide all symbols not part of the C api), the Rusticl OpenCL implementation in mesa) this is becoming much more of an issue. With this PR the only symbols remaining with an unmangled name are rust_eh_personality (LLVM doesn't allow renaming it) and `__rust_no_alloc_shim_is_unstable`. Helps mitigate rust-lang#104707
Hello, what is the status of this? I ran into this problem recently when trying to link two Rust static libs into an existing C program. Each static lib contains the symbols from the standard library, causing conflicts. I do not quite understand the solution to this problem you appear to be discussing with linker scripts... Personally, I would prefer a solution where Rust outputs a static lib that contains only one object file, with all symbols stripped except those that should in fact be exported. I am aware that this would bloat the static lib's size somewhat, but at least in my use case I presume that enabling LTO when linking the final binary would remove a lot of the fat. I do not care that the static lib itself would be 20 MB larger, as I do not ship it. The Rust targets I use are linux-musl and windows-gnu. PS: …
Nothing's changed. I've got a PR waiting on review to at least prevent symbol conflicts between different rustc versions: #127173. But this won't help with symbol conflicts when using the same rustc version.
That was something not directly related to this issue. It is a solution to another issue with staticlibs, but doesn't affect the symbol conflicts.
That is the partial linking option I suggested. It doesn't work on Windows, and the last time I tried it, it wasn't really working on macOS either. Could be that I did something wrong for macOS though.
Currently two static libraries generated by a Rust toolchain cannot be linked together in a single binary due to symbol conflicts (see rust-lang/rust#104707). This is a problem for WebAssembly targets, where dynamic linking is not stable yet. To link multiple Rust-originated static libraries together, we need to produce a single static library from an umbrella crate that re-exports everything from its dependencies. This change allows `uniffi_automerge` to be consumed as a crate dependency by the umbrella crate.
So is it just impossible to do this on Windows, or is there some other way to control which symbols a static library provides? Cc @ChrisDenton
Hm, that should be possible. If you create a fully self-contained object that has no external references and place it in a lib that exports only unique symbols, then there won't be conflicts. Maybe there's some technical reason that makes that hard to do in rustc, but I'm struggling to think of a reason why it can't work.
That is the hard part. On Unix you can use …
What does …
Ah, OK, that makes sense. I think it should be possible to merge objects into a single object on Windows, though admittedly I'm not aware of any tools that do it. You're basically doing the job of a linker, but instead of a PE file, the output is a COFF object. So it's not easy. It would be easier if we could compile the whole crate graph to an object as though it were a single crate. But that's not necessarily practical, especially without build-std.
We can't do this in a backwards-compatible way, unfortunately. We can't demand that staticlib dependencies are compiled in a special way. A potential "solution" would be to take all object files and rewrite the symbol names in them (except those that should be imported/exported) to contain a hash of the staticlib's crate hash. That would trivially avoid symbol conflicts, but would probably be rather slow.
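A rough sketch of the renaming plan described above, assuming a hypothetical object-file rewriter (not shown) that would apply the resulting name map to every definition and reference in the staticlib's objects. `rename_plan` and `fnv1a64` are hypothetical names, and FNV-1a is only a small deterministic stand-in for whatever hash rustc would actually use.

```rust
use std::collections::{HashMap, HashSet};

/// 64-bit FNV-1a, used here only as a tiny deterministic hash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical: map every symbol defined inside the staticlib that is not
/// part of its exported interface to a new name suffixed with a hash of the
/// staticlib's crate hash. Exported symbols keep their original names.
fn rename_plan(
    defined: &[&str],
    exported: &HashSet<&str>,
    crate_hash: &str,
) -> HashMap<String, String> {
    let suffix = format!("{:016x}", fnv1a64(crate_hash.as_bytes()));
    defined
        .iter()
        .filter(|sym| !exported.contains(**sym))
        .map(|sym| (sym.to_string(), format!("{sym}.{suffix}")))
        .collect()
}

fn main() {
    // Hypothetical symbols defined somewhere inside one staticlib.
    let defined = ["my_c_api_init", "__rust_alloc", "rust_begin_unwind"];
    let exported = HashSet::from(["my_c_api_init"]);
    let plan = rename_plan(&defined, &exported, "hypothetical-crate-hash");
    for (old, new) in &plan {
        println!("{old} -> {new}");
    }
}
```

Because the same map would be applied to references as well as definitions, calls between objects inside the staticlib would keep resolving, while the exported C API and any symbols the staticlib leaves undefined (for example libc imports) keep their original names.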
@bjorn3 |
We literally already mangle symbols based on the …
Could we offer this on platforms that support it? No reason to hold back Linux just because the Windows linker is so inflexible.
|
I'm afraid people will start to rely on it and then have their code mysteriously break on other platforms. Maybe it would be possible to do partial linking on platforms that support it and object file rewriting on all other platforms, though? That way it is supported on all platforms, without unnecessarily hurting compile-time performance on systems that support partial linking.
I was not aware of the -Cmetadata flag. And yes, you are entirely correct that the stdlib is the problem. This makes me curious... If I set -Cmetadata and rebuild std (I don't know the flag name, but I know it exists), then it should not run into this symbol conflict, since it uses the newly passed metadata for mangling? I don't really care much about build times, because I would probably only do this for actual release builds, and rebuilding std does not take that long.
Mostly. There are a couple of symbols in libstd marked with …
…sleywiser,jieyouxu Mangle rustc_std_internal_symbols functions This reduces the risk of issues when using a staticlib or rust dylib compiled with a different rustc version in a rust program. Currently this will either (in the case of staticlib) cause a linker error due to duplicate symbol definitions, or (in the case of rust dylibs) cause rustc_std_internal_symbols functions to be silently overridden. As rust gets more commonly used inside the implementation of libraries consumed with a C interface (like Spidermonkey, Ruby YJIT (currently has to do partial linking of all rust code to hide all symbols not part of the C api), the Rusticl OpenCL implementation in mesa) this is becoming much more of an issue. With this PR the only symbols remaining with an unmangled name are rust_eh_personality (LLVM doesn't allow renaming it) and `__rust_no_alloc_shim_is_unstable`. Helps mitigate rust-lang/rust#104707 try-job: aarch64-gnu-debug try-job: aarch64-apple try-job: x86_64-apple-1 try-job: x86_64-mingw-1 try-job: i686-mingw-1 try-job: x86_64-msvc-1 try-job: i686-msvc-1 try-job: test-various try-job: armhf-gnu
When compiling a cdylib, only `#[no_mangle]` symbols are exported; `#[rustc_std_internal_symbol]` and mangled symbols are not. This prevents symbol conflicts and avoids overriding symbols in ways that cause UB when using a Rust cdylib in a Rust program. For staticlibs, however, all symbols leak out of the staticlib, causing both symbol overrides that are potentially UB and symbol conflicts. For example, when statically linking SpiderMonkey, you will get symbol conflicts if the Rust version you compile your program with doesn't match the one the Rust parts of SpiderMonkey were compiled with.
cc rust-lang/wg-allocators#108 (comment)
cc https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/.E2.9C.94.20spidermonkey-wasm-rs/near/309604553