Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions referenced by global_asm merged into other functions by LLVM #96797

Closed
3 tasks
nbdd0121 opened this issue May 7, 2022 · 9 comments · Fixed by #99059
Closed
3 tasks

Functions referenced by global_asm merged into other functions by LLVM #96797

nbdd0121 opened this issue May 7, 2022 · 9 comments · Fixed by #99059
Labels
A-inline-assembly Area: Inline assembly (`asm!(…)`) F-asm `#![feature(asm)]` (not `llvm_asm`)

Comments

@nbdd0121
Copy link
Contributor

nbdd0121 commented May 7, 2022

#![feature(asm_sym)]

use std::arch::global_asm;

#[no_mangle]
fn my_func() {}

global_asm!("call_foobar: jmp {}", sym foobar);

fn foobar() {}

fn main() {
    extern "Rust" {
        fn call_foobar();
    }
    unsafe { call_foobar() };
}

This code will fail to link because foobar is merged into my_func.

Since global asms in LLVM can't reference symbols natively, we currently will add the mangled name to the string and pass it to LLVM. To prevent the file being optimized away, we add it to llvm.compiler.used.

A few issues exist:

  • rustc_monomorphize does not handle these functions specially, and therefore foobar above is considered as internal function and have internal attribute attached.
  • LLVM believes that it is allowed to merge internal functions into an exported function despite present in llvm.compiler.used. I think LLVM should be allowed to this because we emit the internal linkage, and from my understanding renaming a local symbol should probably be allowed. @tmiasko argues otherwise and says that llvm.compiler.used shouldn't touch the symbol.
  • The symbols are added llvm.compiler.used, but it wouldn't show up as used in the exported symbol list, so it can still be stripped while doing LTO.

We should do the following:

  • Ensure that these functions are not emitted with internal linkage
  • Ensrue that the functions used will be marked as used in exported_symbols/linked_symbols and thus reach the linker/LTO.
  • Ask LLVM to clarify the behaviour of combining llvm.compiler.used + internal .

cc @Amanieu @tmiasko
@rustbot label: +A-inline-assembly +F-asm

This will block #93333

@rustbot rustbot added A-inline-assembly Area: Inline assembly (`asm!(…)`) F-asm `#![feature(asm)]` (not `llvm_asm`) labels May 7, 2022
@nbdd0121 nbdd0121 changed the title Functions referenced by global_asm is merged into other functions by LLVM Functions referenced by global_asm merged into other functions by LLVM May 7, 2022
@Amanieu
Copy link
Member

Amanieu commented May 8, 2022

I think llvm.compiler.used should be enough to ensure that the symbol reaches the linker even through LTO. So really the only thing to do is change the symbol visibility. This isn't ideal though since we don't need this symbol to be exported outside the current compilation unit, it's just that LLVM is unable to see the use of the symbol.

@Amanieu
Copy link
Member

Amanieu commented May 8, 2022

It could be argued that this is actually an LLVM bug since internal linkage is supposed to preserve the symbol name in the object file:

internal

Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file. This corresponds to the notion of the ‘static’ keyword in C.

@nbdd0121
Copy link
Contributor Author

nbdd0121 commented May 8, 2022

I think llvm.compiler.used should be enough to ensure that the symbol reaches the linker even through LTO.

I had to have 1af3e0a to prevent #[used] from being thrown away.

@nbdd0121
Copy link
Contributor Author

nbdd0121 commented May 8, 2022

Anyway, the biggest issue is the internal linkage, which is wrong anyway if the global asm and the referenced symbol are not in the same CGU. Given that currently both exported/linked_symbol and the linkage computation comes from the same source of truth reachable_non_generics, a potential fix should be able to address all issues simultaneously.

@Amanieu
Copy link
Member

Amanieu commented Jun 8, 2022

  • rustc_monomorphize does not handle these functions specially, and therefore foobar above is considered as internal function and have internal attribute attached.

This is fixed by #97900.

  • LLVM believes that it is allowed to merge internal functions into an exported function despite present in llvm.compiler.used. I think LLVM should be allowed to this because we emit the internal linkage, and from my understanding renaming a local symbol should probably be allowed. @tmiasko argues otherwise and says that llvm.compiler.used shouldn't touch the symbol.

The proper behavior for LLVM is still unclear, but at least #97900 fixes the immediate issue since those symbols now have external linkage.

  • The symbols are added llvm.compiler.used, but it wouldn't show up as used in the exported symbol list, so it can still be stripped while doing LTO.

This is a separate issue, but I think the proper way to fix this is to ensure that the global_asm! itself reaches the linker. The problem here is that if there are no symbols to pull in the object file containing a global_asm! from an archive then that global_asm! will not be linked into the final binary. I think we can fix this by inserting a dummy symbol in the global_asm! and then including that symbol in the synthetic object file from #95604.

@nbdd0121
Copy link
Contributor Author

nbdd0121 commented Jun 8, 2022

This is a separate issue

It's of a different cause, but will have the same effect. I prefer to have it tracked in the same issue (but different work items).

but I think the proper way to fix this is to ensure that the global_asm! itself reaches the linker. The problem here is that if there are no symbols to pull in the object file containing a global_asm! from an archive then that global_asm! will not be linked into the final binary. I think we can fix this by inserting a dummy symbol in the global_asm! and then including that symbol in the synthetic object file from #95604.

The problem is two fold. For the normal non-LTO linking, what you described is sufficient.

The major issue is LTO. Because LTO works on IR but not binary, it doesn't know if a symbol is being used in global_asm. We need to pass a list of symbols to LLVM to ensure they are kept. Currently this includes all exported and #[used] symbols, but it also needs to include all symbols referenced by global_asm.

@tmiasko
Copy link
Contributor

tmiasko commented Jun 9, 2022

Do you have an example demonstrating the LTO related issue?

Optimizations during LTO cannot remove symbols from llvm.compiler.used yet, since as you describe the references to those symbols from the inline assembly are still opaque to it. For example, the internalize pass in LLVM comments:

  // We must assume that globals in llvm.used have a reference that not even
  // the linker can see, so we don't internalize them.
  // For llvm.compiler.used the situation is a bit fuzzy. The assembler and
  // linker can drop those symbols. If this pass is running as part of LTO,
  // one might think that it could just drop llvm.compiler.used. The problem
  // is that even in LTO llvm doesn't see every reference. For example,
  // we don't see references from function local inline assembly. To be
  // conservative, we internalize symbols in llvm.compiler.used, but we
  // keep llvm.compiler.used so that the symbol is not deleted by llvm.

@Amanieu
Copy link
Member

Amanieu commented Jun 9, 2022

That comment suggests that even with internal visibility, LLVM isn't supposed to delete a symbol that is used by llvm.used. So perhaps the issue really is in the MergeFunctions pass which should be fixed to preserve a symbol alias if that symbol is used by llvm.[compiler].used.

@Amanieu
Copy link
Member

Amanieu commented Jun 10, 2022

Here's my attempt at a fix in LLVM, but I'm not too happy about the code. It could use some review by someone more familiar with LLVM.

Amanieu/llvm-project@b286633

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 27, 2022
Rebase LLVM submodule

This is a rebase of our LLVM fork onto LLVM 14.0.5, which is intended to be the last release of the 14.x series. Additionally:

 * I've dropped three patches that were needed to build on FreeBSD 11, which is no longer necessary after rust-lang#97944.
 * I've dropped some cherry-picks that were later reverted.
 * I've cherry-picked llvm/llvm-project@caa2a82 for rust-lang#96797 (fyi `@Amanieu)`
Amanieu added a commit to Amanieu/rust that referenced this issue Jul 12, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jul 18, 2022
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 21, 2022
@bors bors closed this as completed in 4423341 Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-inline-assembly Area: Inline assembly (`asm!(…)`) F-asm `#![feature(asm)]` (not `llvm_asm`)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants