Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite local labels in inline assembly #81570

Closed

Conversation

asquared31415
Copy link
Contributor

@asquared31415 asquared31415 commented Jan 30, 2021

It is possible for certain asm! blocks to be duplicated and emitted more than once by the compiler through inlining or other optimizations. If a duplicated asm! block has a label, the compiler will error with a invalid symbol redefinition error. If the -g flag is passed to the compiler, the compiler will abort due to a crash with no output. #74262 shows an example of one such error and crash. This bug also occurs if two separate asm! blocks have the same label, preventing the reuse of simple labels like loop:.

This PR aims to fix these issues by making all labels in asm! blocks local labels using LLVM's ${:private} and ${:uid} as suggested in #81088. Local labels do not suffer from the same crashes and errors that previously occurred.

Closes #74262
Closes #81088
Addresses part of this comment in the inline assembly tracking issue

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @oli-obk (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 30, 2021
@asquared31415
Copy link
Contributor Author

I have two concerns about this change.

  1. Performance - This change adds a significant amount of string processing as compared to the simple operation of just adding a string before, however it's all compile time processing. Can/should this be tested for performance, and if so, how?
  2. Is it legal to define a label in one asm! block and use it in another? This code would break that if that was allowed.

@nagisa
Copy link
Member

nagisa commented Jan 30, 2021

Is it legal to define a label in one asm! block and use it in another? This code would break that if that was allowed.

I remember seeing that being done in some of the codebases using GCC inline assembly, but we're still in a position to call it UB or something along those lines.

@asquared31415
Copy link
Contributor Author

That's unfortunate. I couldn't find any information in the llvm docs or in the inline assembly rfc either confirming or denying that it was allowed. Though it makes sense that it's at least legal on llvm's side, since that's the cause of this whole issue anyway, that labels are global by default and llvm sees them as a redefinition.

@asquared31415
Copy link
Contributor Author

I have done some thinking on how to possibly solve the fact that this makes all labels local. The possible solutions that I can think of are:

  1. labels are always local to your inline assembly block. This is possibly undesirable as it prevents use of labels across blocks entirely. This is what this PR currently does.
  2. labels are local by default unless you specify them to be global. This can still cause issues with inline and the weird redefinition errors and it would still crash if a user specified global labels and they were duplicated and built with debug mode. However this time, it requires the user to actively specify something to break, and the default (and more intuitive to me) case is fine.
  3. labels are global by default unless you specify them to be local. This is how the current state of asm! is and it causes these issues to appear.

@oli-obk
Copy link
Contributor

oli-obk commented Feb 1, 2021

r? @nagisa

feel free to pingpong back to me if you don't have capacity for this right now, just seemed like you know more about this than me :)

@rust-highfive rust-highfive assigned nagisa and unassigned oli-obk Feb 1, 2021
@nagisa
Copy link
Member

nagisa commented Feb 2, 2021

I think it is fine to say labels are local to the given asm!() invocation for now, if not forever. We are not bound by having to support ancient code that relied on this accidental feature exposed by the more naive compilers.

We can consider adding something for the global labels when and if somebody comes with well motivated use-cases and, probably, an RFC.

@nagisa
Copy link
Member

nagisa commented Feb 2, 2021

@Amanieu you mention in this comment that the RFC specified only local labels shall be used, but then suggest localizing the labels in this one. Should that be considered the official position of the WG-inline-asm?

If not, I perhaps would be more comfortable with a built-in diagnostic that prevents use of non-local labels in the first place, with suggestions on how the user could adjust their code to use the local labels instead. Denying that kind of code would also prevent accidental lock in into a design decision that we end up regretting later. That would also set us up for denying code that uses LLVM-specific features like… ${:uid}

@asquared31415
Copy link
Contributor Author

Specifying all labels to be local is fine in my opinion as well, and I took that approach based on the issue you just linked, #81088. #76704 is possibly relevant, as if the decision is made to make all labels local, the unstable book should reflect that.

@Amanieu
Copy link
Member

Amanieu commented Feb 2, 2021

Labels defined in asm! are always local. You cannot use them outside the current asm block since the asm may be duplicated by LLVM.

You can only define labels with a global scope in global_asm! (which doesn't need any rewriting).

@Amanieu you mention in this comment that the RFC specified only local labels shall be used, but then suggest localizing the labels in this one. Should that be considered the official position of the WG-inline-asm?

The asm! RFC as written only allows GAS local labels (0:, 1:) in asm!. However I have since changed my mind about this:

  • This requirement is not obvious and is unfamiliar to people used to writing assembly.
  • It is easy to forget: code using named labels will usually work fine until LLVM suddenly decides to duplicate the asm! block in the future.
  • LLVM's intel asm syntax sometimes parses 0b and 1b as binary literals instead of local label references, which leads to confusing errors.

I raised this suggestion in the asm! tracking issue (#72016 (comment)) and nobody has objected so far. I think this change is strictly beneficial and doesn't actually lock us into any LLVM-specific features. GCC's inline asm has an equivalent to {:uid} (%=), and the cranelift fallback of going through an external assembler doesn't need it.

@rust-lang/wg-inline-asm Any objections to adding this feature?

Copy link
Member

@Amanieu Amanieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs some codegen tests that check that the correct string is passed to LLVM.

@@ -212,6 +213,10 @@ impl AsmBuilderMethods<'tcx> for Builder<'a, 'll, 'tcx> {
}
}
} else {
if let Some(label) = get_llvm_label_from_str(s) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A template string piece may consist of multiple lines (e.g. raw string literals). These strings could define more than one label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, I didn't think of this case

('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c
}

// All subsequent chars are [a-zA-Z0-9_$.@]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we disallowed @ and $ in symbol names. We don't need to support all symbol names here, the user is free to shoot themselves in the foot if they choose to use a symbol name with weird characters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we disallowed @ and $ in symbol names.

That should happen elsewhere, possibly in the same location parsing for formatting specifiers happens. Should also be a concern for a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is that these may not be valid characters in symbol names for all targets and may have other special meanings on those targets.

Copy link
Member

@nagisa nagisa Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I agree. I still think this is the the wrong place in the codebase to enforce what characters we allow in asm labels. Probably should happen as a separate PR too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually prevent the user from using such labels. We just don't provide automatic rewriting into local labels for them.

Also we're not handling all possible label definitions. For example labels can be defined after a ; on targets where that is a statement separator.

We're not trying to handle all possible cases here since that would require us to effectively parse the asm, which we really want to leave to LLVM. We're just trying to make the most common types of labels work correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually prevent the user from using such labels. We just don't provide automatic rewriting into local labels for them.

Aha, I see the distinction you're trying to make now.

We're just trying to make the most common types of labels work correctly.

That seems like a recipe for a pretty poor experience with a very non-obvious cliff at which things may start failing in non-obvious ways. For instance, user with passing familiarity with Rust's asm! labels being local could write something like this:

#[inline] // not always
pub fn peach() {
    unsafe {
        asm!("per$$imon: jmp per$$imon")
    }
}

and it might very well work out in basic tests, but start failing once the function is released to the field. If we do go the route of making things automagic, its better they're magic all the time, not only sometimes.

that would require us to effectively parse the asm, which we really want to leave to LLVM

I believe just lexing would be sufficient, but I do agree that we'd rather not do that at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least I would like @ to be removed from the set of allowed characters for symbols. @ is used as a symbol modifier in x86 intel syntax: sym@gotpcrel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to confirm what we want to accept as a valid identifier for the purposes of replacing labels. Denying symbols outright should be properly implemented somewhere else, preferably with a compiler warning or error. This code can reflect what symbols are accepted, but it will ignore other things. For example, @foobar: will be replaced with @${:private}foobar${:uid} if we choose to not acknowledge @ in labels.

Is the set of symbols we will accept in a label [a-zA-Z0-9_$.]?

Additionally, do we want to accept labels that are strings like nagisa commented here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that for label renaming we should only support [a-zA-Z0-9_.], and only plain labels not quoted ones.

// Fixes https://github.com/rust-lang/rust/issues/74262
for label in labels.iter() {
let ranges: Vec<_> = template_str
.match_indices(label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this checks if the character before the match is a non-identifier character. This means that if you have a label bar then foobar will be replaced with foo${:private}bar${:uid} which is really not what we want.

// Valid llvm symbols: https://llvm.org/docs/AMDGPUOperandSyntax.html#symbols
// First char is [a-zA-Z_.]
fn llvm_ident_start(c: char) -> bool {
('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c
matches!(c, 'a'..='z' | 'A'..='Z' | '_' | '.')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches! is more concise, thanks

|| '_' == c
|| '$' == c
|| '.' == c
|| '@' == c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ? mark is a valid continuation character according to clang: https://rust.godbolt.org/z/EsP4sW, though it does appear like a bug and is only intended to be allowed when emulating the microsoft cl. Nevertheless it is something that LLVM currently will let through.

return Some(substr);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to get more arbitrary stuff into a asm label is to quote it as a string, like this:

void test(void) {
    __asm__(".intel_syntax noprefix\n"
    "\"01☺☺@banana.?@!\": jmp \"01☺☺@banana.?@!\"");
}

or in Rust

#![feature(asm)]
pub fn test() {
    unsafe {
    asm!(r#"
        "☺helloworld?@!$%^&": jmp "☺helloworld?@!$%^&"
    "#);
    }
}

So that's something we should handle or forbid too.


/// Gets a LLVM label from a &str if it exists
fn get_llvm_label_from_str(s: &str) -> Option<&str> {
if let Some(colon_idx) = s.find(':') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use trim_start to strip leading whitespace from the line.

@joshtriplett
Copy link
Member

joshtriplett commented Feb 3, 2021

Forcing all labels within an asm block to be local would certainly improve usability for the common case (so that people don't have to use local labels).

However, it is legal in clang and gcc to define a label in an asm block and make that label a global symbol. And I've seen that used, on occasion, in real projects, including the Linux kernel. (For instance, suppose you want a global symbol pointing to a location that can be overwritten as a "probe point".) Rewriting would cause such code to silently "work" while making a global symbol with the wrong name, rather than catching the issue.

I agree that such code is not especially safe, if you're not certain that the asm code will appear in the binary exactly once. (In C that's a little easier to guarantee.) It's safer in global_asm, and I do think that we should support it there eventually.

In the meantime, I personally would prefer to make a best-effort attempt to catch the use of non-local labels in asm! and error on them, rather than attempting to rewrite them (which requires more careful lexing of inline assembly). Alternatively, if the thought is that we might allow global labels in the future, perhaps we could attempt to lex a little more of the inline assembly, catch attempts to use .globl or .global, and emit an error when applying those directives to a label.

EDIT: I posted this to the PR because I would prefer to just reject non-local labels, which would be a change to this PR. If we decide instead to reject .globl/.global, then the discussion should move to the tracking issue and the outcome would be a separate PR.

@Amanieu
Copy link
Member

Amanieu commented Feb 3, 2021

Let's move this discussion over to #81088. Comments on the PR should be about reviewing the implementation, not the design or discussing whether we should do this.

@joshtriplett
Copy link
Member

@Amanieu Alright, reposted to #81088 .

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 23, 2021
@nagisa
Copy link
Member

nagisa commented Mar 7, 2021

I'm not entirely sure what it would take for this PR to land, myself. It seems pretty clear that we'd want to have all inline assembly labels be local, but on the other hand implementing a inline assembly parser in rustc is also not really something I think we should do.

Thinking about the ideal way to solve this, I think we'd want to add support for this kind of interpretation of the inline assembly labels to the backend (i.e. LLVM) itself. Then you wouldn't have to think hard about the corner cases such as label names being almost anything if quoted, or different targets supporting their own special flavour of syntax/identifiers.

I'm still not sure about only handling the most common cases, primarily because it might feed into a decision to stabilize the inline assembly in its partially finished state. Once stabilized, I don't think we'll have any leeway in changing the behaviour anymore (except maybe on edition boundaries?) – users would come to rely on the labels being either global or local, and behaviour change in any direction would break compatibility.

@Amanieu
Copy link
Member

Amanieu commented Mar 7, 2021

Perhaps we should just lint against labels for now until we have made a decision on how best to support labels in asm!.

@nagisa
Copy link
Member

nagisa commented Mar 7, 2021

Yeah, banning/linting against label definitions that don't follow [0-9a-zA-Z_\.]+ (and then rewriting only those) for now is also a path we could take.

@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2021
@Dylan-DPC-zz
Copy link

@asquared31415 this is waiting on you to reply to the above reviews. Thanks

@bors
Copy link
Contributor

bors commented Apr 5, 2021

☔ The latest upstream changes (presumably #83864) made this pull request unmergeable. Please resolve the merge conflicts.

@crlf0710
Copy link
Member

Triage: Seems this is blocked on #81088

@crlf0710 crlf0710 added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. F-asm `#![feature(asm)]` (not `llvm_asm`) and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 23, 2021
@asquared31415 asquared31415 changed the title Fix compiler crashes and errors with duplicate labels in inline assembly Rewrite local labels in inline assembly Jul 7, 2021
@asquared31415
Copy link
Contributor Author

I think that for the time being we have decided to not pursue this path. I am willing to do the work if this (or a similar) approach is decided on.

@asquared31415 asquared31415 deleted the asm-duplicate-labels-fix branch October 9, 2021 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-asm `#![feature(asm)]` (not `llvm_asm`) S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work.
Projects
None yet