Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asm! should automatically rewrite labels in assembly code #81088

Open
Amanieu opened this issue Jan 16, 2021 · 28 comments
Open

asm! should automatically rewrite labels in assembly code #81088

Amanieu opened this issue Jan 16, 2021 · 28 comments
Labels
A-inline-assembly Area: Inline assembly (`asm!(…)`) C-enhancement Category: An issue proposing an enhancement or a PR with one. F-asm `#![feature(asm)]` (not `llvm_asm`)

Comments

@Amanieu
Copy link
Member

Amanieu commented Jan 16, 2021

Defining labels in inline assembly is broken because inlining, function cloning and other compiler optimizations may result in an asm! block being emitted more than once in the final binary. This can cause duplicate symbol errors or even compiler crashes (#74262). For this reason it is recommended to only use GAS local labels when using asm! (#76704).

To improve user experience when using asm!, rustc should automatically rewrite label names into a unique name. For example:

asm!(
    "popcnt {popcnt}, {v}",
    "loop:",
    "blsi rax, {v}",
    "jz exit",
    "xor {v}, rax",
    "tzcnt rax, rax",
    "stosb",
    "jmp loop",
    "exit:",
    v = inout(reg) value => _,
    popcnt = out(reg) popcnt,
    out("rax") _, // scratch
    inout("rdi") bits.as_mut_ptr() => _,
);

rustc should search for lines beginning with <ident>: to find label names and then replace all instances of <ident> in the assembly code with .Lasm_label_<ident><unique id>. The .L instructs the assembler that the label is local to the current compilation unit and should never be exported, even as a debug symbol. The unique ID can be generated by LLVM using ${:uid} and is guaranteed to be unique within the current compilation unit.

@Amanieu Amanieu added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-inline-assembly Area: Inline assembly (`asm!(…)`) F-asm `#![feature(asm)]` (not `llvm_asm`) labels Jan 16, 2021
@nagisa
Copy link
Member

nagisa commented Jan 17, 2021

Any reason to bother with .L and ${:uid} instead of just replacing them with numbered labels?

@Amanieu
Copy link
Member Author

Amanieu commented Jan 17, 2021

Internally the assembler will turn numbered labels into .L local symbols. Doing it directly makes the implementation slightly simpler since we don't need to track whether a label is before or after a particular instruction.

@asquared31415
Copy link
Contributor

@rustbot claim

@Amanieu
Copy link
Member Author

Amanieu commented Jan 28, 2021

@asquared31415 Thanks for taking up this issue!

I would suggest doing the rewriting in the backend in compiler/rustc_codegen_llvm/src/asm.rs.

@asquared31415
Copy link
Contributor

I already have an implementation working in the simple cases, and was about to submit a pr when I realized it is going to be a lot more work to fix the edge cases and that I should claim these issues so someone else doesn't try to work on them too

@joshtriplett
Copy link
Member

Reposting a comment that I left on #81570:

Forcing all labels within an asm block to be local would certainly improve usability for the common case (so that people don't have to use local labels).

However, it is legal in clang and gcc to define a label in an asm block and make that label a global symbol. And I've seen that used, on occasion, in real projects, including the Linux kernel. (For instance, suppose you want a global symbol pointing to a location that can be overwritten as a "probe point".) Rewriting would cause such code to silently "work" while making a global symbol with the wrong name, rather than catching the issue.

I agree that such code is not especially safe, if you're not certain that the asm code will appear in the binary exactly once. (In C that's a little easier to guarantee.) It's safer in global_asm, and I do think that we should support it there eventually.

In the meantime, I personally would prefer to make a best-effort attempt to catch the use of non-local labels in asm! and error on them, rather than attempting to rewrite them (which requires more careful lexing of inline assembly). Alternatively, if the thought is that we might allow global labels in the future, perhaps we could attempt to lex a little more of the inline assembly, catch attempts to use .globl or .global, and emit an error when applying those directives to a label.

I would prefer to just reject non-local labels.

@asquared31415
Copy link
Contributor

That approach would work, and perhaps be better for allowing future choices. We can always remove an error and make something work later, but if this causes weird behavior, it's a little harder to revert, especially if people rely on it. As discussion in #81570 progressed I have become more concerned about enforcing things that are wrong, or preventing the use of things that should work, so I support making them a compiler error for now and having a proper discussion about what asm! should do in regards to labels.

Perhaps it would be a good idea to catch .global along with the syntax flags as in #79869 and also look for non-local labels and suggest a fix if possible for the labels too. I had already been glancing at #79869 and would be willing to try making it error instead. I would need a little direction here, as I am not certain where the error should be detected and how to emit compiler errors.

I think that if it is decided that we should error on global labels rather than rewrite, #81570 should be closed no matter the decision on the detection of .global, as that pr's goal was to remove the errors and crashes, whereas changing to erroring on labels would only remove a crash and change the error.

@Amanieu
Copy link
Member Author

Amanieu commented Feb 3, 2021

There is a precedent for label rewriting: Clang does this for labels in MS-style inline asm (source), which is what inspired me to suggest this.

Clang is a bit smarter than us since it uses LLVM MC to actually parse the asm as it is rewriting, but it also does a lot more than us (e.g. inferring operand & constraints). Also this LLVM support only exists for x86, so it wouldn't be suitable for us.

I still think that automatic label rewriting is the best thing we can do at the moment. We should certainly lint against the use of .global/.globl in the same way as #79869. This will help with people accidentally expecting exported label names to work.

@comex
Copy link
Contributor

comex commented Feb 3, 2021

Sure, but MS-style inline asm is overall quite a bit more magic than GNU-style inline asm.

GNU-style inline asm has always been passed straight through to the assembler, except for making substitutions where substitutions are requested. In the case of GCC, or Clang with -fno-integrated-as, this is done without even validating it in any way: you can take a C file containing asm("do a barrel roll");, pass it to gcc -S, and get an "assembly" file with do a barrel roll pasted in the middle of a function. This is a desirable property, since it allows inline asm to use instructions which are known to an external assembler but not the compiler.

While rewriting labels doesn't directly affect that use case, it still violates the principle of transparency. Users expect assembly inside asm! to have the same syntax as standalone assembly, not "the same syntax except for this one magic feature". And this will become more confusing if global_asm! is added.

Also, if naked functions are stabilized, defining a global symbol within asm! might be a reasonable thing to do in those (though I think some guarantees about compilation model would need to be worked out before it could really have well-defined semantics).

There's also the "cowboy code" concern mentioned by @joshtriplett. He said that ensuring asm code would appear only once is "a little easier to guarantee", but I don't think that's quite right. In both Rust and GNU C, it is impossible to guarantee inline assembly will be instantiated exactly once (not counting global inline assembly and perhaps naked functions), because the compiler is always allowed to duplicate code within a function. But Linux does it anyway, because that's their attitude toward a lot of compiler things.

People might attempt the same in Rust. Personally, I'd prefer if rustc didn't go out of its way to break it; after all, if the compiler does decide to duplicate the asm, the result will be an assembler error, not UB. I suspect others might strongly prefer if rustc did go out of its way to break it. Regardless, it's a problem if it seems to work but does something unexpected due to the rewriting.

Finally, this kind of semi-blind scanning of assembly code is impossible to make robust. For example, it's perfectly legal for a label to have the same name as an instruction (jmp: jmp jmp).

@asquared31415
Copy link
Contributor

I agree mostly with the assessment that the compiler shouldn't explicitly forbid things that may be legal. Especially with such a raw construct like inline assembly, outright forbidding thing is probably bad unless it's actually an issue. I definitely think that the compiler should emit a warning for labels that could be bad, and require the user to explicitly disable the warning if that's really what they want. This would still allow the compiler to have issues, but I think that warning the user is desirable.

I think that in the case of #79869 and .global, the compiler should error because those things are expressed using a different syntax.

@Amanieu
Copy link
Member Author

Amanieu commented Feb 4, 2021

Wild idea that I just thought of, what if we added extra syntax to asm! for labels:

asm!(foo: "jmp {foo}");

This avoids the need to parse the assembly code since labels are defined outside the asm string and references to labels use the placeholder syntax. We could then have a lint which recommends people to use this syntax if we find labels inside the assembly code.

@asquared31415
Copy link
Contributor

That solution then prevents the use of multiline strings that could have multiple labels. I do kinda like it, but it also seems bad for the reasons mentioned by comex regarding transparency. Inline assembly is not a rust construct, and I don't know how much we should extract from the raw inline assembly.

@bstrie
Copy link
Contributor

bstrie commented Apr 7, 2021

after all, if the compiler does decide to duplicate the asm, the result will be an assembler error, not UB.

If an instance of this can't result in anything other than a compilation error then I think it's reasonable to merely document this and leave it as-is. Or is there some worse failure case that I'm not seeing?

I'm not necessarily opposed to a warning here, but I would say that warnings must be actionable somehow. The user isn't capable of controlling whether or not the block gets duplicated, right? Therefore a warning here would really just mean "either silence this warning or don't use loop labels at all". And parsing/validating the assembly strings is something that the RFC seems to go to great lengths to avoid, which would make this a bit of an odd case.

@Amanieu
Copy link
Member Author

Amanieu commented Apr 7, 2021

LLVM has been know to crash in some cases: #74262.

The tricky part is that you only get a compilation error if the function containing the asm! gets inlined into another function multiple times. So your code could appear to work fine until the optimizer eventually makes a different choice and then your code breaks.

The reason I would like to have this issue resolved before stabilization is that the choice we make here will have a significant effect on the way people write inline asm.

At the moment I'm still tending towards the solution that I came up with in #81088 (comment):

asm!(foo: "jmp {foo}");

Labels in inline assembly are fundamentally different from labels in normal assembly because labels are local to the current asm block rather than global.

@bstrie
Copy link
Contributor

bstrie commented Apr 7, 2021

asm!(foo: "jmp {foo}");

Would this be paired with completely banning ordinary labels? After all, if you're going to parse the strings just to warn the user that they shouldn't be using labels, then perhaps an outright ban is better.

@Amanieu
Copy link
Member Author

Amanieu commented Apr 7, 2021

Quick recap of 3 different options:

Option 1: do nothing

The only correct way to write inline assembly is to use GAS-style numbered local labels.

asm!("0: jmp 0b");
  • This can cause issues on x86 with intel syntax: LLVM will sometimes parse 0b and 1b as binary literals.
  • Perhaps we can attempt to scan the assembly code to look for things resembling labels and lint against them, but make no attempt at automatic rewriting.
  • Assembly code is generally less readable due to poor label names.

Option 2: automatic rewriting of labels

Named labels can be used inside inline assembly and get automatically rewritten to a unique name by the compiler.

asm!("foo: jmp foo");
  • This makes assembly work the way most people would expect it to.
  • This is a big change from the rest of asm! which specifically avoids having to do any parsing of the assembly code itself. There is the possibility of making mistakes since we only do token-level replacement (e.g. label with the same name as an instruction).
  • This behavior may surprise users who (incorrectly) expect a label defined in asm! to be globally visible.

Option 3: special syntax for labels

We introduce special syntax to asm! to represent labels.

asm!(foo: "jmp {foo}");
  • This uses the same type of placeholders as named register arguments, so there is no need to parse the asm code.
  • This makes it clear that asm! labels behave differently from normal labels.
  • We can still lint against named labels in the asm code like in option 1 and recommend using the new syntax instead.

However there are several downsides:

  • Previously you could re-use snippets of assembly code by using a macro_rules! that contained a string literal. This would now require special syntax since assembly code is no longer just a string literal. See this Zulip thread for a possible solution using a new asm_snippet! builtin macro.
  • For consistency, we should also support this in global_asm!. However global_asm! is allowed to define global labels (since it can't get inlined anywhere). It may be confusing to have two different label styles in global_asm!.

@asquared31415
Copy link
Contributor

I would much rather be able to use more descriptive labels in inline asm, however, I don't think there's a way to do this without a lot of parsing of inline assembly, which I think should be avoided, and even if that were done, I don't think that it can be done perfectly.

With this in mind, I think I am in favor of doing nothing, specifying in documentation that local labels are required and doing no parsing/rewriting. Declare use of other labels to be undefined behavior, because it depends on the exact way the compiler decides to emit things, and ideally get the crash on windows with -g fixed in LLVM (though IDK how feasible that is).

@comex
Copy link
Contributor

comex commented Apr 9, 2021

For Option 3, why not do it like this?

asm!("{foo}: jmp {foo}");

Then you still have a single string literal.

@Amanieu
Copy link
Member Author

Amanieu commented Apr 10, 2021

Keep in mind that {ident} is also used for named operands which get substituted with register names. I don't like the idea of simply treating unknown identifiers as a label since that prevents us from emitting an error for typos.

@joshtriplett
Copy link
Member

FWIW, I think it's worth considering a hybrid approach: we could go with approach 1 for now, stabilize inline assembly with that approach, and consider whether we also want to support approach 3 in the future along with things like asm goto.

@asquared31415
Copy link
Contributor

Doing nothing is the option that breaks the least, and doesn't lock into any irreversible design decisions. Could possibly introduce a Rust lint or warning rather than an error for labels that can currently cause label duplications.

@comex
Copy link
Contributor

comex commented Apr 15, 2021

Doing nothing is the option that breaks the least, and doesn't lock into any irreversible design decisions.

I'm not sure that's true. People will try to use asm! to define global symbols, and it will work in practice under common circumstances (where the code containing the asm! is referenced and not duplicated). Then in the future, either making it an error, or automatically rewriting the symbols, would break that code.

@amluto
Copy link

amluto commented Apr 18, 2021

As a possible solution that might not involve parsing asm code, binutils' gas does sort of have a solution:

.macro some_macro
        jmp .Lmy_label_\@
.Lmy_label_\@:
        nop
.endm

This isn't directly applicable to inline asm, but inline asm could plausibly be instantiated as a macro and an invocation of the macro. A macro argument or some other such trick could be used to pass the name of the macro or a per-macro unique id in.

Option 3 could also IMO be improved a bit to be less magic: instantiate {foo} as a unique label if and only iff foo is actually declared as a label by some syntax outside of the asm string. Or make {label:foo} turn into a unique label.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Jul 6, 2021

Keep in mind that {ident} is also used for named operands which get substituted with register names. I don't like the idea of simply treating unknown identifiers as a label since that prevents us from emitting an error for typos.

How about

asm!("{foo}: jmp {foo}", foo = label);

label just generates a unique label and is substituted similar to an allocated register operand.

@Amanieu
Copy link
Member Author

Amanieu commented Jul 7, 2021

This is how I planned for asm!(foo: "jmp {foo}") to be represented internally in the compiler, but I don't like the fact that you have to declare the label name twice.

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 14, 2021
…nieu

Lint against named asm labels

This adds a deny-by-default lint to prevent the use of named labels in inline `asm!`.  Without a solution to rust-lang#81088 about whether the compiler should rewrite named labels or a special syntax for labels, a lint against them should prevent users from writing assembly that could break for internal compiler reasons, such as inlining or anything else that could change the number of actual inline assembly blocks emitted.

This does **not** resolve the issue with rewriting labels, that still needs a decision if the compiler should do any more work to try to make them work.
@ghost
Copy link

ghost commented Aug 22, 2021

How about asm!("call {fn foo}") for functions and asm!("mov eax, {static BAR}"} for statics?

@Amanieu
Copy link
Member Author

Amanieu commented Aug 22, 2021

We already have this with the sym operand type. This issue is about labels defined in an asm code snippet.

@bradjc
Copy link
Contributor

bradjc commented Sep 10, 2021

I don't know if there is interest in community votes, but I would like to see any option other than Option 1 (use numeric labels). Assembly code is hard enough to read without being able to use descriptive labels.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-inline-assembly Area: Inline assembly (`asm!(…)`) C-enhancement Category: An issue proposing an enhancement or a PR with one. F-asm `#![feature(asm)]` (not `llvm_asm`)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants