Formatting guidelines for inline assembly #152

joshtriplett · 2020-06-10T04:11:21Z

An issue I've observed in several contexts discussing the new asm! syntax for inline assembly: everyone formats asm! statements differently, and we should 1) come up with guidance on how to do so, and 2) implement that guidance in rustfmt.

Notably, this includes how to format both single-line and multi-line assembly statements.

EDIT: I've updated these guidelines to use the new support for multiple template string arguments, implemented in rust-lang/rust#73364.

With my style team hat on, I would propose the following guidelines:

Single-line assembly code should be formatted as a single string argument and treated as any other argument to a format macro.
Multi-line assembly code should be formatted with one string argument per line of assembly, indented as separate arguments:
```
asm!(
    "instruction 1",
    "instruction 2",
    ...,
);
```
Common assembly formatting such as \n\t (often seen in inline assembly for other compilers that directly copy the provided string into their assembly output) is not necessary with Rust inline assembly. Focus on keeping the assembly readable. Note that Rust is not prescriptive about the formatting of your assembly, so if you wish to put multiple instructions or directives or labels on one line (and thus within one assembly string), you can do so, and rustfmt will treat each assembly string as a line.
Use the opening " of each string as the base for any indentation within the assembly code; for instance, if you want to indent instructions four spaces past labels, include the indentation inside the string before the instructions. That way, Rust formatting can keep the strings aligned and your assembly code will remain aligned within them:
```
asm!(
    "1:",
    "    instruction 1",
    "    instruction 2",
    "2:",
    "    instruction 3",
);
```
Simple asm! with only one assembly string can have the entire asm! including all its arguments on the same line, if the whole thing from asm!( to ); (plus indentation) fits in the line width.
Any asm! block that needs breaking across multiple lines, or any asm! block that has multiple assembly strings no matter how short, should always put each argument on its own line, aligned one indentation level past the asm!, with a trailing comma on each, just like a function.
options(...) goes on one line if it fits; otherwise, format it as though it were a nested function call to a function named options.
Never place any space or line breaks inside of in(reg) or out(reg) or inout(reg) or in("regname") or out("regname") or similar; always treat them as a single atomic unit.
If an inout or inlateout pair of expressions are too long to fit on one line, break before (never after) the =>, and indent the => one level further:
```
asm!(
    "instruction {}",
    inout(reg) very_long_expression
        => very_long_out_expression,
);
```
If an in(reg) or out(reg) or lateout(reg) expression is too long, break between the ) and the expression and indent one level, then format from there; however, if the expression can be broken internally, follow the same rules for when to leave the head of the expression on the same line as the in(reg) or similar, as if in(reg) were the opener of a function:
```
asm!(
    "instruction {}",
    in(reg)
        extremely_long_unbreakable_expression,
    in(reg) function_name()
        .method_name(method_arg)
        .further_chained_method(),
    out(reg) long_function_name(
        long_function_argument_expression,
    ),
);
```
For named arguments like name = in(reg) expression, line-break it as you would an assignment statement with the same indentation, treating the in(reg) expression as the right-hand side of the assignment, and following all the same rules as above.
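As an illustration of the guidelines above applied to real instructions, here is a minimal sketch. It assumes a toolchain where `asm!` is available as `std::arch::asm` and an x86_64 target (with a plain-Rust fallback elsewhere); the function name `times_six` is hypothetical and not part of the proposal.

```rust
/// Multiplies `x` by 6 using shifts and an add (x86_64 only).
#[cfg(target_arch = "x86_64")]
pub fn times_six(x: u64) -> u64 {
    use std::arch::asm;

    let mut v = x;
    // SAFETY: the block only touches the two registers it declares.
    unsafe {
        // One template string per instruction, each argument on its own
        // line with a trailing comma, and `options(...)` on a single line.
        asm!(
            "mov {tmp}, {v}",
            "shl {tmp}, 1",
            "shl {v}, 2",
            "add {v}, {tmp}",
            v = inout(reg) v,
            tmp = out(reg) _,
            options(nomem, nostack),
        );
    }
    v
}

/// Fallback for other targets so the example still compiles.
#[cfg(not(target_arch = "x86_64"))]
pub fn times_six(x: u64) -> u64 {
    x.wrapping_mul(6)
}
```

Calling `times_six(4)` yields 24 on either path; the point of the sketch is the layout of the `asm!` invocation, not the arithmetic.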

Amanieu · 2020-06-10T04:29:46Z

Formatting the string literal is actually very tricky since we technically need to preserve whitespace and newlines when formatting. This will limit how much we can format the template string, in particular regarding indentation.

We could make an exception here since whitespace isn't significant for inline assembly.

joshtriplett · 2020-06-10T04:48:55Z

@Amanieu That's not entirely true, if you do something especially unusual in your inline assembly. Assembly supports multi-line .string directives, among likely other things:

#![feature(asm)]

fn main() {
    let mut i: u32 = 2;
    unsafe {
        asm!(
            r#"
            add {:e}, 3
            jmp 1f
            .string "hello
            world"
            1:
            "#,
            inout(reg) i,
        );
    }
    dbg!(i);
}

(Assume for the sake of argument that I used that string for something.)

If you look at the disassembly of this, the whitespace is significant. Changing the amount of whitespace at the start of lines would change the semantic meaning of the program.

One way around this would be if we recommended using concat! for multi-line assembly. I don't know that that's worth the tradeoff.

Initially I would propose having formatting guidelines, and then doing as much as we can in rustfmt without changing the string literal.

I can absolutely think of real-world reasons someone might write something like this; for instance, embedding some long buffer. I can also think of better ways to do that, but nonetheless, formatting must not break user code. So I don't think we can consider any formatting that changes even the amount of space inside the string. In theory, we might be able to get away with changing the amount of whitespace at the start and end of the string, maybe, but I'm not sure that's worth it, and we'd still have to think very hard to see if anyone could write asm code that that would break.
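To make the concern concrete, the following sketch models a naive formatter that re-indents every line of a multi-line template. The `reindent` helper and the `TEMPLATE` constant are hypothetical and are not rustfmt code; they only show that re-indenting changes the bytes inside a multi-line `.string` directive.

```rust
/// A template whose `.string` directive spans two lines; the eight
/// spaces before `world` are part of the embedded string.
pub const TEMPLATE: &str =
    "    add eax, 3\n    .string \"hello\n        world\"\n    1:";

/// Re-indents every line to `indent` spaces, as a naive formatter might.
pub fn reindent(src: &str, indent: usize) -> String {
    src.lines()
        .map(|line| format!("{}{}", " ".repeat(indent), line.trim_start()))
        .collect::<Vec<_>>()
        .join("\n")
}
```

After `reindent(TEMPLATE, 0)`, the directive contains `hello\nworld` instead of `hello\n        world`, so the assembled data differs even though no instruction changed.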

Amanieu · 2020-06-10T05:02:08Z

I think that the case for formatting multi-line strings is more general and not restricted to inline asm: rust-lang/rustfmt#2876

BartMassey · 2020-06-10T07:19:27Z

I think I would like to see something along the lines of the weird C "adjacent string constants" thing for this case. Then I could put one string per line, which seems to me more convenient. Alternatively, we could just make the macro take an arbitrary number of strings, but I fear much confusion at the end there.

Maybe go nuts and have an implicit newline when the last character is not whitespace and an implicit tab when the first character is not whitespace? The whole

"
xor %rax, %rax\n
\tmov %rax, %rcx\n
.l0:\n
\tinc %rcx\n
"

nonsense I tend to go through with GCC/Clang is pretty annoying.

Speaking of which, what is the assumed whitespace state just before and just after a new asm!() statement?

joshtriplett · 2020-06-10T07:56:38Z

If you want separate strings for each line, you can do that with concat!. That would certainly make formatting easier, at a cost of verbosity and extra indentation.

The assumed whitespace state should indeed be documented. There will be a newline before and after the assembly.

Amanieu · 2020-06-10T08:01:53Z

I don't think that whitespace state before/after the asm is significant since whitespace (except for newlines) is ignored by all assembly languages.

The use of "\t" in C is just because GCC pastes inline assembly directly into the output .S file and some developers want the -S output of the compiler to be nicely aligned when reading the resulting assembly. This does not apply to LLVM since it parses the inline asm and then re-prints the instructions from scratch when generating assembly output.

joshtriplett · 2020-06-10T14:20:17Z

@Amanieu

> The use of "\t" in C is just because GCC pastes inline assembly directly into the output .S file and some developers want the -S output of the compiler to be nicely aligned when reading the resulting assembly. This does not apply to LLVM since it parses the inline asm and then re-prints the instructions from scratch when generating assembly output.

Guidance on this point seems worth capturing within the formatting guidelines. I've added a new bullet point to the formatting above:

> In assembly strings, focus on the readability and proper indentation of the assembly with respect to the surrounding Rust code. Common assembly formatting such as \n\t (often seen in inline assembly for other compilers that directly copy the provided string into their assembly output) is not necessary with Rust inline assembly.

BartMassey · 2020-06-11T01:43:56Z

#![feature(asm)]

macro_rules! asm_block {
    ({$($line:literal),*$(,)?}$($stuff:tt)*) => {
        asm!(concat!($(concat!($line,"\n")),*)$($stuff)*)
    };
}

fn main() {
    let x: u64;
    unsafe { asm_block!(
        {
            "nop",
            "xor {0},{0}",
        },
        out(reg) x,
    )};
    println!("{}", x);
}

I will admit to being one of those people who would like to see lines tabbed by default in case I want to read the assembly. The macro could be fancied to do this either by providing an explicit no-tab or by inferring: e.g. lines that begin with "." or end with ":" don't get tabbed. The commas in the code block could also get left out, which might or might not be an improvement.
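The inference rule described here (tab every line unless it begins with "." or ends with ":") can be sketched as a plain function. This is only a model of the rule: `asm!` needs string literals at compile time, so a runtime helper like the hypothetical `tab_lines` below could not feed it directly.

```rust
/// Joins assembly lines with newlines, prefixing a tab to instructions
/// but not to directives (leading '.') or labels (trailing ':').
pub fn tab_lines(lines: &[&str]) -> String {
    let mut out = String::new();
    for line in lines {
        let is_directive = line.starts_with('.');
        let is_label = line.ends_with(':');
        if !(is_directive || is_label) {
            out.push('\t');
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}
```

For example, `tab_lines(&["xor {0}, {0}", ".l0:", "inc {0}"])` tabs the two instructions and leaves the `.l0:` label at column zero.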

Amanieu · 2020-06-11T05:57:32Z

I really like this syntax and I'm considering supporting it directly in asm!. I would just make a small modification: adjacent literals concatenate directly and ; concatenates with a newline:

fn main() {
    let x: u64;
    unsafe { 
        asm!(
            {
                "nop";
                "xor" " {0}," " {0}";
            },
            out(reg) x,
        );
    }
    println!("{}", x);
}
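The concatenation rule proposed here (adjacent literals join directly, `;` ends a line) can be approximated with an ordinary declarative macro that only builds the template string. The `template!` macro and `demo_template` function below are hypothetical illustrations; this is not syntax that `asm!` itself accepts.

```rust
/// Builds a template string: literals within a line are concatenated
/// as-is, and each `;` terminates the line with a newline.
macro_rules! template {
    ($($($part:literal)+);+ $(;)?) => {
        concat!($($($part,)+ "\n",)+)
    };
}

/// Expands the example from the comment above into a single string.
pub fn demo_template() -> &'static str {
    let t: &'static str = template! {
        "nop";
        "xor" " {0}," " {0}";
    };
    t
}
```

Here `demo_template()` produces `"nop\nxor {0}, {0}\n"`. Note that the spaces have to be written inside the literals, which is the error-prone aspect raised later in the thread.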

I've used concat! a lot in the past with llvm_asm! but in my experience this has been rather troublesome:

macro_rules! asm_read {
    ($instr:expr, $width:expr, $trapped:expr, $type:ty, $ptr:expr) => {{
        let tmp: $type;
        llvm_asm! {
            concat!(
                "0: ", $instr, " ${1:", $width, "}, [$2];",
                asm_trap_list!()
            )
            : "+{x16}" ($trapped), "=r" (tmp)
            : "r" ($ptr as u64)
            :
            : "volatile"
        };
        mem::transmute_copy(&tmp)
    }};
}

With the new style it would look like this:

macro_rules! asm_read {
    ($instr:expr, $width:expr, $trapped:expr, $type:ty, $ptr:expr) => {{
        let tmp: $type;
        asm!(
            {
                "0: " $instr " {out:" $width "}, [{ptr}]";
                asm_trap_list!();
            },
            out = lateout(reg) tmp,
            ptr = in(reg) $ptr,
            inout("x16") $trapped,
            options(nostack, preserves_flags),
        );
        mem::transmute_copy(&tmp)
    }};
}

BartMassey · 2020-06-11T06:46:09Z

If you want to allow multiple literals per line, I'd suggest adding a space between each of them internally — I think it's going to be less error-prone.

joshtriplett · 2020-06-12T20:17:23Z

@BartMassey

> I will admit to being one of those people who would like to see lines tabbed by default in case I want to read the assembly.

If you mean the assembly output from the compiler, it sounds like it will get properly formatted (by LLVM) even if the input is not.

@Amanieu I really like the idea of a syntax like this as well. I would suggest a slight tweak, though: rather than putting a braced block as the first argument, could we just turn the whole asm! into a braced block rather than a parenthesized one?

asm!{
    "instruction 1",
    "instruction 2",
    "instruction 3",
    out = lateout(reg) var,
    options(nostack),
}

(I don't want to bikeshed the separators between assembly strings, though I will admit that I find a mix of semicolons and commas not ideal. I primarily want to advocate for using a single braced block, because we can easily tell the difference between string literals, options, and inputs/outputs.)

BartMassey · 2020-06-13T04:06:35Z

I guess I could have written the macro to avoid the braced block; I was just being lazy when I put it in, but since those strings are the only arguments of token type literal I guess I didn't need the braces. Ah well.

Amanieu · 2020-06-13T13:37:23Z

Note that macros don't care about the type of delimiter used: [], () and {} can all be used (this is how vec![] works).

I like the idea about allowing multiple string literals in asm! which would be treated as separate lines. I feel that this would make multi-line asm much more readable than multi-line strings.
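Both points can be sketched with a small macro that treats each string literal as one line of assembly and is invoked with all three delimiter styles. The `asm_template!` macro and the wrapper functions are hypothetical names used only for illustration; the macro builds a string and does not emit any assembly.

```rust
/// Joins its string-literal arguments with newlines, one line per literal.
macro_rules! asm_template {
    ($($line:literal),+ $(,)?) => {
        [$($line),+].join("\n")
    };
}

/// Invoked with parentheses.
pub fn with_parens() -> String {
    asm_template!("nop", "ret")
}

/// Invoked with square brackets, as `vec![]` usually is.
pub fn with_brackets() -> String {
    asm_template!["nop", "ret"]
}

/// Invoked with braces; the macro definition is unchanged.
pub fn with_braces() -> String {
    let joined: String = asm_template! {
        "nop",
        "ret",
    };
    joined
}
```

All three functions return the same string, `"nop\nret"`, since the delimiter is not visible to the macro's matcher.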

joshtriplett · 2020-06-13T21:09:31Z

@Amanieu It'll also mean that rustfmt will be able to reliably format inline asm blocks. I'd love to see this.

joshtriplett · 2020-06-15T11:12:06Z

Implemented in rust-lang/rust#73364; RFC updated via Amanieu/rfcs#1.

joshtriplett · 2020-06-15T20:57:10Z

I've updated the formatting guidelines at the top of this issue to use the new support for multiple assembly string arguments.

calebcartwright · 2021-12-08T00:21:46Z

@joshtriplett - is my understanding correct that you all have settled on target formatting as summarized in the updated description, and all that's pending is for someone to codify those rules within the guide? (wondering if we need a new page for special case macros that have their own spec 🤔)

joshtriplett · 2021-12-08T08:03:39Z

I believe we've settled on exactly what's in the description of this issue, yes.

calebcartwright · 2021-12-15T17:24:18Z

cc @ytmimi - know we've got a few things already in flight, but think this would be a good item for you to look at next if you're interested. we can chat more about it offline

To more easily allow rustfmt to format the asm! macro as specified in rust-lang/style-team#152 certain fields are made public.

ytmimi · 2022-01-24T15:20:30Z

Hey @joshtriplett, I've spent some time working on getting this implemented in rustfmt (rust-lang/rustfmt#5191). The guidelines you wrote up were a great help. I also found the asm section of the Unstable Book useful!

Templates and options were well specified, but I have a few clarifying questions around the operands and clobber_abis that I hope you can help me with.

Operands

How should we handle const and sym operands?

Is it better to not break between the keyword (const / sym) and the expression for long, unbreakable expressions?

asm!(
    "instruction {}",
    const extremely_long_unbreakable_expression,
    sym extremely_long_unbreakable_symbol_name,
)

In the case that we're dealing with a named argument should we treat const and sym operands as follows?

asm!(
    "instruction {}",
    long_named_argument =
        const extremely_long_unbreakable_expression,
    long_named_argument =
        sym extremely_long_unbreakable_symbol_name,
);

For const specifically, is this the desired formatting for long expressions that can be broken? Given that sym is meant to refer to a function name, we don't need to worry about these cases, right?

asm!(
    "instruction {}",
    const function_name()
        .method_name(method_arg)
        .further_chained_method(),
);

asm!(
    "instruction {}",
    const long_function_name(
        long_function_argument_expression
    ),
);

Here are some other test cases that I wanted to get your opinion on:

asm!(
    "instruction {}",

    // case 1 - in chain out expr
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => very_long_out_expression,

    // case 2 - in function out expr
    inout(reg) long_function_name(
        long_function_argument_expression
    )
        => very_long_out_expression,

    // case 3 - in expr out chain
    inout(reg) very_long_expression
        => function_name()
            .method_name(method_arg)
            .further_chained_method(),

    // case 4 - in chain out chain
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => function_name()
            .method_name(method_arg)
            .further_chained_method(),

    // case 5 - in function out chain
    inout(reg) long_function_name(
        long_function_argument_expression
    )
        => function_name()
            .method_name(method_arg)
            .further_chained_method(),

    // case 6 - in expr out function
    inout(reg) very_long_expression
        => long_function_name(
            long_function_argument_expression
        ),

    // case 7 - in chain out function
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => long_function_name(
            long_function_argument_expression
        ),

    // case 8 - in function out function
    inout(reg) long_function_name(
        long_function_argument_expression
    )
        => long_function_name(
            long_function_argument_expression
        ),
)

For completeness I added test cases for inout and inlateout that also include named arguments. Just want to double check that something like this is okay.

asm!(
    aaaaaaaaaaaaaaaaaaaa =
        inout(reg) very_long_expression
            => very_long_out_expression,
);

Clobber ABIs

The Reference-level explanation in the Unstable book gives the following ABNF for clobber_abis and options:

clobber_abi := "clobber_abi(" <abi> *["," <abi>] [","] ")"
options := "options(" option *["," option] [","] ")"

Given that the ABNF is so similar, should we also treat clobber_abis as we do options, that is, format them like nested function calls if we need to break them?

Trailing Commas

Lastly, how should we handle trailing commas? I know that for most macros we don't want to add or remove tokens, but in this case trailing commas are optional. Is there a recommendation for how we should handle trailing commas?

Thanks in advance for your help on this!

ytmimi · 2022-01-29T21:11:47Z

A few more operand related test cases that I wanted to get your opinion on:

In this case we have an inout that doesn't fit on the same line, but the out_expr is _. Should we still force an indented newline after the in expression?

asm!(
    "instruction {}",
    inout(reg) long_function_name(
        long_function_argument_expression
    )
        => _,
);

Or does this look better:

asm!(
    "instruction {}",
    inout(reg) long_function_name(
        long_function_argument_expression
    ) => _,
);

Same question as above, but with chains:

asm!(
    "instruction {}",
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method()
        => _,
);

asm!(
    "instruction {}",
    inout(reg) function_name()
        .method_name(method_arg)
        .further_chained_method() => _,
);

Do we always want to break inout's if the in_expr and out_expr don't fit on the same line? Here's an example where the two technically don't fit but I'm not sure if it's better to force the break or not:

asm!(
    "instruction {}",
    inout(reg) very_long_expression => function_name(
        long_function_argument_expression
    ),
);

Or do we always force the break?

asm!(
    "instruction {}",
    inout(reg) very_long_expression
        => function_name(
            long_function_argument_expression
        ),
);

Again, thanks for helping to clarify this for me.

calebcartwright · 2022-02-02T01:33:17Z

@ytmimi - I'd suggest we move forward with a PR in this repo that adds text to the guide which covers the above. For the outstanding cases feel free to make your suggested guidance in the respective wording.

Folks can weigh in on that PR, and we can drop notes in a few different channels for broader awareness to allow others to weigh in. We'll also have a little flexibility for a while to adjust things, especially given that we have the benefit in this particular instance of being able to progressively roll out this behavior in rustfmt via a config option.

I think the existing guide text and structure should provide a good reference for framing the resultant rules identified on this thread, but feel free to ping me offline to chat about any specifics

joshtriplett mentioned this issue Jun 10, 2020

Tracking Issue for inline assembly (asm!) rust-lang/rust#72016

Closed

4 tasks

joshtriplett mentioned this issue Jun 15, 2020

inline-asm rust-lang/lang-team#20

Closed

calebcartwright mentioned this issue Jul 1, 2020

Stability Guarantee/Version Strategy rust-lang/rustfmt#4286

Closed

bradjc mentioned this issue Feb 5, 2021

rv32i: transition llvm_asm! to asm! tock/tock#2363

Merged

2 tasks

ytmimi added a commit to ytmimi/rust that referenced this issue Jan 11, 2022

Update AsmArgs field visibility for rustfmt

11bea26

To more easily allow rustfmt to format the asm! macro as specified in rust-lang/style-team#152 certain fields are made public.

ytmimi mentioned this issue Jan 11, 2022

Update AsmArgs field visibility for rustfmt rust-lang/rust#92754

Merged

ytmimi mentioned this issue Feb 1, 2022

Format asm! macro calls rust-lang/rustfmt#5191

Open

traviscross added this to T-style design meeting schedule Sep 28, 2023

traviscross moved this to Scheduled in T-style design meeting schedule Sep 28, 2023

traviscross removed this from T-style design meeting schedule Oct 4, 2023

joshtriplett self-assigned this Dec 16, 2024
