-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Formatting guidelines for inline assembly #152
Comments
Formatting the string literal is actually very tricking since we technically need to preserve whitespace and newlines when formatting. This will limit how much we can we format the template string, in particular regarding indentation. We could make an exception here since whitespace isn't significant for inline assembly. |
@Amanieu That's not entirely true, if you do something especially unusual in your inline assembly. Assembly supports multi-line #![feature(asm)]
fn main() {
let mut i: u32 = 2;
unsafe {
asm!(
r#"
add {:e}, 3
jmp 1f
.string "hello
world"
1:
"#,
inout(reg) i,
);
}
dbg!(i);
} (Assume for the sake of argument that I used that string for something.) If you look at the disassembly of this, the whitespace is significant. Changing the amount of whitespace at the start of lines would change the semantic meaning of the program. One way around this would be if we recommended using Initially I would propose having formatting guidelines, and then doing as much as we can in I can absolutely think of real-world reasons someone might write something like this; for instance, embedding some long buffer. I can also think of better ways to do that, but nonetheless, formatting must not break user code. So I don't think we can consider any formatting that changes even the amount of space inside the string. In theory, we might be able to get away with changing the amount of whitespace at the start and end of the string, maybe, but I'm not sure that's worth it, and we'd still have to think very hard to see if anyone could write asm code that that would break. |
I think that the case for formatting multi-line strings is more general and not restricted to inline asm: rust-lang/rustfmt#2876 |
I think I would like to see something along the lines of the weird C "adjacent string constants" thing for this case. Then I could put one string per line, which seems to me more convenient. Alternatively, we could just make the macro take an arbitrary number of strings, but I fear much confusion at the end there. Maybe go nuts and have an implicit newline when the last character is not whitespace and an implicit tab when the first character is not whitespace? The whole
nonsense I tend to go through with GCC/Clang is pretty annoying. Speaking of which, what is the assumed whitespace state just before and just after a new |
If you want separate strings for each line, you can do that with The assumed whitespace state should indeed be documented. There will be a newline before and after the assembly. |
I don't think that whitespace state before/after the asm is significant since whitespace (except for newlines) is ignored by all assembly languages. The use of "\t" in C is just because GCC pastes inline assembly directly into the output .S file and some developers want the -S output of the compiler to be nicely aligned when reading the resulting assembly. This does not apply to LLVM since it parses the inline asm and then re-prints the instructions from scratch when generating assembly output. |
Guidance on this point seems worth capturing within the formatting guidelines. I've added a new bullet point to the formatting above:
|
#![feature(asm)]
macro_rules! asm_block {
({$($line:literal),*$(,)?}$($stuff:tt)*) => {
asm!(concat!($(concat!($line,"\n")),*)$($stuff)*)
};
}
fn main() {
let x: u64;
unsafe { asm_block!(
{
"nop",
"xor {0},{0}",
},
out(reg) x,
)};
println!("{}", x);
} I will admit to being one of those people who would like to see lines tabbed by default in case I want to read the assembly. The macro could be fancied to do this either by providing an explicit no-tab or by inferring: e.g. lines that begin with "." or end with ":" don't get tabbed. The commas in the code block could also get left out, which might or might not be an improvement. |
I really like this syntax and I'm considering supporting it directly in fn main() {
let x: u64;
unsafe {
asm!(
{
"nop";
"xor" " {0}," " {0}";
},
out(reg) x,
);
}
println!("{}", x);
} I've used macro_rules! asm_read {
($instr:expr, $width:expr, $trapped:expr, $type:ty, $ptr:expr) => {{
let tmp: $type;
llvm_asm! {
concat!(
"0: ", $instr, " ${1:", $width, "}, [$2];",
asm_trap_list!()
)
: "+{x16}" ($trapped), "=r" (tmp)
: "r" ($ptr as u64)
:
: "volatile"
};
mem::transmute_copy(&tmp)
}};
} With the new style it would look like this: macro_rules! asm_read {
($instr:expr, $width:expr, $trapped:expr, $type:ty, $ptr:expr) => {{
let tmp: $type;
asm!(
{
"0: " $instr " {out:" $width "}, [{ptr}]";
asm_trap_list!();
},
out = lateout(reg) tmp,
ptr = in(reg) $ptr,
inout("x16") $trapped,
options(nostack, preserves_flags),
);
mem::transmute_copy(&tmp)
}};
} |
If you want to allow multiple literals per line, I'd suggest adding a space between each of them internally — I think it's going to be less error-prone. |
If you mean the assembly output from the compiler, it sounds like it will get properly formatted (by LLVM) even if the input is not. @Amanieu I really like the idea of a syntax like this as well. I would suggest a slight tweak, though: rather than putting a braced block as the first argument, could we just turn the whole asm!{
"instruction 1",
"instruction 2",
"instruction 3",
out = lateout(reg) var,
options(nostack),
} (I don't want to bikeshed the separators between assembly strings, though I will admit that I find a mix of semicolons and commas not ideal. I primarily want to advocate for using a single braced block, because we can easily tell the difference between string literals, options, and inputs/outputs.) |
I guess I could have written the macro to avoid the braced block; I was just being lazy when I put it in, but since those strings are the only arguments of token type |
Note that macros don't care about the type of delimiter used: I like the idea about allowing multiple string literals in |
@Amanieu It'll also mean that rustfmt will be able to reliably format inline asm blocks. I'd love to see this. |
Implemented in rust-lang/rust#73364 ; RFC updated via Amanieu/rfcs#1 . |
I've updated the formatting guidelines at the top of this issue to use the new support for multiple assembly string arguments. |
@joshtriplett - is my understanding correct that you all have settled on target formatting as summarized in the updated description, and all that's pending is for someone to codify those rules within the guide? (wondering if we need a new page for special case macros that have their own spec 🤔) |
I believe we've settled on exactly what's in the description of this issue, yes. |
cc @ytmimi - know we've got a few things already in flight, but think this would be a good item for you to look at next if you're interested. we can chat more about it offline |
To more easily allow rustfmt to format the asm! macro as specified in rust-lang/style-team#152 certain fields are made public.
…calebcartwright Update AsmArgs field visibility for rustfmt To more easily allow rustfmt to format the ``asm!`` macro as specified in rust-lang/style-team#152 certain fields are made public. r? `@calebcartwright`
…calebcartwright Update AsmArgs field visibility for rustfmt To more easily allow rustfmt to format the ``asm!`` macro as specified in rust-lang/style-team#152 certain fields are made public. r? ``@calebcartwright``
…calebcartwright Update AsmArgs field visibility for rustfmt To more easily allow rustfmt to format the ``asm!`` macro as specified in rust-lang/style-team#152 certain fields are made public. r? ```@calebcartwright```
Hey @joshtriplett, I've spent some time working on getting this implemented in rustfmt (rust-lang/rustfmt#5191). The guidelines you wrote up were a great help. I also found the asm section of the Unstable Book useful! Templates and options were well specified, but I have a few clarifying questions around the OperandsHow should we handle Is it better to not break between the keyword (const / sym) and the expression for long, unbreakable expressions? asm!(
"instruction {}",
const extremely_long_unbreakable_expression,
sym extremely_long_unbreakable_symbol_name,
) In the case that we're dealing with a named argument should we treat const and sym operands as follows? asm!(
"instruction {}",
long_named_argument =
const extremely_long_unbreakable_expression,
long_named_argument =
sym extremely_long_unbreakable_symbol_name,
); For const specifically, is this the desired formatting for long expressions that can be broken? Given that sym is meant to refer to a function name, we don't need to worry about these cases, right? asm!(
"instruction {}",
const function_name()
.method_name(method_arg)
.further_chained_method(),
);
asm!(
"instruction {}",
const long_function_name(
long_function_argument_expression
),
); Here are some other test cases that I wanted to get your opinion on: asm!(
"instruction {}",
// case 1 - in chain out expr
inout(reg) function_name()
.method_name(method_arg)
.further_chained_method()
=> very_long_out_expression,
// case 2 - in function out expr
inout(reg) long_function_name(
long_function_argument_expression
)
=> very_long_out_expression,
// case 3 - in expr out chain
inout(reg) very_long_expression
=> function_name()
.method_name(method_arg)
.further_chained_method(),
// case 4 - in chain out chain
inout(reg) function_name()
.method_name(method_arg)
.further_chained_method()
=> function_name()
.method_name(method_arg)
.further_chained_method(),
// case 5 - in function out chain
inout(reg) long_function_name(
long_function_argument_expression
)
=> function_name()
.method_name(method_arg)
.further_chained_method(),
// case 6 - in expr out function
inout(reg) very_long_expression
=> long_function_name(
long_function_argument_expression
),
// case 7 - in chain out function
inout(reg) function_name()
.method_name(method_arg)
.further_chained_method()
=> long_function_name(
long_function_argument_expression
),
// case 8 - in function out function
inout(reg) long_function_name(
long_function_argument_expression
)
=> long_function_name(
long_function_argument_expression
),
) For completeness I added test cases for asm!(
aaaaaaaaaaaaaaaaaaaa =
inout(reg) very_long_expression
=> very_long_out_expression,
); Clobber ABIsThe Reference-level explanation in the Unstable book gives the following ABNF for
Given that the ABNF is so similar should we also treat clobber_abis as we do options? So treat them like nested function calls if we need to break them. Trailing CommasLastly, how should we handle trailing commas? I know that for most macros we don't want to add or remove tokens, but in this case trailing commas are optional. Is there a recommendation for how we should handle trailing commas? Thanks in advance for your help on this! |
A few more operand related test cases that I wanted to get your opinion on: In this case we have an asm!(
"instruction {}",
inout(reg) long_function_name(
long_function_argument_expression
)
=> _,
); Or does this look better: asm!(
"instruction {}",
inout(reg) long_function_name(
long_function_argument_expression
) => _,
); Same question as above, but with chains: asm!(
"instruction {}",
in(reg) function_name()
.method_name(method_arg)
.further_chained_method()
=> _,
); asm!(
"instruction {}",
in(reg) function_name()
.method_name(method_arg)
.further_chained_method() => _,
); Do we always want to break inout's if the in_expr and out_expr don't fit on the same line? Here's an example where the two technically don't fit but I'm not sure if it's better to force the break or not: asm!(
"instruction {}",
inout(reg) very_long_expression => function_name(
long_function_argument_expression
),
); Or do we always force the break? asm!(
"instruction {}",
inout(reg) very_long_expression
=> function_name(
long_function_argument_expression
),
); Again, thanks for helping to clarify this for me. |
@ytmimi - I'd suggest we move forward with a PR in this repo that adds text to the guide which covers the above. For the outstanding cases feel free to make your suggested guidance in the respective wording. Folks can weigh in on that PR, and we can drop notes in a few different channels for broader awareness to allow others to weigh in. We'll also have a little flexibility for a while to adjust things, especially given that we have the benefit in this particular instance of being able to progressively roll out this behavior in rustfmt via a config option. I think the existing guide text and structure should provide a good reference for framing the resultant rules identified on this thread, but feel free to ping me offline to chat about any specifics |
An issue I've observed in several contexts discussing the new
asm!
syntax for inline assembly: everyone formatsasm!
statements differently, and we should 1) come up with guidance on how to do so, and 2) implement that guidance inrustfmt
.Notably, this includes how to format both single-line and multi-line assembly statements.
EDIT: I've updated these guidelines to use the new support for multiple template string arguments, implemented in rust-lang/rust#73364 .
With my style team hat on, I would propose the following guidelines:
\n\t
(often seen in inline assembly for other compilers that directly copy the provided string into their assembly output) is not necessary with Rust inline assembly. Focus on keeping the assembly readable. Note that Rust is not prescriptive about the formatting of your assembly, so if you wish to put multiple instructions or directives or labels on one line (and thus within one assembly string), you can do so, and rustfmt will treat each assembly string as a line."
of each string as the base for any indentation within the assembly code; for instance, if you want to indent instructions four spaces past labels, include the indentation inside the string before the instructions. That way, Rust formatting can keep the strings aligned and your assembly code will remain aligned within them:asm!
with only one assembly string can have the entireasm!
including all its arguments on the same line, if the whole thing fromasm!(
to);
(plus indentation) fits in the line width.asm!
block that needs breaking across multiple lines, or anyasm!
block that has multiple assembly strings no matter how short, should always put each argument on its own line, aligned one indentation level past theasm!
, with a trailing comma on each, just like a function.options(...)
goes on one line if it fits; otherwise, format it as though it were a nested function call to a function namedoptions
.in(reg)
orout(reg)
orinout(reg)
orin("regname")
orout("regname")
or similar; always treat them as a single atomic unit.inout
orinlateout
pair of expressions are too long to fit on one line, break before (never after) the=>
, and indent the=>
one level further:in(reg)
orout(reg)
orlateout(reg)
expression is too long, break between the)
and the expression and indent one level, then format from there; however, if the expression can be broken internally, follow same the rules for when to leave the head of the expression on the same line as thein(reg)
or similar as ifin(reg)
were the opener of a function:name = in(reg) expression
, line-break it as you would an assignment statement with the same indentation, treating thein(reg) expression
as the right-hand side of the assignment, and following all the same rules as above.The text was updated successfully, but these errors were encountered: