Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pattern-based formatter for rust-code on top of rust-analyzer #1665

Open
matklad opened this issue Aug 8, 2019 · 10 comments
Open
Labels
A-formatting formatting of r-a output/formatting on save E-hard E-has-instructions Issue has some instructions and pointers to code to get started fun A technically challenging issue with high impact good first issue S-actionable Someone could pick this issue up and work on it right now

Comments

@matklad
Copy link
Member

matklad commented Aug 8, 2019

Goal

We need to implement Rust code formatter on top of ra_syntax's concrete syntax tree.
Long-term, I am almost sure that we should just port rustfmt, with two modifications:

  • it should be able to format an isolated syntax node (as opposed to formatting the whole module)
  • it should be able to format code even in presence of sever syntax errors

However, I think it would be valuable to experiment a bit short-medium term!

There are two kinds of code formatters out there:

  • pretty printers, which more or less completely rewrite the code, disregarding existing white space completely.
  • pattern-based formatters, which fix whitespace between tokens and indentation, but doesn't generally introduce or remove newlines.

Rustfmt, JavaScript's prettier and Python's black are examples of the first breed.
Gofmt and IntelliJ formatter are examples of the second kind.

At https://users.rust-lang.org/t/how-are-you-using-rustfmt-and-clippy/31082/20?u=matklad, some folks suggested that they prefer gofmt style to rustfmt.
So I think it makes sense to just implement that on top of rust-analyzer, as an experiment.
To be clear, the goal is not to implement a different code style for rustfmt: code, already formatted by rustfmt, should not be changed by ra-fmt.
However, ra-fmt should be more flexible about line breaks, allowing to layout code, manually, in a more readable way than rustfmt's style.

Actual Task

So, how does one actually implements a rule-based formatter?

The core idea is to implement formatting as a set of rules like "=> token should be surrounded by exactly one whitespace inside match arm", than walk through each syntax node and apply the matching rules to each node.

So, formatter is typically split into DLS for defining rules and engine for applying them.

Examples

For example, here's the set of rules for Kotlin:

https://github.com/JetBrains/kotlin/tree/7d173ed3856e429739b55d8c3892e1b85ca41571/idea/formatter/src/org/jetbrains/kotlin/idea/formatter

Here's the DLS definition form IntelliJ:

https://github.com/JetBrains/intellij-community/tree/5c3f54c486856d513c33088a76c7805d312d7ae7/platform/lang-api/src/com/intellij/formatting

And here's the IntelliJ's engine:

https://github.com/JetBrains/intellij-community/tree/5c3f54c486856d513c33088a76c7805d312d7ae7/platform/lang-impl/src/com/intellij/formatting/engine

In a smaller scale, all three parts are implemented in nixpkgs-fmt:

https://github.com/nix-community/nixpkgs-fmt

And of course there's a set of rules that IntelliJ Rust is using:

https://github.com/intellij-rust/intellij-rust/tree/ccf1c54db8e77db90764e32651c8503564a05bc0/src/main/kotlin/org/rust/ide/formatter

Fmt Model

While one can apply formatting operations directly to the syntax tree, I think it's a good idea to introduce an intermediate formatting model tree.
The benefits are:

  • whitespace gets an abstract representation
  • empty whitespace is explicitelly represented
  • rust-specific: the fmt tree can be made single-threaded an mutable
  • sometimes it's useful to make the fmt tree of slightly a different shape than the syntax tree

A good place to look at would be this line:

https://github.com/intellij-rust/intellij-rust/blob/ccf1c54db8e77db90764e32651c8503564a05bc0/src/main/kotlin/org/rust/ide/formatter/RsFormattingModelBuilder.kt#L27

Uncomment it to see how the actual model looks like

for

fn main() {
    frobnicate()
        .foo().bar()
}

I get

  <block FILE 0:51 <Indent: NONE>>
    <block FUNCTION 0:51 <Indent: NONE>>
      "fn" 0:2 <Indent: NONE>
      spacing: <Spacing: minSpaces=1 maxSpaces=1 minLineFeeds=0>
      "main" 3:7 <Indent: NONE>
      spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
      <block VALUE_PARAMETER_LIST 7:9 <Indent: NONE>>
        "(" 7:8 <Indent: NONE>
        spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
        ")" 8:9 <Indent: NONE>
      spacing: <Spacing: minSpaces=1 maxSpaces=1 minLineFeeds=0>
      <block BLOCK 10:51 <Indent: NONE>>
        "{" 10:11 <Indent: NONE>
        spacing: <DependantSpacing: minSpaces=1 maxSpaces=1 minLineFeeds=0 dep=(10,51)>
        <block DOT_EXPR 16:49 <Indent: NORMAL>>
          <block DOT_EXPR 16:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
            <block CALL_EXPR 16:28 <Indent: CONTINUATION_WITHOUT_FIRST>>
              <block PATH_EXPR 16:26 <Indent: CONTINUATION_WITHOUT_FIRST>>
                <block PATH 16:26 <Indent: CONTINUATION_WITHOUT_FIRST>>
                  "frobnicate" 16:26 <Indent: NONE>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              <block VALUE_ARGUMENT_LIST 26:28 <Indent: CONTINUATION_WITHOUT_FIRST>>
                "(" 26:27 <Indent: NONE>
                spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                ")" 27:28 <Indent: NONE>
            spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
            <block . 37:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
              "." 37:38 <Indent: CONTINUATION_WITHOUT_FIRST>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              <block METHOD_CALL 38:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
                "foo" 38:41 <Indent: NONE>
                spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                <block VALUE_ARGUMENT_LIST 41:43 <Indent: NONE>>
                  "(" 41:42 <Indent: NONE>
                  spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                  ")" 42:43 <Indent: NONE>
          spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
          <block . 43:49 <Indent: CONTINUATION_WITHOUT_FIRST>>
            "." 43:44 <Indent: CONTINUATION_WITHOUT_FIRST>
            spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
            <block METHOD_CALL 44:49 <Indent: CONTINUATION_WITHOUT_FIRST>>
              "bar" 44:47 <Indent: NONE>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              <block VALUE_ARGUMENT_LIST 47:49 <Indent: NONE>>
                "(" 47:48 <Indent: NONE>
                spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                ")" 48:49 <Indent: NONE>
        spacing: <DependantSpacing: minSpaces=1 maxSpaces=1 minLineFeeds=0 dep=(10,51)>
        "}" 50:51 <Indent: NONE>

Note how's stuff after . is split in it's own block!

The model in nixpkgs-fmt uses explitic whitespace, but uses existing syntax nodes for blocks.
I feel like for rust we should also add fmt-blocks, precisely to deal with chained calls problem.

Passes

After fmt model is build, you run severl passes over it, which fix particula aspects of formatting.

  • first, spacing between the tokens is enforced
  • second, indentation is fixed
  • optionally, an alighment pass adds whitespace such that stuff on different lines aligns (we probably dont' need this for Rust)
  • finally, we render the results of formatting as either of:
    • fully formatted string
    • a diff of some form
    • fully formatted syntax tree

Impl Plan

The rough plan is:

  • build the testing harness for this. Take a look at inline tests in nixpkgs-fmt, I find them very valuable
  • build formatting model
  • build spacing DSL, pass that enforces spacing and a couple of simple spacing rules
  • build indentation DLS, pass that fixes indentation and a copule of simple indentation rules
  • add rules for the full rust language

This should live in ra_fmt crate and use syntax tree from ra_syntax crate.

@matklad matklad added E-hard E-has-instructions Issue has some instructions and pointers to code to get started fun A technically challenging issue with high impact good first issue labels Aug 8, 2019
@matklad
Copy link
Member Author

matklad commented Aug 10, 2019

Some more links from IntelliJ:

Block describes a non-whitespace bit of a syntax tree in the formatting model: https://github.com/JetBrains/intellij-community/blob/c722fdd3333a4fdb9657feede3fb14849d6868a8/platform/lang-api/src/com/intellij/formatting/Block.java

Spacing describes constraints on the space between the nodes: https://github.com/JetBrains/intellij-community/blob/c722fdd3333a4fdb9657feede3fb14849d6868a8/platform/lang-api/src/com/intellij/formatting/Spacing.java

Indent describes how the block should be indented: https://github.com/JetBrains/intellij-community/blob/c722fdd3333a4fdb9657feede3fb14849d6868a8/platform/lang-api/src/com/intellij/formatting/Indent.java

Note how all these bits are reflected in the printout in the comment above.

Finally, FormattingModel provide interface for changing stuff:

https://github.com/JetBrains/intellij-community/blob/c722fdd3333a4fdb9657feede3fb14849d6868a8/platform/lang-api/src/com/intellij/formatting/FormattingModel.java

DevinR528 added a commit to DevinR528/rust-analyzer that referenced this issue Aug 11, 2019
@matklad
Copy link
Member Author

matklad commented Aug 11, 2019

cc #1678

@matklad
Copy link
Member Author

matklad commented Aug 11, 2019

More links:

WhiteSpace is an object-level mutable representation of ws: https://github.com/JetBrains/intellij-community/blob/cae745952ee47fe305403dd9b23beceee5f2ad21/platform/lang-impl/src/com/intellij/formatting/WhiteSpace.java

AbstractBlockWrapper is a mutable counterpart of Block, which also contains preceding space: https://github.com/JetBrains/intellij-community/blob/cae745952ee47fe305403dd9b23beceee5f2ad21/platform/lang-impl/src/com/intellij/formatting/AbstractBlockWrapper.java

I think allowing only for preceding whitespace makes sense, it should make things simpler

@Keats
Copy link

Keats commented Aug 12, 2019

To be clear, the goal is not to implement a different code style for rustfmt: code, already formatted by rustfmt, should not be changed by ra-fmt.

Considering the number of options of rustfmt, this doesn't really seem easily doable?

Would it make sense to start with a simpler gofmt style formatter to start with and have the rustfmt compatible version be another project?

@matklad
Copy link
Member Author

matklad commented Aug 12, 2019 via email

DevinR528 added a commit to DevinR528/rust-analyzer that referenced this issue Aug 23, 2019
DevinR528 added a commit to DevinR528/rust-analyzer that referenced this issue Jan 11, 2020
DevinR528 added a commit to DevinR528/rust-analyzer that referenced this issue Jan 11, 2020
@matklad
Copy link
Member Author

matklad commented May 15, 2020

I've rencently went back to working on nixpkgs-fmt for a bit, and gained another useful insight:

it is helpful to split formatting into two phases:

  • whitespace fixup, which cleans up spaces between tokens and may introduce or remove \n
  • reindent phase, which only adjusts ammount of spaces/tabs at the start of the line

in nixpkgs-fmt, we tried to handle both at once. They were two separate phases, but they mutated the single data structure.

Now I think it makes sense to fully reify syntax tree between the two phases (and that's what we've switched to in nixpkgs-fmt).

And if we do that, it makes sense to start with reindent phase first, as that is simpler, is more useful for refactorings, and is required to implement correct cursor placement on Enter.

cc #4182

@zicklag
Copy link

zicklag commented Jun 4, 2021

Would ra-fmt be standalone, in that it could function without running a Rust Analyzer server and could be used to simply take unformatted string input → formatted string output? And do you think it could compile to WASM?

I've been entertaining the thought of an alternative rust formater that could be used more easily as a library and integrated into things like dprint as a WASM plugin.

@flodiebold
Copy link
Member

Yes, and yes.

@bjorn3
Copy link
Member

bjorn3 commented Jun 4, 2021

And do you think it could compile to WASM?

If all you care about is WASM, rustc itself can compile to wasm32-wasi with a couple of minor changes: rust-lang/miri#722 (comment) This means that rustfmt probably can be compiled to wasm32-wasi too.

@matklad
Copy link
Member Author

matklad commented Oct 27, 2021

if/when we do this, we totally should call the thing rustfmtizer

@DropDemBits DropDemBits added the A-formatting formatting of r-a output/formatting on save label Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-formatting formatting of r-a output/formatting on save E-hard E-has-instructions Issue has some instructions and pointers to code to get started fun A technically challenging issue with high impact good first issue S-actionable Someone could pick this issue up and work on it right now
Projects
None yet
Development

No branches or pull requests

7 participants