Measuring `LineSuffix` when formatting to include trailing comments #5630

cnpryer · 2023-07-09T16:32:56Z

In #5169 I'm playing with test cases and noticed tuples with trailing comments exceeding line-length don't get formatted.

I added this to the ruff fixtures for tuple formatting:

# Trailing comment should force format
i1 = ("aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd")  # Trailing

With black you'll get:

# Trailing comment should force format
i1 = (
    "aasdsdasd",
    "aasdsdasd",
    "aasdsdasd",
    "aasdsdasd",
    "aasdsdasd",
    "aasdsdasd",
)  # Trailing

Here's the modified snapshot results:

Snapshot file: crates/ruff_python_formatter/tests/snapshots/format@expression__tuple.py.snap
Snapshot: format@expression/tuple.py
Source: crates/ruff_python_formatter/tests/fixtures.rs:151
Input file: crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/tuple.py
────────────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────────────
  258   258 │     "qweiurpoiqwurepqiurpqirpuqoiwrupqoirupqoirupqoiurpqiorupwqiourpqurpqurpqurpqurpqurpqurüqurqpuriq",
  259   259 │ )
  260   260 │ 
  261   261 │ # Trailing comment should force format
  262       │-i1 = (
  263       │-    "aasdsdasd",
  264       │-    "aasdsdasd",
  265       │-    "aasdsdasd",
  266       │-    "aasdsdasd",
  267       │-    "aasdsdasd",
  268       │-    "aasdsdasd",
  269       │-)  # Trailing
        262 │+i1 = ("aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd")  # Trailing
  270   263 │ ```

If I can scoop this up in #5169 I'll submit a separate PR, but I figured I could start by documenting this here.

UPDATE: Looking into it more, this looks like a general trailing comment issue rather than specifically a tuple issue. You can repro with a dict using:

d2 = {"a": 1000, "b": 1000, "c": 1000, "d": 1000, "e": 1000, "f": 1000, "g": 1000}  # Trailing

MichaReiser · 2023-07-09T18:14:50Z

Yes, this is a generic problem applying to all trailing comments. The reason is that the Printer doesn't measure the content of a LineSuffix. This is mainly to keep the implementation simpler because it would otherwise be necessary to not only track the current position (needed for source maps), but also the line width with trailing comments included.

I don't mind changing the implementation, depending on its performance impact. But I think that this can also be a reasonable deviation. The relevant line where we skip over line suffixes is here:

ruff/crates/ruff_formatter/src/printer/mod.rs

Lines 218 to 220 in 56bae34

    
    self.state
        .line_suffixes
        .extend(args, queue.iter_content(TagKind::LineSuffix));

I would probably add a new line_suffix_width state-variable and compute the width of the content on the above-mentioned lines. We can then create a new width() method on the Printer that returns the width (current line position + line suffix width).

The main challenge is how to compute the width without actually printing the content. I guess, we could call into fits and then take the width from there but that feels like a hack.
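A minimal sketch of the state-variable idea, written in Python rather than the printer's Rust and not taken from the actual implementation. `PrinterState`, `push_text`, and `push_line_suffix` are hypothetical names, and `len()` stands in for real width measurement:

```python
from dataclasses import dataclass, field


@dataclass
class PrinterState:
    """Hypothetical stand-in for the printer state discussed above."""

    line_width: int = 0         # width of the content printed on the current line
    line_suffix_width: int = 0  # width of the queued line suffixes (trailing comments)
    line_suffixes: list = field(default_factory=list)

    def push_text(self, text: str) -> None:
        # Assumption: one column per character; the real printer measures Unicode width.
        self.line_width += len(text)

    def push_line_suffix(self, text: str) -> None:
        # Queue the suffix for the end of the line, but remember how wide it is.
        self.line_suffixes.append(text)
        self.line_suffix_width += len(text)

    def width(self) -> int:
        # Current line position plus everything that will be appended to the line.
        return self.line_width + self.line_suffix_width


state = PrinterState()
state.push_text('i1 = ("aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd", "aasdsdasd")')
state.push_line_suffix("  # Trailing")
print(state.line_width <= 88, state.width() > 88)
```

With a line length of 88, the tuple from the original report fits when only `line_width` is checked, but no longer fits once the queued comment is counted through `width()`.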

charliermarsh · 2023-08-06T16:30:43Z

Does it simplify the problem at all if we constrain the kinds of nodes that can be printed as line suffixes? (Do we ever need line suffix apart from for comments?)

charliermarsh · 2023-08-06T16:35:56Z

I can answer those questions myself. The bigger question is probably whether we want to deviate from Black here or fix this.

cnpryer · 2023-08-06T17:09:16Z

The bigger question is probably whether we want to deviate from Black here or fix this.

I think linting influences some of my formatting decisions. So if my understanding is that line-length violations include trailing comments as part of the line suffix calculation, then naturally I'll probably expect the formatting to align with that constraint.

In other words, I currently lean towards how black handles it.

MichaReiser · 2023-08-07T06:47:48Z

I think linting influences some of my formatting decisions. So if my understanding is that line-length violations include trailing comments as part of the line suffix calculation, then naturally I'll probably expect the formatting to align with that constraint.

That makes sense to me. But this raises the question of whether the formatting must be consistent with all lint rules Ruff supports. I don't think this is a desired property of the formatter because it would limit our options when choosing a formatting: we could only design for the least common denominator.

I'm leaning towards automatically disabling rules that we know conflict with the formatter (you shouldn't need formatting-related rules if you use our formatter).

Does it simplify the problem at all if we constrain the kinds of nodes that can be printed as line suffixes? (Do we ever need line suffix apart from for comments?)

The question is, what are comments ;) We need to support text and hard line breaks at least, possibly labels too. I don't see this as a high risk issue. I'm confident that we can support it and would prefer to delay working on it until we're further along with the formatter. However, we'll have to be extra careful to:

ideally: only pay the additional overhead for line suffixes (don't introduce a new code path that must be executed for each FormatElement)
keep the change in line with the semantics of all other IR elements (my main concern is MeasureMode::AllLines)

MichaReiser · 2023-08-17T10:21:49Z

Related black issues

Prettier issue:

Inline comments don't count towards line length prettier/prettier#4754

MichaReiser · 2023-08-17T13:26:49Z

Regarding the consistency with the lint rule issue: I would recommend disabling the rule if someone uses our formatter, similar to what Prettier's eslint plugin does. There are cases where it is known that Black doesn't break your line (it may not even be allowed to break in these positions), and requiring a suppression comment in that case only makes things worse (on top of that, the formatter might move your comment because the line is too long 😆).

MichaReiser · 2023-08-17T14:36:47Z

One good counterexample that @zanieb made for why Ruff should respect the line width:

if (
    first_test() 
    and not settings.TEST_SUITE
):  # nocoverage: We can't verify the signature in test suite since we fetch the events
    pass

If I deliberately formatted the comment on its own line for better readability, then the formatter shouldn't collapse the line and make the comment exceed the line width.

MichaReiser · 2023-08-17T14:39:21Z

One "simple" implementation idea: Add a reserved_width field to LineSuffix and add that width to printer.state.line_width. This means it's up to the line suffix user to decide whether the line suffix should count toward the line width or not. The downside is that the client code needs to count the characters and use options.tab_width() (doesn't exist today) for the tab width.
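A rough Python sketch of the reserved-width idea (not the Rust implementation). The `LineSuffix` class and the `fits` helper here are hypothetical simplifications; the point is only that the printer adds whatever width the caller reserved and never measures the suffix text itself:

```python
from dataclasses import dataclass


@dataclass
class LineSuffix:
    """Hypothetical IR element: content deferred to the end of the line."""

    text: str
    reserved_width: int  # columns the caller wants counted toward the line width


def fits(content_width: int, suffix: LineSuffix, line_width_limit: int) -> bool:
    # Only the reserved width is added; suffix.text is never measured here.
    return content_width + suffix.reserved_width <= line_width_limit


comment = "  # Trailing"
counted = LineSuffix(comment, reserved_width=len(comment))
ignored = LineSuffix(comment, reserved_width=0)

print(fits(83, ignored, 88), fits(83, counted, 88))
```

The 83-column tuple from the original report fits when the comment reserves no width and stops fitting when the caller reserves the comment's full width.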

@cnpryer would you be interested in implementing this?

MichaReiser · 2023-08-17T15:20:37Z

I promise I'll stop my spam after this.

After discussing this internally, we want to align with Black because we believe that respecting the line width improves the readability of code, and this includes comments.

There are examples where Black/Ruff split the line weirdly if it has a long trailing comment. Comments may be more likely to trigger these problems, but the problems aren't limited to them. Let's take the subscript example:

a[0] # a very long comment that exceeds the line width and makes the subscript split eventually

# Black
a[
    0
]  # a very long comment that exceeds the line width and makes the subscript split eventually

This is worse formatting, in my view, but it isn't specific to comments:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa[4] 

# Ruff
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa[
    4
]

It's less likely that you trigger the second case, but the proper solution IMO is to never break subscripts if indexing by a number literal. This solves the odd formatting in both cases.

cnpryer · 2023-08-17T16:25:46Z

@cnpryer would you be interested in implementing this?

I would, but I want to spend some time today looking at this again (and thinking about it) before I claim it. I've got several things I'm trying to juggle right now, so my main concern is causing any delay to the formatter's progress.

Off the bat I'm struggling to wrap my head around the client code impact (needing options.tab_width()), but I'm sure it'll make more sense after digging around.

After discussing this internally, we want to align with Black because we believe that respecting the line width improves the readability of code, and this includes comments.

Are you thinking about tossing this in the Alpha bucket? Or would it be Beta territory?

cnpryer · 2023-08-17T16:27:16Z

Ah I see it's in Alpha tracked by #6069

MichaReiser · 2023-08-17T16:30:15Z

Are you thinking about tossing this in the Alpha bucket? Or would it be Beta territory?

Don't feel pressured by it. It's ready when it's ready.

Off the bat I'm struggling to wrap my head around the client code impact (needing options.tab_width()), but I'm sure it'll make more sense after digging around.

The client code will need to compute the width of a comment. This means we'll have to duplicate some (or all of)

ruff/crates/ruff_formatter/src/printer/mod.rs

Lines 1278 to 1298 in 910dbbd

    
    for c in text.chars() {
        let char_width = match c {
            '\t' => u32::from(self.options().tab_width),
            '\n' => {
                if self.must_be_flat {
                    return Fits::No;
                }
                match args.measure_mode() {
                    MeasureMode::FirstLine => return Fits::Yes,
                    MeasureMode::AllLines => {
                        self.state.line_width = 0;
                        continue;
                    }
                }
            }
            // SAFETY: A u32 is sufficient to format files <= 4GB
            #[allow(clippy::cast_possible_truncation)]
            c => c.width().unwrap_or(0) as u32,
        };
        self.state.line_width += char_width;
    }

Specifically, the width of a tab is configurable (is a tab 2 or 4 spaces?).
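To illustrate what the client code would have to compute, here is an approximate Python sketch of measuring a single-line comment with a configurable tab width. `text_width` is a hypothetical helper, and the wide-character handling via `unicodedata` only approximates what `c.width()` does in the Rust snippet above:

```python
import unicodedata


def text_width(text: str, tab_width: int) -> int:
    """Approximate display width of a single-line string."""
    width = 0
    for c in text:
        if c == "\t":
            # A tab counts as the configured number of columns.
            width += tab_width
        elif unicodedata.combining(c):
            # Combining marks do not advance the column.
            continue
        elif unicodedata.east_asian_width(c) in ("W", "F"):
            # Wide and fullwidth characters occupy two columns.
            width += 2
        else:
            width += 1
    return width


print(text_width("\t# x", 2), text_width("\t# x", 4))
```

The same comment measures differently depending on the tab setting, which is why the caller needs access to the configured tab width.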

I would, but I want to spend some time today looking at this again (and thinking about it) before I claim it. I've got several things I'm trying to juggle right now, so my main concern is causing any delay to the formatter's progress.

Totally up to you. I tagged you because you showed interest in working on the Printer, but I also don't want to pressure you.

MichaReiser · 2023-08-17T18:48:40Z

Okay, I lied... I have one other observation to share. Pyink excludes trailing pragma comments from the computed width. What's nice about the IR change we've been discussing is that we could support this as well, by simply setting the width to 0 if it is a pragma comment.
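As a hedged illustration of that observation, a caller could decide the reserved width per comment. The prefix list and both function names below are hypothetical and incomplete; which comments count as pragmas is a separate design decision:

```python
# Hypothetical, incomplete list of pragma prefixes (an assumption for this sketch).
PRAGMA_PREFIXES = ("noqa", "type:", "pyright:", "pylint:")


def is_pragma_comment(comment: str) -> bool:
    # Drop the leading "#" and surrounding whitespace before matching the prefix.
    body = comment.lstrip("#").strip()
    return body.startswith(PRAGMA_PREFIXES)


def reserved_width(comment: str) -> int:
    # Pragma comments reserve no width, so they never force a line break.
    return 0 if is_pragma_comment(comment) else len(comment)


print(reserved_width("# noqa: E501"), reserved_width("# Trailing"))
```

Because the width is supplied by the caller, the printer itself needs no knowledge of pragmas.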

cnpryer · 2023-08-18T11:55:54Z

Started looking into it this morning. I'd say leave this as unassigned in case someone wants to leapfrog me. At some point I'll have more time, and when that happens I'd be more comfortable claiming issues.

davidszotten · 2023-08-31T07:40:27Z

i was expecting this to also fix (example from django)

(-ruff +black)

-UNUSABLE_PASSWORD_SUFFIX_LENGTH = 40  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX
+UNUSABLE_PASSWORD_SUFFIX_LENGTH = (
+    40  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX
+)

did i misunderstand something?

MichaReiser · 2023-08-31T07:45:55Z

i was expecting this to also fix (example from django)

(-ruff +black)

-UNUSABLE_PASSWORD_SUFFIX_LENGTH = 40  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX
+UNUSABLE_PASSWORD_SUFFIX_LENGTH = (
+    40  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX
+)

did i misunderstand something?

Yes and no. This is related to #6872. The current implementation uses a heuristic to decide when to use this layout because it is expensive.

ruff/crates/ruff_python_formatter/src/expression/parentheses.rs

Lines 38 to 50 in fc89976

    
    // Only use best fits if:
    // * The text is longer than 5 characters:
    //   This is to align the behavior with `True` and `False`, that don't use best fits and are 5 characters long.
    //   It allows to avoid [`OptionalParentheses::BestFit`] for most numbers and common identifiers like `self`.
    //   The downside is that it can result in short values not being parenthesized if they exceed the line width.
    //   This is considered an edge case not worth the performance penalty and IMO, breaking an identifier
    //   of 5 characters to avoid it exceeding the line width by 1 reduces the readability.
    // * The text is know to never fit: The text can never fit even when parenthesizing if it is longer
    //   than the configured line width (minus indent).
    text_len > 5
        && text_len
            <= context.options().line_width().value() as usize
                - context.options().indent_width() as usize

We could extend the heuristic to take trailing end of line comments into consideration (use it if the node has any trailing end of line comments).
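A Python sketch of the quoted heuristic with the suggested extension, under the assumption that "use it if the node has any trailing end-of-line comments" means a simple opt-in. The `has_trailing_end_of_line_comment` parameter is hypothetical and does not exist in the Rust code:

```python
def should_use_best_fit(
    text_len: int,
    line_width: int,
    indent_width: int,
    has_trailing_end_of_line_comment: bool = False,
) -> bool:
    # Suggested extension (assumption): a trailing end-of-line comment opts in.
    if has_trailing_end_of_line_comment:
        return True
    # Existing heuristic from the quoted snippet: longer than 5 characters,
    # but short enough that it could still fit once parenthesized and indented.
    return 5 < text_len <= line_width - indent_width


# The literal `40` from the Django example is only 2 characters long.
print(should_use_best_fit(2, 88, 4), should_use_best_fit(2, 88, 4, True))
```

Under the current heuristic the short literal is skipped, which matches the Django diff above; with the extension it would become a candidate for the parenthesized layout.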

davidszotten · 2023-08-31T12:49:10Z

thanks for the explanation. your suggestion seems worth a try so i started having a look. however, e.g. in the example above we call should_use_best_fit with the ExprConstant but the comment is attached to the StmtAssign. is there some nice way to find the comment from the constant?

MichaReiser · 2023-08-31T12:56:01Z

Hmm good point. You get the parent in NeedsParentheses, but not sure if that will work reliably.

davidszotten · 2023-08-31T13:22:45Z

hm. parent improves the django example but doesn't quite fix it

 UNUSABLE_PASSWORD_SUFFIX_LENGTH = (
-    40
-)  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX
+    40  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX
+)

we now end up moving the comment

MichaReiser · 2023-08-31T13:36:59Z

Do I understand correctly that it now gets moved inside of the parentheses? That's rather unexpected.

davidszotten · 2023-08-31T14:24:30Z

sorry for being unclear. it's the other way around

input:

UNUSABLE_PASSWORD_SUFFIX_LENGTH = 40  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX

black

UNUSABLE_PASSWORD_SUFFIX_LENGTH = (
    40  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX
)

ruff:

UNUSABLE_PASSWORD_SUFFIX_LENGTH = (
    40
)  # number of random chars to add after UNUSABLE_PASSWORD_PREFIX

oh, and starting with the black version as input, my ruff impl is unstable :(

davidszotten · 2023-08-31T14:37:45Z

made a pr so we have code to discuss and a better place for this discussion #7023

cnpryer added a commit to cnpryer/ruff that referenced this issue Jul 9, 2023

Update snapshot for trailing comment test case

14b80b6

Related: astral-sh#5630

cnpryer mentioned this issue Jul 9, 2023

Format delete statement #5169

Merged

4 tasks

MichaReiser added the formatter Related to the formatter label Jul 9, 2023

cnpryer changed the title ~~Formatter fails to format tuples with trailing comments exceeding line-length~~ Measuring LineSuffix when formatting to include trailing comments Jul 16, 2023

cnpryer mentioned this issue Jul 17, 2023

📋 Formatter black compatibility tracking issue #5828

Closed

MichaReiser mentioned this issue Aug 6, 2023

End-of-line comments are ignored when breaking long lines in the formatter #6377

Closed

konstin mentioned this issue Aug 11, 2023

📋 Black-compatible formatting of django #6069

Closed

20 tasks

MichaReiser added this to the Formatter: Beta milestone Aug 22, 2023

MichaReiser mentioned this issue Aug 22, 2023

Formatter: Pragma comments #6197

Closed

3 tasks

MichaReiser modified the milestones: Formatter: Beta, Formatter: Alpha Aug 22, 2023

This was referenced Aug 23, 2023

Add LineSuffix reserved width #6830

Merged

Include trailing end-of-line comments in the measured line width #6771

Closed

Use reserved width to include line suffix measurement #6901

Merged

MichaReiser closed this as completed in #6830 Aug 28, 2023

davidszotten mentioned this issue Aug 31, 2023

`should_use_best_fit` with comments #7023

Closed

kdeldycke mentioned this issue Sep 15, 2023

Formatter: wrap comments (E501) #7414

Open
