Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer expanding parenthesized expressions before operands #5608

Merged
merged 10 commits into from
Jul 11, 2023

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jul 8, 2023

Summary

This PR implements Black's behavior where it first splits off parenthesized expressions before splitting before operands to avoid unnecessary parentheses:

# We want 
if a + [ 
	b,
	c
]: 
	pass

# Rather than
if (
    a
    + [b, c]
): 
	pass

This is implemented by using the new IR elements introduced in #5596.

  • We give the group wrapping the optional parentheses an ID (parentheses_id)
  • We use conditional_group for the lower priority groups (all non-parenthesized expressions) with the condition that the parentheses_id group breaks (we want to split before operands only if the parentheses are necessary)
  • We use fits_expanded to wrap all other parenthesized expressions (lists, dicts, sets), to prevent that expanding e.g. a list expands the parentheses_id group. We gate the fits_expand to only apply if the parentheses_id group fits (because we prefer a\n+[b, c] over expanding [b, c] if the whole expression gets parenthesized).

We limit using fits_expanded and conditional_group only to expressions that themselves are not in parentheses (checking the conditions isn't free)

Test Plan

It increases the Jaccard index for Django from 0.915 to 0.917

Incompatibilites

There are two incompatibilities left that I'm aware of (there may be more, I didn't go through all snapshot differences).

Long string literals

I commented on the regression. The issue is that a very long string (or any content without a split point) may not fit when only breaking the right side. The formatter than inserts the optional parentheses. But this is kind of useless because the overlong string will still not fit, because there are no new split points.

I think we should ignore this incompatibility for now

Expressions on statement level

I don't fully understand the logic behind this yet, but black doesn't break before the operators for the following example even though the expression exceeds the configured line width

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa < bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb > ccccccccccccccccccccccccccccc == ddddddddddddddddddddd

But it would if the expression is used inside of a condition.

What I understand so far is that Black doesn't insert optional parentheses on the expression statement level (and a few other places) and, therefore, only breaks after opening parentheses. I propose to keep this deviation for now to avoid overlong-lines and use the compatibility report to make a decision if we should implement the same behavior.

@MichaReiser MichaReiser marked this pull request as draft July 8, 2023 09:23
@github-actions
Copy link
Contributor

github-actions bot commented Jul 8, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      8.4±0.04ms     4.8 MB/sec    1.00      8.4±0.02ms     4.8 MB/sec
formatter/numpy/ctypeslib.py               1.00  1861.6±10.12µs     8.9 MB/sec    1.01   1876.6±6.75µs     8.9 MB/sec
formatter/numpy/globals.py                 1.00    199.4±0.35µs    14.8 MB/sec    1.02    204.1±1.24µs    14.5 MB/sec
formatter/pydantic/types.py                1.02      4.2±0.07ms     6.1 MB/sec    1.00      4.1±0.00ms     6.2 MB/sec
linter/all-rules/large/dataset.py          1.00     14.1±0.03ms     2.9 MB/sec    1.00     14.1±0.06ms     2.9 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.5±0.01ms     4.7 MB/sec    1.00      3.5±0.00ms     4.7 MB/sec
linter/all-rules/numpy/globals.py          1.00    365.9±1.30µs     8.1 MB/sec    1.00    367.1±1.43µs     8.0 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.3±0.05ms     4.1 MB/sec    1.00      6.3±0.06ms     4.1 MB/sec
linter/default-rules/large/dataset.py      1.00      7.2±0.01ms     5.6 MB/sec    1.00      7.2±0.15ms     5.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1484.3±4.68µs    11.2 MB/sec    1.00   1489.4±2.32µs    11.2 MB/sec
linter/default-rules/numpy/globals.py      1.00    161.1±0.73µs    18.3 MB/sec    1.00    161.4±1.51µs    18.3 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.2±0.01ms     7.9 MB/sec    1.00      3.2±0.01ms     8.0 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     11.7±0.44ms     3.5 MB/sec    1.00     11.7±0.37ms     3.5 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.6±0.12ms     6.4 MB/sec    1.05      2.7±0.13ms     6.1 MB/sec
formatter/numpy/globals.py                 1.00   286.1±14.85µs    10.3 MB/sec    1.04   298.0±18.83µs     9.9 MB/sec
formatter/pydantic/types.py                1.00      5.6±0.23ms     4.5 MB/sec    1.04      5.9±0.33ms     4.3 MB/sec
linter/all-rules/large/dataset.py          1.00     19.5±0.61ms     2.1 MB/sec    1.00     19.6±0.65ms     2.1 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.02      5.1±0.20ms     3.3 MB/sec    1.00      5.0±0.21ms     3.3 MB/sec
linter/all-rules/numpy/globals.py          1.01   629.4±21.41µs     4.7 MB/sec    1.00   622.7±80.29µs     4.7 MB/sec
linter/all-rules/pydantic/types.py         1.00      8.7±0.33ms     2.9 MB/sec    1.00      8.7±0.34ms     2.9 MB/sec
linter/default-rules/large/dataset.py      1.02     10.1±0.33ms     4.0 MB/sec    1.00      9.9±0.37ms     4.1 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00      2.1±0.07ms     7.9 MB/sec    1.00      2.1±0.08ms     7.9 MB/sec
linter/default-rules/numpy/globals.py      1.02   258.7±15.88µs    11.4 MB/sec    1.00   254.2±19.29µs    11.6 MB/sec
linter/default-rules/pydantic/types.py     1.00      4.5±0.20ms     5.6 MB/sec    1.01      4.6±0.31ms     5.6 MB/sec

@MichaReiser MichaReiser force-pushed the expand-by-parentheses-first branch 2 times, most recently from 537c84f to 56bae34 Compare July 9, 2023 11:22
@MichaReiser MichaReiser changed the title Prefer parenthesized expressions before operands Prefer expanding parenthesized expressions before operands Jul 9, 2023
@MichaReiser MichaReiser mentioned this pull request Jul 10, 2023
@MichaReiser MichaReiser requested a review from konstin July 10, 2023 11:36
@MichaReiser MichaReiser added the formatter Related to the formatter label Jul 10, 2023
@MichaReiser MichaReiser marked this pull request as ready for review July 10, 2023 11:37
## Input

```py
if True:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a regression

Ruff tests if all lines fit. This is usless in the given case because strings have no further split point. Meaning, changing the layout to fully expanded doesn't really fix the does not fit problem, because we would need to split the srings, which we cannot. Splitting the last line doesn't really help either.
Should the FitsExpanded measure mode be FitsEnd rather than StartAndEnd? Meaning, we measure if everything up to the first separator fits, then everything coming after the last line break?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we do need to measure after the line break here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this for now to fix later. This is a general problem when using groups that contain no further split points.

let result = group(&format_args![
if_group_breaks(&text("(")),
indent_if_group_breaks(
&format_args![soft_line_break(), format_expr],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the line break inside the indent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent is a bit counter-intuitive. Luckily, this is hidden a way by soft_block_indent and block_indent. indent only increments the indentation level, it doesn't insert any line breaks itself. The soft_line_break is necessary to then start a new line, with the new, incremented indent.

This is why the closing newline is outside of the indent. We first need to decrement the indent counter, and only then start the new line.

## Input

```py
if True:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we do need to measure after the line break here

Base automatically changed from conditional-group to main July 11, 2023 11:20
@MichaReiser MichaReiser merged commit 715250a into main Jul 11, 2023
16 checks passed
@MichaReiser MichaReiser deleted the expand-by-parentheses-first branch July 11, 2023 12:07
@MichaReiser
Copy link
Member Author

@MichaReiser merged this pull request with Graphite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Formatter: Parenthesized expressions
2 participants