Extend formatter IR to support Black's expression formatting #5596

Merged: MichaReiser merged 2 commits into main from conditional-group on Jul 11, 2023


MichaReiser

@MichaReiser MichaReiser commented Jul 7, 2023

Summary

This PR makes two extensions to support Black's expression formatting.

ConditionalGroup

The first extension adds a new ConditionalGroup IR element. Its semantics are the same as Group's, except that a condition controls whether the enclosed content is grouped. Put differently: the content is grouped only if the condition is true, and the condition is evaluated only when the IR is printed.

This is necessary to support Black's formatting for expressions where the enclosing parentheses are optional. For example, Black only adds parentheses around the condition of an if statement if the expression still does not fit on a line after breaking any lists, dicts, etc.:

# We want 
if a + [ 
	b,
	c
]: 
	pass

# Rather than
if a
	+ [b, c]
: 
	pass

which is invalid syntax. This means that, without parentheses, we only want to break after opening brackets ((, [, {) and never before operators.

However, we do want to break before operators when the whole expression does not fit:

# We want
if (
	a
	+ [b, c]
): pass

# rather than
if (
	a + [
		b, 
		c
	]
): pass

You can see how this is the opposite of the behavior when the parentheses are not added.

This extension solves the problem by gating the binary expression groups with a condition: the groups are removed when the whole expression fits and kept when the whole expression expands.

Another way to think about the new IR element is that it is equivalent to the following, but implemented as a custom IR element to avoid interning the content:

if_group_breaks(group(content), parentheses_group_id),
if_group_fits_on_line(content, parentheses_group_id)
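
The equivalence above can be illustrated with a toy printer. This is a minimal sketch in Python, not Ruff's Rust implementation: the `Group`, `ConditionalGroup`, `SOFT`, `emit`, and `render` names are hypothetical, indentation is ignored, and a group "fits" if its flat text fits in the remaining width. It only shows that a `ConditionalGroup` acts as a group when the referenced group has expanded and otherwise contributes no group boundary.

```python
from dataclasses import dataclass
from typing import Optional

SOFT = object()  # soft line break: a space in flat mode, a newline when expanded


@dataclass
class Group:
    children: list
    id: Optional[str] = None


@dataclass
class ConditionalGroup:
    children: list
    when_breaks: str  # id of the group that must have expanded (hypothetical field)


def flat(node) -> str:
    """Text of `node` when nothing inside it breaks."""
    if node is SOFT:
        return " "
    if isinstance(node, str):
        return node
    return "".join(flat(child) for child in node.children)


def emit(node, flat_mode, width, broken, lines):
    """Append `node` to `lines`, recording ids of expanded groups in `broken`."""
    if node is SOFT:
        if flat_mode:
            lines[-1] += " "
        else:
            lines.append("")
    elif isinstance(node, str):
        lines[-1] += node
    elif isinstance(node, ConditionalGroup) and node.when_breaks not in broken:
        # Condition is false: no group boundary, children inherit the enclosing mode.
        for child in node.children:
            emit(child, flat_mode, width, broken, lines)
    else:
        # A Group, or a ConditionalGroup whose condition holds.
        fits = flat_mode or len(lines[-1]) + len(flat(node)) <= width
        group_id = getattr(node, "id", None)
        if not fits and group_id is not None:
            broken.add(group_id)
        for child in node.children:
            emit(child, fits, width, broken, lines)


def render(node, width, broken=()):
    """Print `node` in an expanded context; `broken` lists groups that already expanded."""
    lines = [""]
    emit(node, False, width, set(broken), lines)
    return "\n".join(lines)


expr = ConditionalGroup(["a", SOFT, "+ b"], when_breaks="outer")
print(render(expr, width=80, broken={"outer"}))  # condition holds: grouped, one line
print(render(expr, width=80))                    # condition false: inherits expanded mode
```

In the first call the content is grouped and fits, so it stays on one line. In the second call the content is not grouped, so its soft line break follows the enclosing (expanded) mode.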

FitsExpanded

We add the optional parentheses by wrapping the whole if condition in a group. However, this creates a problem when an inner parenthesized expression expands, because that automatically expands all enclosing groups, including the group that adds the optional parentheses. But we don't want the optional parentheses when only a parenthesized expression expands.

This PR solves the problem by introducing a new FitsExpanded IR element that, similar to best fitting, acts as an expansion boundary: it does not expand the enclosing parentheses group if any of its content (a parenthesized expression) expands. Unlike best fitting, it also uses a different definition of fits. The content inside a FitsExpanded is not measured in "all flat" mode. Instead, it is measured assuming all of its inner content expands (what is the least space it requires, rather than the most).

FitsExpanded also supports an optional Condition that controls whether the changed semantics apply. This is necessary because we want to keep the list expression together, if possible, when we break before an operator:

# We want
if (
	a
	+ [b, c]
): pass

# Instead of 
if (
	a + [
		b, 
		c
	]
): pass
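
The difference between the two fits definitions can be sketched as follows. This is a simplified Python illustration, not the printer's actual measurement: `fits_all_flat` and `fits_assuming_expanded` are hypothetical helpers, and the expanded variant assumes that the list breaks right after its opening bracket, so only the text up to and including `[` has to fit on the current line.

```python
def fits_all_flat(prefix: str, items: list[str], suffix: str, width: int) -> bool:
    """Measure the most space required: nothing inside the list breaks."""
    line = prefix + "[" + ", ".join(items) + "]" + suffix
    return len(line) <= width


def fits_assuming_expanded(prefix: str, width: int) -> bool:
    """Measure the least space required: the list breaks right after "["."""
    return len(prefix + "[") <= width


prefix = "if aaaa + "
items = ["bbbbbbbb", "cccccccc"]

print(fits_all_flat(prefix, items, ":", width=20))   # False: the flat line is too long
print(fits_assuming_expanded(prefix, width=20))      # True: "if aaaa + [" fits
```

With the flat measurement, the enclosing parentheses group would be judged not to fit and would add parentheses. With the expanded measurement, it is judged to fit, so the list expands and no parentheses are added.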

Test Plan

I added a few doctests, and the next PR uses the new IR to format expressions.

Reference

This behavior mimics Black's can_omit_invisible_parentheses and transformer selection.

Ruff does not test whether a parenthesized expression is at the start or end of a line. This is handled inside the printer (expanding after an opening parenthesis always means that the expression is now at the end of the line).

@MichaReiser MichaReiser changed the title Use different formatting logic depending on whether an expression is parenthesized Extend formatter IR to support Black's expression formatting Jul 8, 2023
@MichaReiser MichaReiser added the formatter Related to the formatter label Jul 8, 2023
@MichaReiser MichaReiser force-pushed the conditional-group branch 3 times, most recently from b104f22 to 7135143 Compare July 8, 2023 09:20
@MichaReiser MichaReiser marked this pull request as ready for review July 8, 2023 09:20
@MichaReiser MichaReiser requested a review from konstin July 8, 2023 09:23
@github-actions

github-actions bot commented Jul 8, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      8.0±0.02ms     5.1 MB/sec    1.00      8.0±0.02ms     5.1 MB/sec
formatter/numpy/ctypeslib.py               1.00   1844.7±2.43µs     9.0 MB/sec    1.00   1843.5±5.33µs     9.0 MB/sec
formatter/numpy/globals.py                 1.00    205.6±0.59µs    14.4 MB/sec    1.00    205.7±0.36µs    14.3 MB/sec
formatter/pydantic/types.py                1.02      4.0±0.01ms     6.4 MB/sec    1.00      3.9±0.01ms     6.5 MB/sec
linter/all-rules/large/dataset.py          1.00     13.6±0.09ms     3.0 MB/sec    1.00     13.6±0.06ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.4±0.01ms     4.9 MB/sec    1.00      3.4±0.01ms     4.9 MB/sec
linter/all-rules/numpy/globals.py          1.00    435.6±1.12µs     6.8 MB/sec    1.00    435.5±2.80µs     6.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.1±0.03ms     4.2 MB/sec    1.00      6.0±0.02ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.01      6.7±0.02ms     6.0 MB/sec    1.00      6.7±0.02ms     6.1 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1470.6±8.16µs    11.3 MB/sec    1.00   1464.1±3.48µs    11.4 MB/sec
linter/default-rules/numpy/globals.py      1.00    166.4±0.26µs    17.7 MB/sec    1.00    166.9±0.51µs    17.7 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.0±0.01ms     8.4 MB/sec    1.00      3.0±0.01ms     8.4 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     11.7±0.33ms     3.5 MB/sec    1.01     11.9±0.30ms     3.4 MB/sec
formatter/numpy/ctypeslib.py               1.01      2.7±0.21ms     6.2 MB/sec    1.00      2.7±0.10ms     6.2 MB/sec
formatter/numpy/globals.py                 1.00   299.9±15.36µs     9.8 MB/sec    1.01   302.4±16.71µs     9.8 MB/sec
formatter/pydantic/types.py                1.01      5.8±0.21ms     4.4 MB/sec    1.00      5.7±0.23ms     4.5 MB/sec
linter/all-rules/large/dataset.py          1.00     20.0±0.38ms     2.0 MB/sec    1.00     19.9±0.37ms     2.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      5.2±0.14ms     3.2 MB/sec    1.00      5.1±0.16ms     3.2 MB/sec
linter/all-rules/numpy/globals.py          1.00   635.8±27.72µs     4.6 MB/sec    1.00   635.3±25.22µs     4.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      8.9±0.24ms     2.9 MB/sec    1.02      9.1±0.37ms     2.8 MB/sec
linter/default-rules/large/dataset.py      1.01     10.2±0.30ms     4.0 MB/sec    1.00     10.1±0.23ms     4.0 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00      2.1±0.07ms     7.8 MB/sec    1.02      2.2±0.15ms     7.6 MB/sec
linter/default-rules/numpy/globals.py      1.00    250.1±9.72µs    11.8 MB/sec    1.01    253.2±9.66µs    11.7 MB/sec
linter/default-rules/pydantic/types.py     1.00      4.5±0.16ms     5.6 MB/sec    1.00      4.5±0.15ms     5.6 MB/sec

@MichaReiser MichaReiser enabled auto-merge (squash) July 11, 2023 11:12
@MichaReiser MichaReiser merged commit d30e912 into main Jul 11, 2023
15 checks passed
@MichaReiser MichaReiser deleted the conditional-group branch July 11, 2023 11:20
MichaReiser added a commit that referenced this pull request Jul 11, 2023

## Summary

This PR implements Black's behavior where it first splits off parenthesized expressions before splitting before operators, to avoid unnecessary parentheses:

```python
# We want
if a + [
    b,
    c
]:
    pass

# Rather than
if (
    a
    + [b, c]
):
    pass
```

This is implemented by using the new IR elements introduced in #5596.

* We give the group wrapping the optional parentheses an ID (`parentheses_id`).
* We use `conditional_group` for the lower-priority groups (all non-parenthesized expressions) with the condition that the `parentheses_id` group breaks (we want to split before operators only if the parentheses are necessary).
* We use `fits_expanded` to wrap all other parenthesized expressions (lists, dicts, sets) so that expanding, e.g., a list does not expand the `parentheses_id` group. We gate `fits_expanded` to apply only if the `parentheses_id` group fits (because we prefer `a\n+[b, c]` over expanding `[b, c]` if the whole expression gets parenthesized).

We limit the use of `fits_expanded` and `conditional_group` to expressions that are not themselves in parentheses (checking the conditions isn't free).
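
The resulting layout priority can be sketched as a small decision function. This is a simplified Python model under stated assumptions, not the formatter's implementation: `layout_if_condition` is a hypothetical helper that handles only an `if` condition of the form `left op [items]`, uses four-space indentation, and ignores everything else Black takes into account.

```python
def layout_if_condition(left: str, op: str, items: list[str], width: int) -> str:
    """Pick a layout for `if <left> <op> [<items>]:` within `width` columns."""
    flat_list = "[" + ", ".join(items) + "]"

    # 1. Everything fits on one line.
    one_line = f"if {left} {op} {flat_list}:"
    if len(one_line) <= width:
        return one_line

    # 2. Expanding only the list is enough: no parentheses, no break before the operator.
    head = f"if {left} {op} ["
    if len(head) <= width:
        body = "".join(f"    {item},\n" for item in items)
        return f"{head}\n{body}]:"

    # 3. Parentheses are necessary: break before the operator and keep the
    #    list together if it fits on its own line.
    operand = f"    {op} {flat_list}"
    if len(operand) > width:
        body = "".join(f"        {item},\n" for item in items)
        operand = f"    {op} [\n{body}    ]"
    return f"if (\n    {left}\n{operand}\n):"


print(layout_if_condition("a", "+", ["bbbbbbbb", "cccccccc"], width=20))
print(layout_if_condition("aaaaaaaaaaaaaaaa", "+", ["b", "c"], width=20))
```

The first call expands only the list, because `if a + [` fits on the line. The second call adds the parentheses and breaks before the operator, because even the text up to the opening bracket is too long.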

## Test Plan

It increases the Jaccard index for Django from 0.915 to 0.917.
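
For reference, a Jaccard index compares two outputs as the size of their intersection divided by the size of their union. The sketch below assumes the comparison is made over lines treated as a multiset; `jaccard_index` is a hypothetical helper, and the actual compatibility tooling may compute the index differently.

```python
from collections import Counter


def jaccard_index(expected: str, actual: str) -> float:
    """Line-based multiset Jaccard index: |A & B| / |A | B| (assumed definition)."""
    a = Counter(expected.splitlines())
    b = Counter(actual.splitlines())
    intersection = sum((a & b).values())
    union = sum((a | b).values())
    # Two empty inputs are treated as identical.
    return intersection / union if union else 1.0


black_output = "if a + [\n    b,\n    c\n]:\n    pass"
ruff_output = "if (\n    a\n    + [b, c]\n):\n    pass"
print(jaccard_index(black_output, ruff_output))
```

An index of 1.0 means the two outputs contain exactly the same lines, so a value moving closer to 1.0 indicates output closer to Black's.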

## Incompatibilities

There are two incompatibilities left that I'm aware of (there may be more, I didn't go through all snapshot differences). 

### Long string literals
I commented on the regression. The issue is that a very long string (or any content without a split point) may not fit when only breaking the right side. The formatter then inserts the optional parentheses. But this is of little use, because the overlong string still does not fit: the parentheses add no new split points.

I think we should ignore this incompatibility for now.


### Expressions on statement level

I don't fully understand the logic behind this yet, but Black doesn't break before the operators in the following example, even though the expression exceeds the configured line width:

```python
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa < bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb > ccccccccccccccccccccccccccccc == ddddddddddddddddddddd
```

But it would if the expression were used inside a condition.

What I understand so far is that Black doesn't insert optional parentheses at the expression statement level (and in a few other places) and therefore only breaks after opening parentheses. I propose to keep this deviation for now to avoid overlong lines, and to use the compatibility report to decide whether we should implement the same behavior.