Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic formatting for ExprDict #5167

Merged
merged 7 commits into from
Jun 20, 2023

Conversation

davidszotten
Copy link
Contributor

Summary

basic formatting for ExprDict

Test Plan

snapshots

@github-actions
Copy link
Contributor

github-actions bot commented Jun 17, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01      6.8±0.01ms     5.9 MB/sec    1.00      6.8±0.01ms     6.0 MB/sec
formatter/numpy/ctypeslib.py               1.01  1371.5±10.82µs    12.1 MB/sec    1.00   1357.7±4.97µs    12.3 MB/sec
formatter/numpy/globals.py                 1.01    132.6±0.28µs    22.3 MB/sec    1.00    131.8±0.17µs    22.4 MB/sec
formatter/pydantic/types.py                1.01      2.8±0.01ms     9.2 MB/sec    1.00      2.7±0.01ms     9.3 MB/sec
linter/all-rules/large/dataset.py          1.01     13.7±0.03ms     3.0 MB/sec    1.00     13.6±0.03ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      3.5±0.00ms     4.8 MB/sec    1.00      3.4±0.00ms     4.9 MB/sec
linter/all-rules/numpy/globals.py          1.01    363.3±9.01µs     8.1 MB/sec    1.00    359.6±0.99µs     8.2 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.0±0.01ms     4.2 MB/sec    1.00      6.0±0.01ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.00      7.0±0.01ms     5.8 MB/sec    1.01      7.0±0.01ms     5.8 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1466.7±3.35µs    11.4 MB/sec    1.00   1469.6±3.14µs    11.3 MB/sec
linter/default-rules/numpy/globals.py      1.00    157.4±0.20µs    18.7 MB/sec    1.00    157.8±0.19µs    18.7 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.2±0.00ms     8.0 MB/sec    1.00      3.2±0.01ms     8.0 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      9.9±0.43ms     4.1 MB/sec    1.00      9.9±0.43ms     4.1 MB/sec
formatter/numpy/ctypeslib.py               1.00  1983.3±80.85µs     8.4 MB/sec    1.02      2.0±0.12ms     8.2 MB/sec
formatter/numpy/globals.py                 1.00   205.7±13.57µs    14.3 MB/sec    1.04   214.4±18.15µs    13.8 MB/sec
formatter/pydantic/types.py                1.03      4.1±0.20ms     6.2 MB/sec    1.00      4.0±0.15ms     6.4 MB/sec
linter/all-rules/large/dataset.py          1.00     20.3±0.81ms     2.0 MB/sec    1.00     20.2±0.79ms     2.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      5.3±0.25ms     3.2 MB/sec    1.00      5.2±0.21ms     3.2 MB/sec
linter/all-rules/numpy/globals.py          1.00   642.4±27.91µs     4.6 MB/sec    1.00   640.4±34.31µs     4.6 MB/sec
linter/all-rules/pydantic/types.py         1.08      9.3±0.49ms     2.7 MB/sec    1.00      8.7±0.29ms     2.9 MB/sec
linter/default-rules/large/dataset.py      1.00     10.3±0.32ms     4.0 MB/sec    1.00     10.2±0.54ms     4.0 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.04      2.2±0.11ms     7.4 MB/sec    1.00      2.2±0.12ms     7.7 MB/sec
linter/default-rules/numpy/globals.py      1.00   267.5±17.57µs    11.0 MB/sec    1.00   267.8±20.70µs    11.0 MB/sec
linter/default-rules/pydantic/types.py     1.02      4.8±0.22ms     5.4 MB/sec    1.00      4.7±0.21ms     5.5 MB/sec

Copy link
Member

@konstin konstin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already looks really good for such a complex node!

@@ -0,0 +1,26 @@
# before
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good test case

@konstin konstin added the formatter Related to the formatter label Jun 18, 2023
@davidszotten davidszotten force-pushed the format-expr-dict branch 2 times, most recently from 99f3ed6 to afd88a6 Compare June 19, 2023 08:21
Some(preceding) => preceding.end(),
None => comment.enclosing_node().start(),
};
let mut tokens_before = SimpleTokenizer::new(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first_non_trivia_token_rev might be easier

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The forward version is prefered over rev (when possible) because rev always needs to find the start of the line to assert that the token isn't part of a comment.

# the slash is not a continuation token / 
# The following slash is a continuation token
a + /

b # trailing
}

{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found a different issue i'm not sure how to approach:

{
    ** # between
    b,
}


{
    # before
    ** # between
    b,
}

formats as

{
    **b,  # between
}


{
    **# before
    b,  # between
}

the # before is causes the stars to be (or stay?) split off again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try manually calling leading_comments to format bs leading comments before the **?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to work. though i'm slightly confused about how that works. does leading_comments() (in addition to outputting the comment) also remove it from where it would otherwise automatically be added?

is there somewhere i can read about how the formatting system works?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the case. Formatting comments marks them as formatted and they are then filtered out by the next leading_node_comments call.

There's no explicit documentation for it, but you can read the source. Formatting comments is also based on implementing Format, the same as implementing formatting for any node

@davidszotten davidszotten force-pushed the format-expr-dict branch 2 times, most recently from 7be8e80 to 051b6c0 Compare June 19, 2023 19:28
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Some(preceding) => preceding.end(),
None => comment.enclosing_node().start(),
};
let range_start = preceding_end + TextSize::new(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using +1 may not be sufficient to skip the comma in case there's whitespace between the preceding node and the comma (or even a comment). I recommend to either call next on the iterator before the loop (with a debug assertion that it matches a comma or {), or matching on the kind in the loop and skipping over commas and {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch. found an example like that which breaks this code. can fix by skipping first token instead of a single char 👍

how come you suggest asserting the skipped token kind? (it can also be a colon which makes the list a bit tedious to check)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main motivation to add asserts if the list of possible kinds is limited (only a few kinds) is to catch incorrect ranges or wrong assumptions early. For example, I messed up the range in #5176. We wouldn't have know about this without the debug assertion being present. The debug assertions further act as documentation. They communicate to readers what kind of tokens are expected.

You can use matches!(token, Some(Token { kind: TokenKind::Colon | TokenKind::Comma | ... })) which, hopefully, makes it less awkward.

@MichaReiser MichaReiser linked an issue Jun 20, 2023 that may be closed by this pull request
group key/value pairs when formatting dict to prefer breaking lines
between entries instead of inside them
handle comments between the `**` and the variable name when unpacking
dicts inside dict literals

can possibly be extended to function arguments, see inline comment
@MichaReiser
Copy link
Member

This is awesome. Thank you so much.

@MichaReiser MichaReiser enabled auto-merge (squash) June 20, 2023 09:18
@MichaReiser MichaReiser merged commit 773e79b into astral-sh:main Jun 20, 2023
@davidszotten
Copy link
Contributor Author

Thank you for the mentoring and for your patience with all my questions!

@davidszotten davidszotten deleted the format-expr-dict branch July 7, 2023 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Formatter: Dict
3 participants