Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit concatenated string formatting #9457

Closed
Tracked by #13371
MichaReiser opened this issue Jan 10, 2024 · 12 comments
Closed
Tracked by #13371

Implicit concatenated string formatting #9457

MichaReiser opened this issue Jan 10, 2024 · 12 comments
Labels
formatter Related to the formatter style How should formatted code look

Comments

@MichaReiser
Copy link
Member

MichaReiser commented Jan 10, 2024

This is a proposal for changing our implicit concatenated string formatting to solve #8272

The problem in the mentioned issue is that ISC001 is incompatible with the formatter because the formatter might format an implicit concatenated string on a single line, which triggers ISC001

# Input
a = (
	"bbbbbbb"
	"ccccccc"
)

## Formatted
a = "bbbbbbb" "ccccccc"

Which triggers ISC001.

Today

We either format all parts of an implicit concatenated string or a single line or format each part on its own line, if necessary:

# If it fits

a = "bbbbbbb" "ccccccc"

# If it exceeds the line width
a = (
    "aaaaaaaaa"
    "bbbbbbbbbbbbbbbbbbbbb"
    "ccccccccccccccccccccccccccccccccccccccccccccccccccccccc"
)

Proposal

I propose to implement a subset of Black's improved string processing by automatically joining implicit concatenated strings into a single string if the joined string fits on a single line:

# If it fits

a = "bbbbbbbccccccc"  # Note that it's now a single string. 

# If it exceeds the line width
a = (
    "aaaaaaaaa"
    "bbbbbbbbbbbbbbbbbbbbb"
    "ccccccccccccccccccccccccccccccccccccccccccccccccccccccc"
)

The motivation behind it is that, to my knowledge, there's no real reason for using an implicit concatenated string if it fits on a single line. The primary use case for implicit concatenated strings is to make a string fit into the desired line length by splitting it over multiple lines.

Considerations

The main downside is that joining the strings is non-reversible. Reversability is a desired property (Ruff's formatting tends to be reversible except for the trailing comma insertion) because it ensures that if you get the same formatting after making changes to the code and then undoing them again (without using revert, but by manually undoing your changes).

For example, let's say you shorten a sentence in a string that makes the implicit concatenated string fit. Ruff would join it. But Ruff wouldn't split it if you extended the sentence by the same number of characters you previously removed, likely resulting in a string that now exceeds the configured line width.

That's probably why Black's improved string processing automatically joins and splits strings.

I'm undecided about whether we should accept the non-reversibility. Still, I think it solves an actual problem and is less involved than implementing Black's entire improved string processing formatting. Meaning it's an opportunity to provide value today. In other words. It's a first step towards automatically splitting strings and we accept non-reversability until we get there.

@MichaReiser MichaReiser added formatter Related to the formatter style How should formatted code look labels Jan 10, 2024
@tylerlaprade
Copy link
Contributor

I am in favor of this proposal. I don't want implicit string concatenation ever, whether it's on a single line or multiple, so anything that helps avoid it is good by me.

@MichaReiser MichaReiser added this to the Formatter: Stable milestone Jan 11, 2024
@yakMM
Copy link

yakMM commented Jan 13, 2024

Thank you for this detailed proposal! I'm in In favor as well: I like the feature and I don't see why anyone would want this kind of string staying around in the code. This would satisfy my need reported in #8272.

I don't mind if this proposal becomes the default behavior or is behind a config option, but imo the argument of reversibility is not that much of a blocker, for two main reasons:

  • Version control (git, etc) diminish the need for reversibility, in the sense that big formatting changes are expected to modify the code quite a bit (especially when ran on a non-formatted code base).
  • As you pointed out Ruff's formatting tends to be reversible. I feel like it's already too late to have full reversibility (you mentioned the comma insertion example) and it add a strong constraint on the formatter behavior, possibly leading to a lot of work.

@WieeRd
Copy link

WieeRd commented Jan 18, 2024

I, too, would prefer making this as the default behavior of the Ruff formatter.
It's hard to imagine someone intentionally doing "foo" "bar", and if they were to, # fmt: skip can be used.

@MichaReiser
Copy link
Member Author

MichaReiser commented Feb 13, 2024

Two challenges need careful handling:

  • Concatenating a raw string with a regular string: r"a\b\\n" "bcd". It's unclear how this should be handled because converting the regular string to a raw string isn't an option if it uses any escape sequences. Converting a raw string to a regular string could be an option but sounds scary, I remember how confused I was last time when I tried to understand how raw strings work.
  • Concatenating a single and a triple quoted string: "abcd" """single line but triple quoted string"""

We don't need to handle multiline strings because they always expand on multiple lines.

@dhruvmanila
Copy link
Member

Regarding your first point, what would happen if there's a newline in the second string (r"a\b\\n" "b\ncd")? We don't want to convert the latter into a raw string.

Python is converting it to normal strings by escaping the content in the raw string:

In [1]: r"a\b\\n" "bc\nd"
Out[1]: 'a\\b\\\\nbc\nd'

In [2]: r"a\b\\n" r"bc\nd"
Out[2]: 'a\\b\\\\nbc\\nd'

It's more prominent when you print it out:

In [3]: print(r"a\b\\n" "bc\nd")
a\b\\nbc
d

@MichaReiser
Copy link
Member Author

yeah, I just realized that myself. There's also the issue that raw strings don't support escapes. So we might need to convert the other way round but I'm scarred doing this because I remember how confused I was last time when I tried to understand raw strings.

@tylerlaprade
Copy link
Contributor

I believe in you @MichaReiser 😃

@charliermarsh
Copy link
Member

Me too.

@MichaReiser
Copy link
Member Author

I remain sceptical ;)

@johnthagen
Copy link

I view this proposal as a pragmatic step forward for Ruff. ISC001 is a helpful lint that catches real mistakes. Right now our team has to disable ISC001 since we can't have a large warning printed every time we format our code.

@JacobHayes
Copy link

I'd expect r strings to be relatively rare overall, but might make up a disproportionate number of desired implicit concatenations. For the sake of 80:20 here on value:effort, perhaps:

  • ISC001 could be adjusted to tolerate implicit concatenation with r and non-r strings[1] (as those may actually be desired)
  • ruff format could join implicit concatenations, except those mixing r and non-r strings (as those are non-trivial to rewrite automatically)

Those changes should remove the ISC001 conflict with ruff format, supporting the majority of otherwise affected users.

Only (the few?) users with r and non-r strings that also want to avoid implicit string concatenation would be affected with unreported warnings. If it's important to cover that last subset of users (at least until implementing r concatenation), perhaps a new ISC (or other namespace) rule could be introduced that does not tolerate r and non-r implicit concatenation, which could instead be added to the conflict list.


1: ISC001 already does not support fixing concatenations with r and non-r strings, perhaps for the same reasons it's challenging here(?):

$ cat x.py
print("a" "b")
print("a" r"b")
print("a\n" r"b\n")
print(r"a\n" r"b\n")
$ ruff check --extend-select="ISC001" --isolated x.py
x.py:1:7: ISC001 [*] Implicitly concatenated string literals on one line
x.py:2:7: ISC001 Implicitly concatenated string literals on one line
x.py:3:7: ISC001 Implicitly concatenated string literals on one line
x.py:4:7: ISC001 [*] Implicitly concatenated string literals on one line
Found 4 errors.
[*] 2 fixable with the `--fix` option.

@MichaReiser
Copy link
Member Author

MichaReiser commented Oct 25, 2024

This is now available in preview mode, see #13663

Please give it a try and let me know if you run into any problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter style How should formatted code look
Projects
None yet
Development

No branches or pull requests

8 participants