Preserve backslash in raw string literal #6152

Merged
merged 7 commits into from
Jul 31, 2023
16 changes: 12 additions & 4 deletions crates/ruff_python_formatter/src/expression/string.rs
@@ -207,7 +207,11 @@ impl Format<PyFormatContext<'_>> for FormatStringPart {

write!(f, [prefix, preferred_quotes])?;

-let (normalized, contains_newlines) = normalize_string(raw_content, preferred_quotes);
+let (normalized, contains_newlines) = normalize_string(
+    raw_content,
+    preferred_quotes,
+    matches!(prefix, StringPrefix::RAW | StringPrefix::RAW_UPPER),
+);

match normalized {
Cow::Borrowed(_) => {
@@ -223,7 +227,7 @@ impl Format<PyFormatContext<'_>> for FormatStringPart {
}

bitflags! {
-#[derive(Copy, Clone, Debug)]
+#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub(super) struct StringPrefix: u8 {
const UNICODE = 0b0000_0001;
/// `r"test"`
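The `PartialEq, Eq` derive added in this hunk is most likely what lets the new `matches!(prefix, StringPrefix::RAW | StringPrefix::RAW_UPPER)` call compile: Rust only accepts a constant as a pattern if its type derives both traits. The sketch below illustrates this with a plain newtype standing in for the `bitflags!`-generated struct. Apart from `UNICODE`, the flag values, the `BYTE` flag, and the `union`/`intersects` helpers are assumptions for illustration, not Ruff's code.

```rust
/// Minimal stand-in for the `bitflags!`-generated `StringPrefix`.
/// Only `UNICODE`'s value comes from the diff; the other values and `BYTE` are assumed.
/// Removing `PartialEq, Eq` here makes `is_raw_exact` fail to compile.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct StringPrefix(u8);

impl StringPrefix {
    const UNICODE: StringPrefix = StringPrefix(0b0000_0001);
    const RAW: StringPrefix = StringPrefix(0b0000_0010); // assumed value
    const RAW_UPPER: StringPrefix = StringPrefix(0b0000_0100); // assumed value
    const BYTE: StringPrefix = StringPrefix(0b0000_1000); // hypothetical flag

    /// Combines two prefixes, like `|` on a bitflags type.
    const fn union(self, other: StringPrefix) -> StringPrefix {
        StringPrefix(self.0 | other.0)
    }

    /// Returns true if any bit of `other` is also set in `self`.
    const fn intersects(self, other: StringPrefix) -> bool {
        (self.0 & other.0) != 0
    }
}

/// Exact comparison against constant patterns, as in the diff.
fn is_raw_exact(prefix: StringPrefix) -> bool {
    matches!(prefix, StringPrefix::RAW | StringPrefix::RAW_UPPER)
}

/// Bitwise alternative that also recognizes prefixes combining RAW with other flags.
fn is_raw_bitwise(prefix: StringPrefix) -> bool {
    prefix.intersects(StringPrefix::RAW.union(StringPrefix::RAW_UPPER))
}
```

Because `matches!` tests for exact equality, a prefix value that combines `RAW` with another flag (such as the hypothetical `RAW | BYTE` for `rb"..."`) is not recognized by the exact check; whether that case can occur depends on how the formatter builds the prefix.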
@@ -434,7 +438,11 @@ impl Format<PyFormatContext<'_>> for StringQuotes {
/// with the provided `style`.
///
/// Returns the normalized string and whether it contains new lines.
-fn normalize_string(input: &str, quotes: StringQuotes) -> (Cow<str>, ContainsNewlines) {
+fn normalize_string(
+    input: &str,
+    quotes: StringQuotes,
+    is_raw: bool,
+) -> (Cow<str>, ContainsNewlines) {
// The normalized string if `input` is not yet normalized.
// `output` must remain empty if `input` is already normalized.
let mut output = String::new();
@@ -468,7 +476,7 @@ fn normalize_string(input: &str, quotes: StringQuotes) -> (Cow<str>, ContainsNew
} else if c == '\n' {
newlines = ContainsNewlines::Yes;
} else if !quotes.triple {
-if c == '\\' {
+if !is_raw && c == '\\' {
Member

Do we need to move this check to line 490 instead? We need to make sure that quotes are properly escaped. Can you add the following test:

r'It\'s normalizing \' and " quotes'

This should be formatted as:

r"It's normalizing ' and \" quotes"

Contributor Author

harupy, Jul 29, 2023

@MichaReiser r'It\'s normalizing \' and " quotes' is not equivalent to r"It's normalizing ' and \" quotes":

>>> r'It\'s normalizing \' and " quotes'
'It\\\'s normalizing \\\' and " quotes'
>>> r"It's normalizing ' and \" quotes"
'It\'s normalizing \' and \\" quotes'
>>> r'It\'s normalizing \' and " quotes' == r"It's normalizing ' and \" quotes"
False

Member

Okay, I've now played with your PR and found an example that produces invalid syntax:

# Input
r'Not-so-tricky "quote \'\''

# Ruff
r"Not-so-tricky "quote \'\'"

# Black
r'Not-so-tricky "quote \'\''

Note how Ruff changes the quotes from ' to " but fails to escape the ".

We need to play a bit more with black to understand how black determines the preferred quotes for raw strings and how the normalization has to work. Maybe @konstin knows more, because I'm not that familiar with Python, and I must say the escaping logic behind raw strings is confusing to me (you still have to escape quotes).

Contributor Author

harupy, Jul 29, 2023

Thanks for the investigation. I'm reading black's source code. It looks like, for raw strings, black returns the original string unchanged if it contains unescaped quotes of the new (opposite) kind:

https://github.com/psf/black/blob/1a972e3e11b144912155babdf48ff23d68059d57/src/black/strings.py#L201
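The rule in the linked black code can be sketched as follows. This is a hypothetical helper, not black's or Ruff's actual code: a quote in a raw string body counts as unescaped when it is preceded by an even number of backslashes (including zero), and if the body contains such a quote of the new kind, the quotes cannot be switched. The body itself is never rewritten, since every backslash in a raw string is part of its value.

```rust
/// Hypothetical helper: can a raw string body be re-quoted with `new_quote`
/// without changing its value or producing invalid syntax?
/// A `new_quote` preceded by an even number of backslashes (including zero)
/// would terminate the literal early, so switching is not allowed.
fn can_switch_raw_quotes(body: &str, new_quote: char) -> bool {
    // Length of the run of backslashes immediately before the current char.
    let mut backslashes = 0usize;
    for c in body.chars() {
        if c == new_quote && backslashes % 2 == 0 {
            return false;
        }
        if c == '\\' {
            backslashes += 1;
        } else {
            backslashes = 0;
        }
    }
    true
}
```

With this check, `r'Not-so-tricky "quote \'\''` keeps its single quotes, matching black's output in the example above, while `r'Not-so-tricky \"quote'` can be switched to `r"Not-so-tricky \"quote"` without changing its value.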

Member

tbh I find Python's raw string escaping rules confusing, and I think there are cases that just can't be represented properly (such as the case above, where black returns early and leaves the string unchanged).

if let Some(next) = input.as_bytes().get(index + 1).copied().map(char::from) {
#[allow(clippy::if_same_then_else)]
if next == opposite_quote {
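The effect of the `!is_raw` guard can be illustrated with a much-simplified normalizer for a single-quoted, non-triple-quoted body that is being rewritten with double quotes. `normalize_to_double_quotes` is a hypothetical name and this is not Ruff's `normalize_string`; it only models escape handling. For a regular string, the now-unnecessary backslash in `\'` is dropped and bare `"` characters are escaped; for a raw string, the body is returned unchanged because its backslashes are part of the value.

```rust
use std::borrow::Cow;

/// Hypothetical, much-simplified normalizer (not Ruff's `normalize_string`).
/// `body` is the content of a single-quoted, non-triple-quoted string that is
/// being rewritten with double quotes.
fn normalize_to_double_quotes(body: &str, is_raw: bool) -> Cow<'_, str> {
    if is_raw {
        // Every backslash in a raw string is part of the value, so escapes must
        // not be interpreted or rewritten. Whether switching quotes is safe at
        // all is a separate check that this sketch does not perform.
        return Cow::Borrowed(body);
    }

    let mut output = String::with_capacity(body.len());
    let mut modified = false;
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                // `\'` is unnecessary inside `"..."`: keep only the quote.
                Some('\'') => {
                    output.push('\'');
                    modified = true;
                }
                // Any other escape sequence is copied through unchanged.
                Some(next) => {
                    output.push('\\');
                    output.push(next);
                }
                None => output.push('\\'),
            },
            // A bare `"` would end the new literal, so it must be escaped.
            '"' => {
                output.push_str("\\\"");
                modified = true;
            }
            _ => output.push(c),
        }
    }

    if modified {
        Cow::Owned(output)
    } else {
        Cow::Borrowed(body)
    }
}
```

Applied to the body of `r'Date d\'expiration:(.*)'`, the raw path keeps the backslash, which corresponds to the `r"Date d\'expiration:(.*)"` output in the snapshot diff; the non-raw path yields `Date d'expiration:(.*)`, the form Ruff produced for this raw string before the change.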
@@ -82,13 +82,12 @@ f"\"{a}\"{'hello' * b}\"{c}\""
+f"NOT_YET_IMPLEMENTED_ExprJoinedStr"
+f"NOT_YET_IMPLEMENTED_ExprJoinedStr"
r"raw string ftw"
-r"Date d\'expiration:(.*)"
+r"Date d'expiration:(.*)"
r"Date d\'expiration:(.*)"
r'Tricky "quote'
-r"Not-so-tricky \"quote"
-rf"{yay}"
-"\nThe \"quick\"\nbrown fox\njumps over\nthe 'lazy' dog.\n"
+r'Not-so-tricky "quote'
+r'Not-so-tricky \"quote'
+f"NOT_YET_IMPLEMENTED_ExprJoinedStr"
+"\n\
+The \"quick\"\n\
@@ -147,9 +146,9 @@ f"NOT_YET_IMPLEMENTED_ExprJoinedStr"
f"NOT_YET_IMPLEMENTED_ExprJoinedStr"
f"NOT_YET_IMPLEMENTED_ExprJoinedStr"
r"raw string ftw"
-r"Date d'expiration:(.*)"
+r"Date d\'expiration:(.*)"
r'Tricky "quote'
-r'Not-so-tricky "quote'
+r'Not-so-tricky \"quote'
f"NOT_YET_IMPLEMENTED_ExprJoinedStr"
"\n\
The \"quick\"\n\