Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUF002 offers unrelated substitution for unicode quotation mark #3977

Open
syntaxaire opened this issue Apr 14, 2023 · 9 comments
Open

RUF002 offers unrelated substitution for unicode quotation mark #3977

syntaxaire opened this issue Apr 14, 2023 · 9 comments
Labels
fixes Related to suggested fixes for violations

Comments

@syntaxaire
Copy link

syntaxaire commented Apr 14, 2023

With ruff 0.0.261, using the "RUF" rule, the following docstring:

"""All the account’s items"""

produces the following error:

RUF002 [*] Docstring contains ambiguous unicode character `’` (did you mean ```?)

The Unicode character in the comment is U+2019 RIGHT SINGLE QUOTATION MARK which is drawn from upper right to lower left. The suggested substitution is the backtick character which is drawn from upper left to lower right - this character is not semantically related to the Unicode character.

This suggestion is automatically applied when --fix is specified, producing:

"""All the account`s items"""

Is this to avoid messing with docstring quotes by not substituting the single quote character '? In that case I would suggest not making this Unicode character autofixable at all, rather than substituting an unrelated character.

@charliermarsh
Copy link
Member

Hm, yeah, that's an interesting substitution. We actually didn't create the substitution table -- it's taken from the same source as VS Code, and is intended to match VS Code's own list of confusing characters + their replacements. It looks like VS Code offers the same suggestion:

Screen Shot 2023-04-14 at 3 24 50 PM

That's not to say that the table can't be improved, only to clarify that we didn't apply any of our own discretion in generating the list. We could consider changing that replacement to instead suggest ' though?

@charliermarsh charliermarsh added the question Asking for support or clarification label Apr 14, 2023
@syntaxaire
Copy link
Author

syntaxaire commented Apr 14, 2023

' has syntactic meaning, so it would not be able to be a naive fix.

'All the account’s items.'

works as a docstring, although a dodgy one... Substituting ' would make semantic sense but would cause a syntax error if autofixed.

@syntaxaire
Copy link
Author

syntaxaire commented Apr 16, 2023

I would make 2 suggestions for this issue:

  1. Leave the existing behaviour but remove autofix for this.

or,

  1. Substitute ' for , automatically adding an escape character if in a single quoted string.

For # 2, would it be possible to construct an anti-example where this suggestion could still change the meaning of code? For example, constructing a string and passing it to eval?

@sbdchd
Copy link
Contributor

sbdchd commented Apr 17, 2023

I think auto fix should probably be disabled for this sort of thing since the underlying ambiguous characters might be in a i18n related string or some documentation (even if it's not a doc comment, e.g., Django column help text) that is intentional

@charliermarsh charliermarsh added fixes Related to suggested fixes for violations and removed question Asking for support or clarification labels Jul 10, 2023
@flying-sheep
Copy link
Contributor

flying-sheep commented Jul 14, 2023

Does anyone here see ' and on their screen without being able to visually distinguish them?

If not: why is included in the table at all?

On a different note, it hurts my typographic sensibilities to see that specific substitution being made lol. I always see when people using grave accents instead of typographically correct quotes, and I always wince.

@bodograumann
Copy link
Contributor

What is being replaced originally comes from Unicode directly, with only some minor overrides from vscode.
I think the main intention of this feature is to prevent malicious code.

That a grave accent is recommended here is an upstream bug imho. My PR contains some background information as well: hediet/vscode-unicode-data#1

The discussion whether replacing with ' is a good idea, because it might break code in context of a single-quoted string should probably be had, but I am also with you @flying-sheep , that ' and are distinct enough to be allowed.
In the mean time you can allow it explicitely in your project:

[tool.ruff]
allowed-confusables = [
    "",  # Cf. https://github.com/astral-sh/ruff/issues/3977
]

@kureta
Copy link

kureta commented Jan 28, 2024

I have a similar problem. I get RUF002 Docstring contains ambiguous ı(LATIN SMALL LETTER DOTLESS I). Did you meani (LATIN SMALL LETTER I)? but "ı" and "i" are separate letters in the Turkish alphabet.

Couldn't be sure if I should open a new issue. Please let me know if so.

@charliermarsh
Copy link
Member

@kureta - I think that's following the intent of the rule, since the goal is to identify confusable Unicode characters in otherwise-ASCII words. I'd suggest adding it to the allowed-confusables list if it makes sense in your project! https://docs.astral.sh/ruff/settings/#allowed-confusables

@kureta
Copy link

kureta commented Jan 29, 2024

@charliermarsh thanks, i didn't know that option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixes Related to suggested fixes for violations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants