Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow \ to be both a regex escape character and a windows directory delimiter in ignore-paths option #5398

Open
zeloras opened this issue Nov 25, 2021 · 14 comments
Labels
Bug 🪲 Configuration Related to configuration Needs PR This issue is accepted, sufficiently specified and now needs an implementation Regression

Comments

@zeloras
Copy link

zeloras commented Nov 25, 2021

Bug description

Good day/morning/evening!

Today i've update pylint to version 2.12.1 and for now that version ignoring ignore-paths=./migrations/..py,
in my .pylintrc config. But in version 2.11.1 everything is working correct

Configuration

[MASTER]
ignore-paths=.*/migrations/.*\.py,.*/urls.py
disable=R0903,C0103,E0110,W0511,R0801,E1101
max-line-length=120
max-module-lines=800

[DESIGN]
max-attributes=30
max-parents=30
max-public-methods=50

Command used

pylint --rcfile=.pylintrc apps

Pylint output

************* Module apps.accounts.migrations.0001_initial
apps/accounts/migrations/0001_initial.py:27:0: C0301: Line too long (196/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:28:0: C0301: Line too long (329/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:29:0: C0301: Line too long (165/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:30:0: C0301: Line too long (203/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:38:0: C0301: Line too long (255/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:45:0: C0301: Line too long (266/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:46:0: C0301: Line too long (229/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0001_initial.py:12:0: C0115: Missing class docstring (missing-class-docstring)
apps/accounts/migrations/0001_initial.py:9:0: C0411: standard import "import uuid" should be placed before "from django.conf import settings" (wrong-import-order)
************* Module apps.accounts.migrations.0002_alter_pacmanuser_last_login
apps/accounts/migrations/0002_alter_pacmanuser_last_login.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0002_alter_pacmanuser_last_login.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.accounts.migrations.0003_alter_pacmanuser_username
apps/accounts/migrations/0003_alter_pacmanuser_username.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0003_alter_pacmanuser_username.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.accounts.migrations.0004_auto_20211124_1539
apps/accounts/migrations/0004_auto_20211124_1539.py:20:0: C0301: Line too long (246/120) (line-too-long)
apps/accounts/migrations/0004_auto_20211124_1539.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0004_auto_20211124_1539.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.accounts.migrations.0005_changeuseremailconfirmationcode
apps/accounts/migrations/0005_changeuseremailconfirmationcode.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0005_changeuseremailconfirmationcode.py:9:0: C0115: Missing class docstring (missing-class-docstring)
apps/accounts/migrations/0005_changeuseremailconfirmationcode.py:6:0: C0411: standard import "import uuid" should be placed before "from django.conf import settings" (wrong-import-order)
************* Module apps.notifications.migrations.0001_initial
apps/notifications/migrations/0001_initial.py:24:0: C0301: Line too long (385/120) (line-too-long)
apps/notifications/migrations/0001_initial.py:33:0: C0301: Line too long (130/120) (line-too-long)
apps/notifications/migrations/0001_initial.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/notifications/migrations/0001_initial.py:10:0: C0115: Missing class docstring (missing-class-docstring)
apps/notifications/migrations/0001_initial.py:7:0: C0411: standard import "import uuid" should be placed before "from django.conf import settings" (wrong-import-order)
************* Module apps.notifications.migrations.0002_alter_notifications_notification_type
apps/notifications/migrations/0002_alter_notifications_notification_type.py:16:0: C0301: Line too long (518/120) (line-too-long)
apps/notifications/migrations/0002_alter_notifications_notification_type.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/notifications/migrations/0002_alter_notifications_notification_type.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.notifications.migrations.0003_alter_notifications_notification_type
apps/notifications/migrations/0003_alter_notifications_notification_type.py:16:0: C0301: Line too long (648/120) (line-too-long)
apps/notifications/migrations/0003_alter_notifications_notification_type.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/notifications/migrations/0003_alter_notifications_notification_type.py:6:0: C0115: Missing class docstring (missing-class-docstring)

Expected behavior


Your code has been rated at 10.00/10

Pylint version

pylint-2.12.1

OS / Environment

Linux d507ea5a5499 5.14.17-301.fc35.x86_64 #1 SMP Mon Nov 8 13:57:43 UTC 2021 x86_64 GNU/Linux

Additional dependencies

No response

@zeloras zeloras added Bug 🪲 Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Nov 25, 2021
@Pierre-Sassoulas Pierre-Sassoulas added Regression and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Nov 25, 2021
@DanielNoord
Copy link
Collaborator

@Pierre-Sassoulas This is actually a tricky bug to resolve. #5201 normalises \ to become OS-agnostic. However, \ is also the character used to escape characters in regex.
So, .*/migrations/.*.py,.*/urls.py should work but I think users rightly expect \ to be the escape character. Do you know any regex trickery to allow both swapping \ to / for Windows but also allowing escaping characters?

@Pierre-Sassoulas
Copy link
Member

What if we "normalize" the regex only if the current OS is actually windows ? I don't like making OS specific code but this feel justified here.

@DanielNoord
Copy link
Collaborator

Hmm, then users on windows can't use the \ to escape. They might copy the pylintrc from somebody who wrote it on Linux and then get a different result from the same pylintrc I don't think that works.

Could we add a line to the description of the setting and say that escaping is not allowed since \ is considered a directory break?

@Pierre-Sassoulas
Copy link
Member

Could we use glob instead of regex ? I'm not sure it's really equivalent. glob could handle the hard part and we could point users to the glob documentation.

@DanielNoord
Copy link
Collaborator

The problem is actually in the validator:
https://github.com/PyCQA/pylint/blob/e33bc6377bbbf6135888cdadecdfc3baa8497a24/pylint/config/option.py#L30-L40

pathlib.PureWindowsPath('.*\\ignore\\.*').as_posix()
'.*/ignore/.*'
pathlib.PureWindowsPath('.*\ignore\.*').as_posix()
'.*/ignore/.*'

For some reason there is no difference here. Even \\\\ returns the same. I don't think changing to glob is the right move as then we would invalidate all current configs using ignore-paths. That seems like a large breaking change.
What do you think of updating the description and saying that the escape character can't be used as it represent the dir-delimiter on Windows? It is also not really a nice solution, but I don't know a better way forward..

@Pierre-Sassoulas
Copy link
Member

I think we should revert the windows specific part until we find a non breaking change way of handling this properly. Most of our user are on Linux according to pypi stats

pylint_os_proportion

@DanielNoord
Copy link
Collaborator

Wouldn't it be better to disallow the use of the escape character? How often are you actually using the escape character in a regex for paths anyway? In the regex used in the OP the escape character is unnecessary as .*/migrations/.*.py,.*/urls.py would work as well. The only thing I can think of is using $ in a directory or file name, but I don't think that is that uncommon. Personally I think supporting windows serves a large user base than the user base that actually needs the escape character.

@Pierre-Sassoulas
Copy link
Member

supporting windows serves a large user base than the user base that actually needs the escape character

You're right, and on top of that we can now raise a proper configuration parsing error if the regex contain a forbidden character so it's going to be relatively easy to warn users. The real fix still eludes me though. At least if we don't want to do any breaking changes.

@DanielNoord
Copy link
Collaborator

You're right, and on top of that we can now raise a proper configuration parsing error if the regex contain a forbidden character so it's going to be relatively easy to warn users.

Not sure that is so easy. Warning on \ doesn't make sense, so we could only warn on $ and ^ I guess? Perhaps we should make the description of the setting very clear and hope people don't run into this too much 😅

The real fix still eludes me though. At least if we don't want to do any breaking changes.

I don't have a good solution as well sadly..

@zeloras With regards to your initial question: we have reached the conclusion that we are going to disallow \ as escape character for the ignore-paths setting as the \ is needed as delimiter in standard windows paths. I'm going to update the description of the setting accordingly. I know this might not fully satisfy your needs, but we need to make a compromise here (you can read the preceding comments to see what the problem is for us).

In your case updating the settings file to include ignore-paths=.*/migrations/.*.py,.*/urls.py should make the setting work like you expect it to.

@DanielNoord
Copy link
Collaborator

For future reference:

The setting description has been updated with #5415. This issue is kept to think of a better and permanent solution to allow both \ as escaping character and normalise paths to POSIX and Windows systems as introduced in #5201.

@DanielNoord DanielNoord changed the title Seems like 2.12.1 ignoring "ignore-paths" in config Allow \ to be both a regex escape character and a windows directory delimiter in ignore-paths option Nov 29, 2021
@DanielNoord DanielNoord added the Configuration Related to configuration label Nov 29, 2021
@Pierre-Sassoulas Pierre-Sassoulas added the Needs decision 🔒 Needs a decision before implemention or rejection label Jul 1, 2022
@DanielNoord DanielNoord added Needs PR This issue is accepted, sufficiently specified and now needs an implementation and removed Needs decision 🔒 Needs a decision before implemention or rejection labels Jul 2, 2022
@DetachHead
Copy link
Contributor

a workaround is to use pyproject.toml instead of .pylintrc, and use raw strings (' instead of "):

[tool.pylint.MASTER]
ignore-patterns = '.*\.pyi'

@Zeckie
Copy link
Contributor

Zeckie commented Jan 10, 2023

Could we add a line to the description of the setting and say that escaping is not allowed since \ is considered a directory break?

I think any option that modifies or limits regex patterns (like replacing all \s) is a bad idea, as that will then confuse or frustrate people who are familiar with regular expressions, and used to being able to use all regex features. \ is useful in this case both for escaping chars that are allowed in filenames bug have special meaning in regex (like ., (, +, and [), but also for character classes (like \d (digit), \w (alphanumeric plus underscore), \S any char except whitespace) and unicode chars (like \N{EM DASH}). There are more features using \, and plenty of examples in https://docs.python.org/3/library/re.html

From just looking at this issue, before reading the documentation, I though that --ignore-paths was intended to be paths (not regex patterns), and --ignore-patterns was the same but using patterns. Even looking at the documentation, it is difficult to find anything about setting the same configuration using command line vs pyproject.toml and .pylintrc. I would have expected that to be under configuration, similar to how it is in the mypy documentation.

I also think these names would be much clearer:
--ignore-paths = paths only, not regex patterns
--ignore-path-regex = regex patterns matching paths
--ignore-filenames = file names only, not paths or patterns
--ignore-filename-regex = regex patterns matching filenames

When the string is not a regex pattern, it should be fine to normalize \ to /, but it is not ok to do that for regular expressions.

To work with windows paths, --ignore-path-regex should use the regex exactly as supplied, and test the same regex on both the windows style and the posix style paths. So the file C:\Projects\foo\bar\baz-1.py would be tested against the regex as python strings "C:\\Projects\\foo\\bar\\baz-1.py" (would match pattern like \\[\w-]+\.py) and "C:/Projects/foo/bar/baz-1.py" (would match pattern like /[\w-]+\.py).

Pattern: foo\d{4}-bar.py would match foo0000-bar.py and foo1234-barxpy but not foo\dddd-bar.py. To best match something like that, use foo\\d{4}-bar\.py

It would also be useful to have more examples in the documentation, to avoid ambiguities such as whether basename includes the extension (https://docs.python.org/3/library/os.path.html#os.path.basename) or not (https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FilenameUtils.html#getBaseName-java.lang.String- ), and whether , is part of the filename or separating 2 filenames.

In the regex used in the OP the escape character is unnecessary as .*/migrations/.*.py,.*/urls.py would work as well.

But that would also match apps/notifications/migrations/foopy (even though it contains no .)

@DanielNoord
Copy link
Collaborator

If we are going to rename configuration options we might as well also consider using glob instead of regex..

The code that tests this is fairly straightforward and I'm very open to reviewing a PR that "fixes" this or at least improves the current situation. I won't have much time to work on this myself sadly.

@jiasli
Copy link

jiasli commented May 23, 2023

I agree with @Zeckie that limiting regex's escaping functionality is not a good idea.

In your case updating the settings file to include ignore-paths=.*/migrations/.*.py,.*/urls.py should make the setting work like you expect it to.

.*/urls.py is not the same as .*/urls\.py - .*/urls.py matches foo/urls-py too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🪲 Configuration Related to configuration Needs PR This issue is accepted, sufficiently specified and now needs an implementation Regression
Projects
None yet
Development

No branches or pull requests

6 participants