[`ruff`] Implement `unnecessary-regular-expression` (`RUF055`) #14659

ntBre · 2024-11-28T15:52:24Z

Summary

This is a limited implementation of the rule idea from #12283 to replace some uses of the re module with str method calls. A few of the examples given there:

re.sub("abc", "", s)  # => s.replace("abc", "")
re.match("abc", s)  # => s.startswith("abc")
re.search("abc", s)  # => "abc" in s
re.fullmatch("abc", s) # => "abc" == s
re.split("abc", s)  # => s.split("abc")

For this initial implementation, I've restricted the rule to string literals in the pattern argument of each of the re functions and further restricted these string literals to exclude any re metacharacters. Each of the re functions takes additional kwargs that change its behavior, so the rule doesn't apply when these are present either. re.sub can also take a function as the replacement argument (unlike str.replace, which expects another str), so the rule is further restricted to cases where that argument is a string literal. Finally, match, search, and fullmatch return Match objects, unlike the proposed fixes, so the rule only applies when these are used in a boolean test for their truth values. For example,

if re.match("abc", s):
    pass

would trigger the rule, but the plain re.match("abc", s) call above would not because the returned Match could be used. I think this is probably a fairly common use case, so the rule can still be useful even with these restrictions.
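As a rough illustration of why the boolean-test restriction matters, the sketch below compares each re call with its proposed str replacement on a few made-up sample strings. The sample data and variable names are assumptions for the example, not part of the rule's test fixture.

```python
import re

# Hypothetical sample inputs, chosen to cover prefix, infix, exact, and no match.
samples = ["abc", "abcdef", "xabc", "xyz", ""]

for s in samples:
    # In a boolean test only the truthiness of the Match (or None) matters,
    # so the str-based expressions are drop-in replacements.
    assert bool(re.match("abc", s)) == s.startswith("abc")
    assert bool(re.search("abc", s)) == ("abc" in s)
    assert bool(re.fullmatch("abc", s)) == (s == "abc")

# sub and split already return plain values, so they can be compared directly.
assert re.sub("abc", "", "xabcabcy") == "xabcabcy".replace("abc", "")
assert re.split("abc", "1abc2abc3") == "1abc2abc3".split("abc")

# Outside a boolean test the Match object itself may be used, and
# s.startswith("abc") has nothing equivalent to offer here.
m = re.match("abc", "abcdef")
span = m.span()
```

The last two lines show the case the rule deliberately leaves alone: once the Match is bound to a name, later code may depend on methods such as span() or group().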

The limitations around Match seem necessary, but some of the other restrictions can probably be loosened. For example, the sub replacement doesn't have to be a string literal, but it does need to be a string or at the very least not a function. Similarly, the patterns themselves could be plain str variables, but we need to inspect them for regex metacharacters. I didn't find a way to do that for non-literal strings, but if I missed it, that would be an easy improvement.
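For readers who want to experiment with the metacharacter restriction outside of ruff, here is a minimal Python sketch of the same idea. The helper name is_plain_pattern is hypothetical, and the character set is taken from the metacharacter list in the Python regex HOWTO rather than from the rule's final implementation.

```python
# Metacharacters listed in the Python regex HOWTO. A pattern containing none
# of them can only match itself, character for character.
METACHARACTERS = set(".^$*+?{}[]\\|()")


def is_plain_pattern(pattern: str) -> bool:
    """Hypothetical helper: True if `pattern` contains no regex metacharacters."""
    return not any(c in METACHARACTERS for c in pattern)


plain = is_plain_pattern("abc")      # no metacharacters
dotted = is_plain_pattern("a.c")     # "." matches any character
escaped = is_plain_pattern(r"\d+")   # backslash and "+" are both special
```

A check like this only works when the string's value is known, which is why non-literal patterns are harder to handle.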

I think these checks can also be directly extended to the regex package. I saw unraw-re-pattern (RUF039), for example, handles both re and regex, but I only handled re for now.

Test Plan

cargo test with new RUF055.py snapshot test.

Possible related rule

Right before submitting this, I tried running RUF055.py with Python, and it crashed with a ValueError: cannot use LOCALE flag with a str pattern. That would be an easy thing to check with very similar code to what I have here.
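The crash is easy to reproduce in isolation. The following sketch, which is an illustration and not code from the fixture, passes re.LOCALE with a str pattern and then with a bytes pattern:

```python
import re

# re.LOCALE is only valid for bytes patterns; with a str pattern the
# regex compiler rejects the flag.
try:
    re.compile("abc", re.LOCALE)
except ValueError as err:
    message = str(err)
else:
    message = None

# The same flag is accepted when the pattern is bytes.
compiled = re.compile(b"abc", re.LOCALE)
```

A lint rule for this would presumably look for the same call shapes as RUF055, a str literal pattern plus a flags argument, which is why the two checks could share code.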

Skylion007 · 2024-11-28T15:53:55Z

@dosisod This seems like it would be a good 'refurb' rule for your linter

github-actions · 2024-11-28T15:58:38Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+50 -0 violations, +0 -0 fixes in 3 projects; 52 projects unchanged)

apache/airflow (+6 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ helm_tests/airflow_aux/test_pod_template_file.py:358:16: RUF055 [*] Plain string pattern passed to `re` function
+ helm_tests/airflow_aux/test_pod_template_file.py:370:16: RUF055 [*] Plain string pattern passed to `re` function
+ helm_tests/airflow_aux/test_pod_template_file.py:407:16: RUF055 [*] Plain string pattern passed to `re` function
+ helm_tests/airflow_aux/test_pod_template_file.py:59:16: RUF055 [*] Plain string pattern passed to `re` function
+ helm_tests/airflow_aux/test_pod_template_file.py:97:16: RUF055 [*] Plain string pattern passed to `re` function
+ providers/tests/cncf/kubernetes/log_handlers/test_log_handlers.py:158:16: RUF055 [*] Plain string pattern passed to `re` function

zulip/zulip (+40 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ zerver/lib/test_classes.py:2208:20: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_delete_unclaimed_attachments.py:36:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_markdown_thumbnail.py:140:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_realm.py:2401:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_realm.py:2476:64: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_scheduled_messages.py:655:20: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_scheduled_messages.py:661:20: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:315:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:414:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:465:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:480:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:494:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:601:27: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:651:32: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:706:27: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_thumbnail.py:747:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:509:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:528:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:588:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:592:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:603:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:691:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:751:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:797:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:849:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:918:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:963:22: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload.py:988:26: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_local.py:115:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_local.py:39:19: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_local.py:52:19: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_local.py:62:19: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_local.py:96:19: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_s3.py:126:19: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_s3.py:140:23: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_s3.py:173:29: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_s3.py:180:29: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_s3.py:62:19: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_s3.py:79:19: RUF055 [*] Plain string pattern passed to `re` function
+ zerver/tests/test_upload_s3.py:91:19: RUF055 [*] Plain string pattern passed to `re` function

astropy/astropy (+4 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ astropy/coordinates/tests/test_frames.py:364:11: RUF055 Plain string pattern passed to `re` function
+ astropy/io/fits/card.py:771:21: RUF055 [*] Plain string pattern passed to `re` function
+ astropy/io/registry/base.py:446:25: RUF055 [*] Plain string pattern passed to `re` function
+ astropy/time/tests/test_fast_parser.py:16:15: RUF055 [*] Plain string pattern passed to `re` function

Changes by rule (1 rules affected)

code	total	+ violation	- violation	+ fix	- fix
RUF055	50	50	0	0	0

MichaReiser

This overall looks great. You made this look simple.

The only thing I notice we're missing is raw-string support (or, at least, tests for it). Raw strings are the recommended way to write regex patterns in Python because they avoid the need for double escaping.

MichaReiser · 2024-11-28T16:20:03Z

crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs

+    // For now, reject any regex metacharacters. Compare to the complete list
+    // from https://docs.python.org/3/howto/regex.html#matching-characters
+    let has_metacharacters = string_lit.value.chars().any(|c| {
+        matches!(
+            c,
+            '.' | '^' | '$' | '*' | '+' | '?' | '{' | '}' | '[' | ']' | '\\' | '|' | '(' | ')'
+        )
+    });


I like how you intentionally excluded metacharacters. So consider this an extension, and I think it's totally fine to do this as a follow-up PR (or not at all).

It would be nice if the rule only skipped the replacement for characters that behave differently in regular expressions and in regular strings. For example, \n matches \n both in a regex and in a string.
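To make the \n case concrete, here is a small Python sketch with a made-up sample string. It shows that a newline written in a normal string literal behaves the same in a regex and in a substring test, while the raw-string spelling does not:

```python
import re

s = "line one\nline two"  # hypothetical sample text containing one newline

# In a normal string literal, "\n" is a single newline character. It is not a
# regex metacharacter, so the regex search and the substring test agree.
literal_agrees = bool(re.search("\n", s)) == ("\n" in s)

# In a raw string, r"\n" is a backslash followed by "n". The regex engine
# interprets that as an escape for a newline, but a substring test looks for
# the two literal characters, which do not occur in s.
raw_regex_matches = re.search(r"\n", s) is not None
raw_substring_found = r"\n" in s
```

So whether an escape is safe to rewrite depends on both the escape itself and on whether the pattern literal is raw.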

AlexWaygood

Thanks for the excellent PR writeup -- it made reviewing this really easy! This looks great overall.

> The limitations around Match seem necessary, but some of the other restrictions can probably be loosened. For example, the sub replacement doesn't have to be a string literal, but it does need to be a string or at the very least not a function. Similarly, the patterns themselves could be plain str variables, but we need to inspect them for regex metacharacters. I didn't find a way to do that for non-literal strings, but if I missed it, that would be an easy improvement.

We don't have an out-of-the-box way of doing this for strings right now, so I wouldn't try to tackle it in this PR. But if you're interested, a followup might be to add an is_str() function to ruff_python_semantic::analyze::typing that looks similar to this is_list function:

ruff/crates/ruff_python_semantic/src/analyze/typing.rs

Lines 739 to 745 in d9cbf2f

    
    /// Test whether the given binding can be considered a list.
    ///
    /// For this, we check what value might be associated with it through it's initialization and
    /// what annotation it has (we consider `list` and `typing.List`)
    pub fn is_list(binding: &Binding, semantic: &SemanticModel) -> bool {
        check_type::<ListChecker>(binding, semantic)
    }

And then you could use that in this rule for stronger type inference.

ntBre · 2024-11-28T17:37:44Z

Thank you both for the great reviews! I think I've incorporated all of the suggestions, with the exception of handling simple escapes like \n. I want to search around a bit for how escapes are handled elsewhere in the code, but if nothing else, it shouldn't be that hard to allow a few common escapes like \n at least. I'm also happy to leave that as a follow-up though.

Similarly, I'm quite interested in the is_str idea, but I agree that that should be separate. I'll plan to look into that next.

AlexWaygood

This is great, thanks!

zanieb · 2024-12-01T16:21:08Z

...ter/src/rules/ruff/snapshots/ruff_linter__rules__ruff__tests__preview__RUF055_RUF055.py.snap

+32 32 | 
+33 33 | # this should be replaced with "abc" == s
+34    |-if re.fullmatch("abc", s):
+   34 |+if "abc" == s:


As a minor note, I think this should be s == "abc". It's a minor stylistic difference (in which I think x == VALUE is more idiomatic), but it can cause actual differences due to type checking implementations, e.g. microsoft/pyright#9093

ntBre added 16 commits November 27, 2024 15:18

add unnecessary regular expression rule

d9d365d

add test case

f59c38d

renumber to 055

2538401

only applies to re for now

c2d1256

match on nargs too instead of guard

e05d91c

lint

1c871e3

generate schema

fa0651f

restrict the context for suggestions

6652fe8

update tests

d56c079

use in_boolean_test

9025495

update summary with more constraints

39391f4

generate Exprs instead of using format

916fd73

rename to avoid double negation

ef1da87

use locate_arg

fa16e9e

check that additional arguments prevent the rule

ce3fb1a

remove extra newline

0c10156

AlexWaygood added the "great writeup" label (A wonderful example of a quality contribution) Nov 28, 2024

MichaReiser reviewed Nov 28, 2024

AlexWaygood reviewed Nov 28, 2024

ntBre and others added 2 commits November 28, 2024 11:49

use str::contains instead of chars::any

bc04c1f

Co-authored-by: Micha Reiser <micha@reiser.io>

as_ref to &

57d8bc1

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

AlexWaygood added the "rule" (Implementing or modifying a lint rule) and "preview" (Related to preview mode features) labels Nov 28, 2024

ntBre added 5 commits November 28, 2024 11:53

retrieve pat from re_func now

c170945

inline locate_arg

265a4d0

fmt

e66bced

add brief summary and move details to the end

ec3346f

from_call_expr only needs SemanticModel, not Checker

046816a

ntBre added 4 commits November 28, 2024 12:00

move unnecessary_regular_expression to the top

6d8e9b1

check safety based on comments, add test case, add docs

421b30a

test all metacharacters and raw strings

27773b8

update snapshot, unsafe fix moved down in the file

a3f7cd4

dscorbett reviewed Nov 28, 2024

sbrugman reviewed Nov 28, 2024

AlexWaygood approved these changes Nov 28, 2024

sbrugman reviewed Nov 28, 2024

ntBre and others added 8 commits November 28, 2024 13:07

use with_fix

31887d5

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

inline nargs

742a936

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

add reference

2639ac2

Co-authored-by: Simon Brugman <sbrugman@users.noreply.github.com>

use Fix::applicable_edit

1095362

Co-authored-by: Simon Brugman <sbrugman@users.noreply.github.com>

improve docs

70e1150

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

tidy applicable_edit

11a7f40

remove }, ], and ) from metacharacters

1ad19a8

extra s

cc5f75e

AlexWaygood merged commit 224fe75 into astral-sh:main Nov 28, 2024
21 checks passed

BrewTestBot mentioned this pull request Nov 29, 2024

ruff 0.8.1 Homebrew/homebrew-core#199449

Merged

ntBre mentioned this pull request Nov 29, 2024

[ruff] Extend unnecessary-regular-expression to non-literal strings (RUF055) #14679

Open

tdulcet mentioned this pull request Dec 1, 2024

Rule idea: Unnecessary use of re #12283

Closed

zanieb reviewed Dec 1, 2024

MichaReiser mentioned this pull request Dec 2, 2024

Revert: [pyflakes] Avoid false positives in @no_type_check contexts (F821, F722) (#14615) #14726

Merged

zanieb mentioned this pull request Dec 2, 2024

Update unncessary-regular-expression (RUF055) fix to use var == value #14733

Open

MichaReiser mentioned this pull request Dec 2, 2024

Extend RUF055 with more patterns #14738

Open
