Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Add rule to detect potential unintended string iteration #14007

Open
l1n opened this issue Oct 30, 2024 · 4 comments
Open

Feature request: Add rule to detect potential unintended string iteration #14007

l1n opened this issue Oct 30, 2024 · 4 comments
Labels
rule Implementing or modifying a lint rule type-inference Requires more advanced type inference.

Comments

@l1n
Copy link

l1n commented Oct 30, 2024

Problem Statement

Python strings being sequences can lead to unintended character-by-character iteration, which is a common source of bugs (https://mail.python.org/pipermail/python-3000/2006-April/000759.html is 2006!). Currently, there's no built-in rule (that I know of) to detect potential cases where a string might be accidentally iterated over instead of being treated as a single value or being split first.

Use Cases

# Common bug cases this would catch:

# Case 1: Direct string iteration
name = "Alice"
for char in name:  # Should warn - likely wanted to iterate over a list of names
    process_name(char)  # Gets 'A', 'l', 'i', etc. instead of "Alice"

# Case 2: String in collections
names = get_name()  # Returns str
for name in names:  # Should warn - iterating over characters
    print(name)

# Case 3: Type annotated strings
def process_items(items: str):  # Type hint indicates string
    for item in items:  # Should warn - likely wanted List[str]
        handle_item(item)

# Safe cases (no warning):
names = ["Alice", "Bob"]
for name in names:
    print(name)

text = "hello"
chars = list(text)  # Explicit conversion to character list
for char in chars:
    print(char)

Proposed Rule Details

  • Message: "Possible unintended string iteration. Consider using split(), list(), or ensuring iteration over string is intended."

The rule would detect:

  1. For loops iterating over string literals
  2. For loops iterating over variables annotated as str
  3. Iteration over strings in common higher-order functions (map, filter, etc.)
  4. Would respect explicit noqa comments and configurations
@MichaReiser
Copy link
Member

@AlexWaygood I would value your opinion on this. To me, using for to iterate over the characters of a string seems very legit.

@MichaReiser MichaReiser added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels Oct 31, 2024
@AlexWaygood
Copy link
Member

Iteration over the characters of a string is a useful Python feature that's never going to go away. It's also one of the most common footguns in Python programming; there have been many requests to Python type checkers (let alone linters) to try to tackle this problem, and pytype actually implements a version of this. For examples, see:

There's definitely a strong user need for rules that can warn about this, even if they will necessarily come with false positives for the (rare) occasion when you really do need to iterate over the characters of a string. So I think we should definitely explore implementing this check... But it will probably have to wait until we migrate to using red-knot as the backend for Ruff (it needs strong type inference).

@AlexWaygood AlexWaygood added type-inference Requires more advanced type inference. accepted Ready for implementation and removed needs-decision Awaiting a decision from a maintainer accepted Ready for implementation labels Oct 31, 2024
@l1n
Copy link
Author

l1n commented Nov 3, 2024

I agree that there are legitimate cases for iterating over characters of a string and I find the syntax elegant, but I'd prefer to make the intent explicit (or turn off the rule for projects that use it often), so thank you for the update! Is #11653 the right issue to follow for the red-knot move?

@AlexWaygood
Copy link
Member

Is #11653 the right issue to follow for the red-knot move?

Unfortunately that's just one subtask we need to get to for an initial release of red-knot (and it will be only after the initial red-knot release that we'll start migrating Ruff rules to use red-knot as the backend). So it will probably be a while, I'm afraid :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rule Implementing or modifying a lint rule type-inference Requires more advanced type inference.
Projects
None yet
Development

No branches or pull requests

3 participants