Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String formatting linting #443

Merged
merged 3 commits into from
Jul 30, 2019
Merged

String formatting linting #443

merged 3 commits into from
Jul 30, 2019

Conversation

asottile
Copy link
Member

@asottile asottile commented Apr 4, 2019

There's three classes of lint messages added here -- I'm happy to split out / reorder / etc. these commits as fit. Also please suggest better names / messages -- these were my first-pass attempt at the messages 😆

Resolves #148

Supersedes #370 (CC @fperrin)

f-strings

  • f-strings without placeholders

'...'.format(...)

  • '{}'.format(1, 2) -- extra positional arguments
  • '{foo}'.format(foo=1, bar=2) -- extra named arguments
  • '{} {}'.format(1) -- placeholder missing argument (positional)
  • '{foo} {bar}'.format(foo=1) -- placeholder missing argument (named)
  • '{} {1}'.format(1, 2) -- switching between manual / automatic numbering
  • '{0} {}'.format(1, 2) -- switching between manual / automatic numbering
  • '{'.format(1) -- invalid format string

'...' % ...

  • '%(foo)' % () -- invalid format string (missing the conversion type)
  • '%j' % () -- invalid format specifier (j is not a valid conversion type)
  • '%s %s' % (1, 2, 3) -- positional count mismatch (too many args)
  • '%s %s' % (1,) -- positional count mismatch (too few args)
  • '%(bar)s' % {'bar': 1, 'baz': 2} -- extra named argument supplied
  • '%(bar)s %(baz)s' % {'bar': 1} -- missing named argument
  • '%(bar)s' % (1, 2, 3) -- expected mapping, but got sequence
  • '%s %s' % {'baz': 1} -- expected sequence, got mapping

@asmeurer
Copy link
Contributor

asmeurer commented Apr 4, 2019

Cool that it doesn't actually evaluate the string so it isn't susceptible to #370 (comment). Maybe that should be added as a test.

@asottile
Copy link
Member Author

asottile commented Apr 4, 2019

Oops this is still WIP -- found two more cases to handle:

  • # currently flagged as "too many placeholders"
    '%.*f %s' % (2, b / float(factor), suffix)
  • # currently flagged as "unused placeholder at position 2"
    '{:{}} {}'.format(1, 15, 2)

@asmeurer
Copy link
Contributor

asmeurer commented Apr 4, 2019

Is it possible to have real column numbers here? For example

"abc %t" % 1

reports "unsupported format character 't'" line 1, column 1. It would be nice to have column 7 (where the t is).

@asmeurer
Copy link
Contributor

asmeurer commented Apr 4, 2019

Does this have any support for finding multiple errors in the same string? It seems like it could be another advantage of this approach. I tried "%s%t" % 1 but it only gave one error.

@asottile
Copy link
Member Author

asottile commented Apr 4, 2019

Is it possible to have real column numbers here? For example

"abc %t" % 1

reports "unsupported format character 't'" line 1, column 1. It would be nice to have column 7 (where the t is).

it's difficult, but doable -- I'd like to leave that to a v2 as a later improvement because it is not easy and purely cosmetic. pyflakes currently only has support for node-based reporting and it would be a large refactor to point at positions within a node:

self.lineno = loc.lineno
self.col = getattr(loc, 'col_offset', 0)

Does this have any support for finding multiple errors in the same string? It seems like it could be another advantage of this approach. I tried "%s%t" % 1 but it only gave one error.

It does, however there's only 1 statically determinable error in that format string you've posted (without attempting to solve an intractably large problem of full type checking) -- notably the positional / keyword can only really be determined if it's a tuple / list / dict literal (and for the case of dictionaries it can only determine it if all the keys are strings)

Here's an example where multiple are detected:

$ python3 -m pyflakes t.py
t.py:1:1 '...' % ... has unused named argument(s): bar
t.py:1:1 '...' % ... is missing argument(s) for placeholder(s): foo
$ cat t.py
'%(foo)s' % {'bar': 'baz'}

@asmeurer
Copy link
Contributor

asmeurer commented Apr 4, 2019

Ah. I had assumed it would take literals as obviously not unpackable. But I can see that in general pyflakes would need a recursive type inference system, which it currently does not have. I guess this just shows why format is better than %.

@asottile
Copy link
Member Author

asottile commented Apr 4, 2019

It ~could probably inference int / float / str / bool / None literals without too much more complexity -- though I left them out since they're not interesting formatting

@asmeurer
Copy link
Contributor

asmeurer commented Apr 4, 2019

That's a good point.

@asottile
Copy link
Member Author

asottile commented Apr 4, 2019

here's a few more cases it found: python/mypy#6631

@asottile
Copy link
Member Author

asottile commented Apr 4, 2019

Found one more unhandled case that I'll get a fix for:

  •  '{foo}-{}'.format(1, foo=2)

JukkaL pushed a commit to python/mypy that referenced this pull request Apr 5, 2019
@asottile
Copy link
Member Author

asottile commented May 2, 2019

I've tried this out on quite a few internal codebases and it appears to be stable and has proved quite useful (fixed a bunch of bugs in rare error cases!) -- would definitely appreciate some reviews on the approach with the hope that this gets integrated :)

pyflakes/messages.py Outdated Show resolved Hide resolved
@asmeurer
Copy link
Contributor

asmeurer commented May 2, 2019

I ran this on a bunch of codebases. I didn't really find many errors. Only a handful of issues that wouldn't lead to runtime errors (extra unused formatting arguments). But there were no crashes or incorrect reports, so that's good. I have no idea if it missed anything.

Some of the error messages can be a bit confusing. The arguments to the format function call are counted starting at 0. I can see why this is done, as that's how formatting works, but for something like f(x, y, z) I'm used to calling z the 3rd argument, not position 2. Maybe instead of listing the position it could name the argument itself.

It can also be a bit confusing with something like '{} {}'.format(1) that the {} is called "placeholder 1", same as '{0} {1}'.format(1). I can't think of a cleaner way to write this error message, however. I'm also not clear from the code if it would be straightforward distinguish '{} {}' and '{0} {1}' (I haven't read the code especially closely, though).

Another suggestion would be to replace the (s) in the error messages with an automatic pluralizer to improve readability. Alternately emit one message per error. So for instance

'{a} {b} {c}'.format(a=1)

would produce

'...'.format(...) is missing an argument for placeholder 'b'
'...'.format(...) is missing an argument for placeholder 'c'

instead of

'...'.format(...) is missing argument(s) for placeholder(s): b, c

I think I would prefer splitting like this, as this matches the way pyflakes handles other errors, and it will allow better reporting once column numbers are added (the 'b' and 'c' errors could each point to their respective position in the format string).

In all I think the changes here are pretty good. But again I didn't read the code too closely, so you might want someone else to do that.

Copy link
Member

@myint myint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I added a minor change request.

pyflakes/test/test_other.py Show resolved Hide resolved
@myint myint merged commit eeb6263 into PyCQA:master Jul 30, 2019
@asmeurer
Copy link
Contributor

Should we open a new issue for splitting out the errors into separate errors for each item and adding column information?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect broken string templates that will raise ValueError when interpolated
4 participants