Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect broken string interpolation and formatting #370

Closed
wants to merge 3 commits into from
Closed

Detect broken string interpolation and formatting #370

wants to merge 3 commits into from

Conversation

fperrin
Copy link

@fperrin fperrin commented Oct 5, 2018

Verify simple cases, such as: "%s: %s" % (foo, bar, baz) and "{foo}: {bar}".format(foo="foo", baaar="baz").

This is done by asking Python to do the formatting, filling in the arguments with dummy objects.

The testing for .format string is skipped for Python3 but should be rather easy to add; I want to get an idea of whether you like the current approach first.

Resolves #148

@asmeurer
Copy link
Contributor

asmeurer commented Oct 5, 2018

This is awesome.

The only potential issue I can see with this is that someone could DoS pyflakes by creating a very large string using float interpolation. For instance, "%1000000000000f" % 1 will create a string that's 1 terabyte in size. Hopefully this isn't difficult to workaround.

@bitglue
Copy link
Member

bitglue commented Oct 5, 2018

Maybe that's a feature, since it's equally a problem with the program under analysis.

What's the impact to performance?

@asmeurer
Copy link
Contributor

asmeurer commented Oct 5, 2018

Well I would expect rogue code to not be able to crash a static analysis tool. I can also write os.system('rm -rf /') which is a problem for the code if it runs, but if pyflakes actually ran the code it would be a serious issue for pyflakes. Practically speaking, it means someone can construct a very small Python source that will eat up all the memory and crash any program that uses pyflakes (text editors, REPLs, etc.).

Also, it's a mistake to believe that pyflakes code <=> code will be run. pyflakes is often used as an editor extension meaning it runs on intermediate versions of code, not just the final version that will be executed. All it takes is for someone to accidentally hold down a the number key for too long when constructing a formatting string to accidentally create something like what I have above, and it will crash pyflakes (and possibly the editor as well).

@fperrin
Copy link
Author

fperrin commented Oct 8, 2018

The only potential issue I can see with this is that someone could DoS pyflakes by creating a very large string using float interpolation. For instance, "%1000000000000f" % 1 will create a string that's 1 terabyte in size. Hopefully this isn't difficult to workaround.

I didn't think of that, I thought evaluating the string interpolation to be safe because there is no arbitrary Python code (as opposed to eg. f"" strings). I guess it could even be argued that your example would run fine on beefy production environment, so pyflakes should not fail when running on the developer's smaller laptop. I'll try to think more about it.

There is another bug in number formatting: "%*f" % (5, 1.2) fails with * wants int. Looking into this Pushed new commit.

@fperrin
Copy link
Author

fperrin commented Feb 25, 2019

Hi,

I don't have any bright idea for avoiding the unbounded memory use. It's an intrinsic flaw of asking Python to evaluate the string. I think I'll close this PR, because it'll not go anywhere I'm afraid.

@asottile
Copy link
Member

Instead of evaluating the format string, you could parse it and compare the fields. There's a stdlib parser at least for .format(...). Here's how I use it for pyupgrade

Parsing % format is a little trickier, though I think I have that as well: pyupgrade again

I don't think there's anything necessary for f-strings since they have their own ast representation, though if you wanted a parser for those I've ported the C parser to pure python here: future-fstrings

@fperrin
Copy link
Author

fperrin commented Aug 2, 2019

Closed as it is superseded by #443, thanks @asottile !

#443 found a couple extra issues on our code base, where "{} {}".format(1, 2, 3) has more (unnamed) arguments than are used in the template. That case was missed by the present PR. So that makes #443 plainly better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect broken string templates that will raise ValueError when interpolated
4 participants