
A systematic way to generate rules, and possible enhancements #35

Closed
gousaiyang opened this issue Jan 11, 2020 · 13 comments

Comments

@gousaiyang
Contributor

Hello! Recently my friend (@JarryShaw) and I found that a lot of rules are missing in vermin. So I wrote a small tool that parses the Python documentation for "New in version ..." and "Changed in version ..." indicators. The extracted information can then be turned into vermin rules in a semi-automated fashion (with manual inspection).
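The core idea can be sketched roughly as follows. This is a hypothetical illustration, not the actual tool; the sample doc text and the exact regex are assumptions, though the Sphinx-style version markers are the ones named above:

```python
import re

# Hypothetical sample of documentation text; "str.removeprefix" and
# "math.isclose" really were added in 3.9 and 3.5 respectively.
doc_text = """
str.removeprefix(prefix)
   New in version 3.9.
math.isclose(a, b)
   New in version 3.5.
"""

# Match the Sphinx-style version-change markers mentioned above.
pattern = re.compile(r"(New|Changed) in version (\d+\.\d+)")
hits = [(m.group(1), m.group(2)) for m in pattern.finditer(doc_text)]
print(hits)  # [('New', '3.9'), ('New', '3.5')]
```

As the drawbacks below note, string parsing like this is error-prone, which is why the results still need manual inspection.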

Based on our results, I am adding a lot of rules in PR #34.

Also, we found that the following language features could be detected as well, if you are interested in implementing them:

If you are interested in finding rules systematically, you can refer to our results for more information.

Note: some drawbacks of parsing the documentation:

  • Some changes may be undocumented, or expressed vaguely in the documentation (string parsing is error-prone and needs manual verification)
  • The major differences between Python 2 and Python 3 grammar are not clearly documented in the official documentation; more rules might be needed for this
@netromdk
Owner

netromdk commented Jan 11, 2020

Thanks for opening the issue. Yes, it's been an ongoing process for me to add the rules and implement language detection functionality. While I've thought about it many times, I never got around to making my own pydoc parser. Really cool that you did, though! Seems to be working pretty well from reviewing the rules in #34 (thanks again for that btw).

I'm definitely interested in implementing language features. Adding them to my to-do list.

What's your plan with your tool? Will you keep maintaining it for Vermin rule generation? It would be interesting to try to find and fix possible border-case scenarios with different wordings for the same things. The HTML output viewer is really neat and makes for an easy overview.

@netromdk
Owner

3.3 u prefix on string

Unfortunately, I cannot detect this because the information is lost since every string is unicode in py3. The AST of u"value" is:

Module(body=[Expr(value=Str(s='value'))])
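One hedged workaround, sketched below with the standard tokenize module rather than ast: the raw token stream still contains the literal source text, prefix included. (As a side note, Python 3.8+ also records the prefix in `ast.Constant.kind`.) This is only an illustration, not how vermin works:

```python
import io
import tokenize

src = 'u"value"'

# The token stream preserves the literal source text, prefix included,
# whereas the AST collapses u"value" into a plain string constant.
tokens = tokenize.generate_tokens(io.StringIO(src).readline)
string_tok = next(t for t in tokens if t.type == tokenize.STRING)
print(string_tok.string)  # u"value"
```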

@netromdk
Owner

yield from is already supported. For instance:

L3: `yield from` requires 3.3+
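For contrast with the cases above, `yield from` is detectable purely from the AST, since the syntax produces its own node type. A minimal sketch of such a check (an illustration, not vermin's actual code):

```python
import ast

# A YieldFrom node can only come from `yield from` syntax (3.3+),
# so its mere presence in the tree pins the minimum version.
src = "def gen():\n    yield from range(3)\n"
uses_yield_from = any(
    isinstance(node, ast.YieldFrom) for node in ast.walk(ast.parse(src))
)
print(uses_yield_from)  # True
```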

@netromdk
Owner

3.6 underscores are now allowed for grouping purposes in literals (int, float, complex) (e.g. 123_456)

Unfortunately, I cannot detect those underscores either. AST of 123_456 is for instance:

Module(body=[Expr(value=Num(n=123456))])
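As with the `u` prefix, one possible way around this is to drop below the AST: the standard tokenize module yields the literal token text with the underscores intact. A minimal sketch under that assumption (not vermin's actual approach):

```python
import ast
import io
import tokenize

src = "123_456"

# ast only keeps the numeric value, so the grouping underscores vanish...
value = ast.literal_eval(src)
print(value)  # 123456

# ...but a token-level pass still sees the original literal text.
tokens = tokenize.generate_tokens(io.StringIO(src).readline)
number_tok = next(t for t in tokens if t.type == tokenize.NUMBER)
print(number_tok.string)  # 123_456
```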

@JarryShaw

Would you consider introducing a dependency to vermin to support this kind of literal analysis?

@gousaiyang and I are working on a Python backport compiler project (we call it babel at the moment). Currently, we have f2format for f-strings, poseur for positional-only arguments, and walrus for assignment expressions; and we're planning to include vermin in the scope.

f2format, poseur, and walrus are all based on @davidhalter's Python parser parso, which gives them access to the original source code.

@netromdk
Owner

(I'm assuming that question was for @gousaiyang?)

That's a cool project, @JarryShaw :)

netromdk added a commit that referenced this issue Jan 11, 2020
@gousaiyang
Contributor Author

As for my tool, rule generation is mostly a one-time thing (i.e. we will not go over the whole process of generating rules for Python up to version 3.8 again; we will just fix errors if we find any). When a new feature release of Python comes out (e.g. Python 3.9), I will simply run my tool against the Python 3.9 documentation, filter only the changes in 3.9, and process them, which will be much less work than this time. I will also keep improving the tool and fixing bugs in it.

@netromdk
Owner

Sounds great. 👍

@JarryShaw

(I'm assuming that question was for @gousaiyang?)

That's a cool project, @JarryShaw :)

Nah… I’m just proposing a possible solution to resolve the cases that vermin currently cannot process :)

@netromdk
Owner

Which cases are you referring to? I have already implemented detection of f-strings, positional-only arguments, and assignment expressions, if that's what you meant? :)

@JarryShaw

JarryShaw commented Jan 12, 2020

The cases @gousaiyang mentioned, which you found impossible to support due to loss of information:

3.3 u prefix on string
3.6 underscores are now allowed for grouping purposes in literals (int, float, complex) (e.g. 123_456)
etc.

Since I originally implemented f2format based on ast, I understand that this standard module only ever provides an optimised AST of the source. So I'm just wondering whether you think it would be a better idea to rely on some other parser. ;)

@netromdk
Owner

It might be a good idea to supplement with another parser, but I'm afraid it would hurt performance a lot as well as add complexity. The reason for using Python's ast is that it's always up-to-date, maintained, and correct. It's also available out of the box in vanilla Python, so no extra packages are required. But I will think about it. Thanks. :)

@netromdk
Owner

I'm going to close this issue now. If you guys come up with anything concrete, you can open a new issue. Thanks!
