
A systematic way to generate rules, and possible enhancements #35

Closed
gousaiyang opened this issue Jan 11, 2020 · 13 comments

Comments

@gousaiyang
Contributor

Hello! Recently my friend (@JarryShaw) and I found that a lot of rules are missing in vermin. So I wrote a small tool that parses the Python documentation for "New in version ..." and "Changed in version ..." indicators. The extracted information can then be turned into vermin rules in a semi-automated fashion (with manual inspection).
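The core idea can be sketched roughly as follows. This is a hypothetical illustration, not the actual tool; the sample doc text and the exact regex are assumptions, though the Sphinx-style version markers are the ones named above:

```python
import re

# Hypothetical sample of documentation text; "str.removeprefix" and
# "math.isclose" really were added in 3.9 and 3.5 respectively.
doc_text = """
str.removeprefix(prefix)
   New in version 3.9.
math.isclose(a, b)
   New in version 3.5.
"""

# Match the Sphinx-style version-change markers mentioned above.
pattern = re.compile(r"(New|Changed) in version (\d+\.\d+)")
hits = [(m.group(1), m.group(2)) for m in pattern.finditer(doc_text)]
print(hits)  # [('New', '3.9'), ('New', '3.5')]
```

As the drawbacks below note, string parsing like this is error-prone, which is why the results still need manual inspection.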

Based on our results, I am adding a lot of rules in PR #34.

Also, we found that the following language features could be detected as well, if you are interested in implementing them:

If you are interested in finding rules systematically, you can refer to our results for more information.

Note: some drawbacks of parsing the documentation:

  • Some changes may be undocumented, or expressed vaguely in the documentation (string parsing is error-prone and needs manual verification)
  • The major differences between Python 2 and Python 3 grammar are not clearly documented in the official documentation; more rules might be needed for this
@netromdk
Owner

netromdk commented Jan 11, 2020

Thanks for opening the issue. Yes, it's been an ongoing process for me to add the rules and implement language detection functionality. While I've thought about it many times, I never got around to making my own pydoc parser. Really cool that you did, though! Seems to be working pretty well from reviewing the rules in #34 (thanks again for that btw).

I'm definitely interested in implementing language features. Adding them to my to-do list.

What's your plan with your tool? Will you keep maintaining it for Vermin rule generation? It would be interesting to try to find and fix possible border-case scenarios with different wordings for the same things. The HTML output viewer is really neat and makes for an easy overview.

@netromdk
Owner

3.3 u prefix on string

Unfortunately, I cannot detect this because the information is lost since every string is unicode in py3. The AST of u"value" is:

Module(body=[Expr(value=Str(s='value'))])
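One hedged workaround, sketched below with the standard tokenize module rather than ast: the raw token stream still contains the literal source text, prefix included. (As a side note, Python 3.8+ also records the prefix in `ast.Constant.kind`.) This is only an illustration, not how vermin works:

```python
import io
import tokenize

src = 'u"value"'

# The token stream preserves the literal source text, prefix included,
# whereas the AST collapses u"value" into a plain string constant.
tokens = tokenize.generate_tokens(io.StringIO(src).readline)
string_tok = next(t for t in tokens if t.type == tokenize.STRING)
print(string_tok.string)  # u"value"
```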

@netromdk
Owner

yield from is already supported. For instance:

L3: `yield from` requires 3.3+
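For contrast with the cases above, `yield from` is detectable purely from the AST, since the syntax produces its own node type. A minimal sketch of such a check (an illustration, not vermin's actual code):

```python
import ast

# A YieldFrom node can only come from `yield from` syntax (3.3+),
# so its mere presence in the tree pins the minimum version.
src = "def gen():\n    yield from range(3)\n"
uses_yield_from = any(
    isinstance(node, ast.YieldFrom) for node in ast.walk(ast.parse(src))
)
print(uses_yield_from)  # True
```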

@netromdk
Owner

3.6 underscores are now allowed for grouping purposes in literals (int, float, complex) (e.g. 123_456)

Unfortunately, I cannot detect those underscores either. AST of 123_456 is for instance:

Module(body=[Expr(value=Num(n=123456))])
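As with the `u` prefix, one possible way around this is to drop below the AST: the standard tokenize module yields the literal token text with the underscores intact. A minimal sketch under that assumption (not vermin's actual approach):

```python
import ast
import io
import tokenize

src = "123_456"

# ast only keeps the numeric value, so the grouping underscores vanish...
value = ast.literal_eval(src)
print(value)  # 123456

# ...but a token-level pass still sees the original literal text.
tokens = tokenize.generate_tokens(io.StringIO(src).readline)
number_tok = next(t for t in tokens if t.type == tokenize.NUMBER)
print(number_tok.string)  # 123_456
```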

@JarryShaw

Would you consider introducing a dependency to vermin to support this kind of literal analysis?

@gousaiyang and I are working on a Python backport compiler project (we call it babel at the moment). Currently, we have f2format for f-strings, poseur for positional-only arguments, and walrus for assignment expressions; and we're planning to include vermin in the scope.

f2format, poseur, and walrus are all based on @davidhalter's Python parser parso, which gives them access to the original source code.

@netromdk
Owner

(I'm assuming that question was for @gousaiyang?)

That's a cool project, @JarryShaw :)

netromdk added a commit that referenced this issue Jan 11, 2020
@gousaiyang
Contributor Author

As for my tool, rule generation is mostly a one-time thing (i.e. we will not go over the whole process of generating rules for Python up to version 3.8 again; we will just fix errors if we find any). When a new feature release of Python comes out (e.g. Python 3.9), I will simply run my tool against the Python 3.9 documentation, filter only the changes in 3.9, and process them, which will be much less work than this time. I will also keep improving the tool and fixing bugs in it.

@netromdk
Owner

Sounds great. 👍

@JarryShaw

(I'm assuming that question was for @gousaiyang?)

That's a cool project, @JarryShaw :)

Nah… I’m just proposing a possible solution to resolve the cases that vermin currently cannot process :)

@netromdk
Owner

Which cases are you referring to? I have already implemented detection of f-strings, positional-only arguments, and assignment expressions, if that's what you meant? :)

@JarryShaw

JarryShaw commented Jan 12, 2020

The cases @gousaiyang mentioned, which you found impossible to support due to loss of information:

3.3 u prefix on string
3.6 underscores are now allowed for grouping purposes in literals (int, float, complex) (e.g. 123_456)
etc.

Since I originally implemented f2format based on ast, I understand that this standard module only ever provides an optimised AST of the source. So I'm just wondering whether you think it would be a better idea to rely on some other parser. ;)

@netromdk
Owner

It might be a good idea to supplement with another parser, but I'm afraid it would hurt performance a lot as well as add complexity. The reason for using Python's ast is that it's always up-to-date, maintained, and correct. It's also available out of the box in vanilla Python, so no extra packages are required. But I will think about it. Thanks. :)

@netromdk
Owner

I'm going to close this issue now. If you guys come up with anything concrete, you can open a new issue. Thanks!
