Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is Cython for performance a 'good idea'? #2405

Closed
kodonnell opened this issue Aug 13, 2018 · 9 comments
Closed

Is Cython for performance a 'good idea'? #2405

kodonnell opened this issue Aug 13, 2018 · 9 comments

Comments

@kodonnell
Copy link

Is using Cython to enhance the performance of pylint (and astroid) a 'good idea'? Specifically:

  • Would the maintainers simply not be interested? Is Cython unpopular because e.g. it makes it harder to distribute? Or harder to maintain and contribute to?
  • Is it 'likely' to result in significant speedup? From my relatively minimal experience, I'd expect so - probably an order of magnitude without going too crazy. However, there may be some things that make it hard to Cythonize fully (e.g. yields)?
  • Are there other 'better' ways to get a similar speedup? (E.g. caching, etc.)

The idea would be we're in a happy medium where some core performance critical code is Cython (with maybe a bit of C), while the bulk of the 'business logic' for different linting rules etc. are still in python (as these are the parts most likely to be contributed by the community etc.?). Then again, some of these tried-and-true ones could be cythonized too, etc.

@PCManticore
Copy link
Contributor

@kodonnell Thanks for initiating this discussion.
pylint and (astroid) would definitely be open to use Cython for improving the performance, but a major problem to this initiative is not necessarily the fact that it's going to be harder to distribute, but the fact that we simply won't have time to rewrite the core of pylint or astroid to use Cython. This would be a pretty involved project that will probably eat our OSS time for a couple of months at least, so at least I can't really get involved in this at this moment.

Other than that, I would say the most of the pylint's performance problems are actually coming from astroid. pylint is rather simple, but astroid is where the magic happen and where it's 15 years of existence is showing up its age. One problem right now with astroid is its inference capabilities, which is triggered across the board for almost every linting rule. The way the inference is implemented is that it goes from a node and tries to execute that code statically until it finds a solution, so if pylint finds a function call, it will try to see what that function call is going to return by walking through all the potential inference paths of that function call. This is pretty intensive in reality just to figure out the type of the function call, which should most likely be handled at a different level by inferring type annotations instead. There's also some caching involved there, but it's not working right all the time. So ideally we'd:

  • support type annotation and stop early if we can infer the types of the nodes easily
  • improve the inference caching
  • eventually write the inference in something as Cython

@PCManticore
Copy link
Contributor

Also I'm closing this not necessarily because I'm opposed to the idea, but we don't have as much time as this effort requires.

@kodonnell
Copy link
Author

Thanks @PCManticore, and I understand where you're coming from. It's good to have this documented for others (e.g. @nickdrozd who mentioned this possibility in his PRs, if I remember correctly).

Question: if somebody (maybe me or maybe someone who reads this) did a proof-of-concept cythonizing of astroid, and the results were promising (e.g. minimal code change but performance is doubled) would that change the situation much? I guess what I'm really asking is whether there's much point in me investigating this further. Another alternative (which may be cleaner) could be for someone to start a Cython fork of astroid and, if it's up-to-scratch, simply allow pylint to use either one. In other words "PyCQA will maintain and support pylint with astroid. We allow for the use of in pylint, but it's at your own risk and not supported" etc. That could also allow you to gauge how much interest there is, etc.

Re type annotation - that wouldn't have any impact for me (or most of the open source projects I see which don't seem to use it - though that may change).

@PCManticore
Copy link
Contributor

@kodonnell yes, definitely. If you or anyone else will come up with a Cythonized astroid fork that has improved performance, we'll be definitely keen on integrating that into astroid itself. I'd love a faster pylint, I just cannot commit myself on working on this project.

@kodonnell
Copy link
Author

Understood. No promises here, in part as I need to wrap my head around AST and astroid etc. But, it might be fun. And kind of neat in how many people it could benefit. If someone reading this does plan to start such a cython fork, can you add a comment below, so we can keep an eye on progress (if any)? If you're just a general fan of the idea, but not willing to fork yourself, maybe just a thumbsup, so we can track interest?

As an aside, I wonder how much power pylint/astroid consumes in all its uses worldwide? (And hence, how much performance improvements matter.) That'd be a fun calculation.

@kodonnell
Copy link
Author

Quick update @PCManticore - I've done a quick hack to just cythonize the plain python - note that very little code has changed. If you want to run it: python setup.py build_ext --inplace.

Note that I couldn't 'naively' apply this to some of the files (seemingly inference.py and raw_building.py if I remember rightly) due to the dynamic assignment of methods not working out-of-box in cython - see here. Nonetheless, when running it on pycodestyle.py the runtime is now about 5.8s instead of 6.2s (both after a bit of warmup), which is pretty nice for 'free'. Also, given you say inferencing is the slow part, I should expect more gains once I get inference.py cythonized? (Which doesn't appear too hard to fix, but I need to go to bed.) NB: tests are failing, but I'm hoping that's just related to similar cython foibles. I don't think it was shortcutting the execution (hence faster timing) given it gave me the same score in both methods.

Also - do you want me to move this issue to astroid?

@PCManticore
Copy link
Contributor

That sounds amazing @kodonnell ! Yes, let's move this to astroid, we can then figure out what needs to be done to package, ship and test this.

@kodonnell
Copy link
Author

Great, I'll start a WIP PR or something this weekend.

It's worth noting that this same sort of naive cythonization could be applied to pylint - though my understanding is that astroid's the main bottleneck, so it might not be worth it.

@kodonnell
Copy link
Author

@PCManticore do you think this should be re-opened given the results above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants