Create a unified tech stack for jedi, rope, RedBaron, yapf, pylint, pep8, and similar tools and libraries #630
I smell Architecture Astronauts here. You quote python-rope/rope#57, but you have apparently missed my comment, especially the part “The problem is that we don't have REAL maintainer.” The Rope project has so far been run pretty tightly on the rule “Find the dependencies -- and eliminate them,” and I am afraid we cannot move much from that rule. So, generally I think it is a good idea (reuse is, after all, The Right Thing™), but I won’t spend a second on implementing it. If I get a pull request to https://github.com/python-rope/rope/ that does the job and passes our whole test suite on my computer (with RHEL-6 and thus Python 2.6), then I may start to consider the idea. |
@gotgenes Thank you for bringing this up! I think collaboration is a nice thing! However, collaboration needs at least two parties to have the time to collaborate (not even talking about willingness). That's where I see the big problem. I have personally talked to the pylint core devs a year ago. They were interested to use the Jedi parser (and some more business logic). It didn't happen. Why? Because they didn't have the time. Using a different parser is a huge task that requires fundamental changes to your core. This again entails changes in other code. Believe me I have done it with Jedi itself. It's a pain. What I'm saying really is: If someone is interested to port, I think it makes sense. However, you need time and motivation. A lot of it. It's not paid work. So likely it won't happen for a very long time, except if you do it?! :) @mcepl I really liked the Architecture Astronauts read. Good stuff :) |
@mcepl I see that rope is a refactoring library that could use a code representation that works for Python 3 (and Python 2), and jedi is a library with a code representation for Python 3 (and Python 2) that is in search of a refactoring library. So maybe the two projects could help each other. Or maybe not.
Understood. Like you said, Rope no longer has an active maintainer, and Python 3 support is still up in the air. jedi is not a heavy dependency. If it is too heavy for rope, as I mentioned/suggested above, the syntax representation library could be broken out into its own Python distribution package, making the dependency even lighter. Maybe jedi or the spinoff library could make rope easier to work with and maintain. Or maybe not.
This is quite fair. Other ways exist to contribute besides implementation, though. Your insight into rope could help provide the requirements necessary for a syntax representation usable by rope. @davidhalter, thanks for your reply. You said
Sorry it didn't pan out then. Is there any archive of this discussion that could inform of their requirements? Maybe this won't benefit pylint, but it will check the usability of the syntax representation provided by jedi. Was there anything else they found lacking, beyond time and effort? |
No, it happened offline, at EuroPython.
Not really. We haven't even talked about this that much, because it doesn't really matter if you don't even have the time to clear all the details. |
Hello, On my side things are quite simple: I would be really happy to see this happen, because I don't care about maintaining my own parser. I only wrote it because I didn't know of the existence of lib2to3 at that time, and I'm interested in high-level tools (for custom refactoring and projectional editing), not parsers. One of my medium-term goals is to drop my parser and switch to either the jedi one or the lib2to3 one combined with an adapter, but I have no idea when this will happen since I'm currently stuck in advanced refactoring and debugging (and a million other things in my life). On my side, my needs for a Python syntax tree are:
Big bonus:
Jedi is the closest tool I know to those needs and adding the needed code to match them seems doable. On a broader view, I really believe that python will greatly profit from having a tool of reference for those tasks, especially if it's well documented and accessible but I don't have the set of skills needed to lead this project nor the time like most of you. Thanks everyone for all your work :) |
For the YAPF project, we've run into a lot of issues having to do with the So, in general, YAPF would love to have a new parser, but as others mention it's getting the time to help with the project that's a problem. And as mentioned before, we built YAPF around All of that said, if I can help with the project I will try. I find such a goal very appealing. |
A problem I found with Jedi's AST (and I admit that I didn't play with it for very long, so there may already be a way around this) is that because it stores whitespace information (in order to be round-trippable), it's more challenging to do refactoring or anything that involves creating AST nodes and injecting them into the tree. I played around with this once and wasn't very satisfied with how things worked. For instance, it was hard to inject a node into a loop body because you'd have to figure out how to indent it. To me, for a nice round-tripping AST library this would "just work". Another nice thing to have would be for the library to be able to handle syntax for any (supported) version of Python, even ones other than the one that the code is running in. A common source of invalid bug reports to pyflakes comes from people complaining about syntax errors because they ran it in Python 2 and the code only works in Python 3 (or vice versa). |
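Both points in the comment above can be reproduced with nothing but the standard library. The snippet below is only an illustrative sketch, not code from any project in this thread: it assumes Python 3.9 or later (for `ast.unparse`), and the helper name `parses_here` is made up for the example. It shows that the stdlib `ast` module is not round-trippable (comments and original spacing are lost) and that it only accepts the syntax of the interpreter it runs in.

```python
import ast

# 1. The stdlib AST is not round-trippable: comments and spacing are dropped.
source = "x = 1  # keep me"
regenerated = ast.unparse(ast.parse(source))  # ast.unparse needs Python 3.9+
print(regenerated)


# 2. The stdlib parser only understands the running interpreter's grammar.
def parses_here(code):
    """Hypothetical helper: True if `code` parses under this interpreter."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


print(parses_here("print('hello')"))  # Python 3 syntax
print(parses_here("print 'hello'"))   # Python 2 only syntax
```

A parser that keeps formatting and accepts a target grammar version as a parameter would avoid both limitations, which is the wish expressed above.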
I agree. There's a lot of "utility" functions missing right now. But it's not like we couldn't improve that. Jedi has a very distinct set of usages that doesn't include refactoring, yet. So therefore we have not implemented such things.
What does that mean?
I think this is not something Jedi would do, because it's not how the grammar file works. However, at the same time it's debatable.
Hmm, can you give more examples than comments? |
@asmeurer @davidhalter This is exactly the kind of situation I have solved/I'm solving in RedBaron (which is a high-level API to refactor code without having to take care of low-level stuff). See these slides https://psycojoker.github.io/fosdem-redbaron/presentation.html#slide35 or the related documentation http://redbaron.readthedocs.org/en/latest/modifying.html#code-block-modifications http://redbaron.readthedocs.org/en/latest/proxy_list.html Disclaimer: making it work in a generic and expected way is fucking hard. I might write some text on my algos one day (they aren't really clever, just super super annoying to write and debug) but they aren't all stable right now.
It's kind of hard to describe this well in text; if I fail again I'll draw a diagram :/ Let's use an example:
In [1]: from baron.helpers import show
In [2]: show("a = 1 + 2")
[
    {
        "first_formatting": [
            {
                "type": "space",
                "value": " "
            }
        ],
        "target": {
            "type": "name",
            "value": "a"
        },
        "value": {
            "first_formatting": [
                {
                    "type": "space",
                    "value": " "
                }
            ],
            "value": "+",
            "second_formatting": [
                {
                    "type": "space",
                    "value": " "
                }
            ],
            "second": {
                "section": "number",
                "type": "int",
                "value": "2"
            },
            "type": "binary_operator",
            "first": {
                "section": "number",
                "type": "int",
                "value": "1"
            }
        },
        "second_formatting": [
            {
                "type": "space",
                "value": " "
            }
        ],
        "operator": "",
        "type": "assignment"
    }
]
Here you can see an assignment node combined with a binary_operator node. If you look at "first_formatting" and "second_formatting" at the first level (in the first dictionary), those are the formatting around the "="; they belong to the assignment node because they are inside the assignment node. On the lib2to3 side (if my understanding of it is correct), the space after the "=" is handled by "1" because it comes before "1" (and the logic is the same for the other formatting information). I made this choice in baron (even though it's more complicated to handle) because it allows me to reason about nodes as independent, self-contained units that can therefore be extracted and moved around without any problems, while in a lib2to3-like situation this would have been way more complicated and full of special cases. I hope that my explanation makes sense; don't hesitate to tell me if it doesn't :)
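For contrast with the baron output, here is a rough sketch of the other ownership model, where each token carries the whitespace that precedes it as a "prefix". It uses only the standard library `tokenize` module rather than lib2to3 itself, handles a single-line statement only, and the function name `prefixed_tokens` is an assumption made for this illustration.

```python
import io
import tokenize

# Token types that carry no source text of interest for this illustration.
SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def prefixed_tokens(line):
    """Return (prefix, text) pairs for a single-line statement.

    Each token "owns" the whitespace found between the end of the
    previous token and its own start column.
    """
    pairs = []
    previous_end = 0
    readline = io.StringIO(line + "\n").readline
    for token in tokenize.generate_tokens(readline):
        if token.type in SKIPPED:
            continue
        start_col, end_col = token.start[1], token.end[1]
        pairs.append((line[previous_end:start_col], token.string))
        previous_end = end_col
    return pairs


pairs = prefixed_tokens("a = 1 + 2")
for prefix, text in pairs:
    print(repr(prefix), repr(text))
```

In this model the space after "=" travels with "1", so moving the "1 + 2" expression elsewhere drags that space along, whereas in the baron output it stays with the assignment node.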
I would totally understand if you didn't want to do that. On my side, I think that being intuitive and easier to use should prevail over this kind of technical limitation, since such limitations can be fixed (my goal is to make the task of "writing code that modifies source code" as easy and as realistic as possible). |
RedBaron looks like a very nice abstraction. I'll have to play around with it. I admit I'd be more excited about it if it were BSD licensed. How is the performance? |
Also, how does it handle partial AST (like |
Not very good in comparison with other tools, to be honest :( (but totally okay for live refactoring like here; the video is in French, but you should get an idea from the code I'm writing). I haven't spent any time at all on optimisation. PYP helps in big jobs.
It doesn't. Moving to the jedi parser could both solve this problem and improve performance (and bring static analysis). RedBaron is also in alpha: expect bugs, but you can already do real work with it (people have already done so). Those are the reasons why I haven't advertised RedBaron that much. |
I can understand that. It's good for a refactoring library. I might even change that as well - haven't thought about it a lot.
I have intentionally not done anything like this, because it would use more space. I have spent quite a bit of time optimizing the space used in Jedi (of the parser). However, something like this would still be possible with helper functions IMO. |
In general, I'm all for consolidating several different ways of doing something together. I've not done much with AST parsing, but I would be willing to help out a bit. That said, one of the stated goals of pep8 (which I've jumped into maintaining as of about a year ago) is to be a single distributable file that only relies on the Python standard library. Currently the code works by parsing file line by line to do the linting. That said, perhaps that requirement could go away... Not willing to make that call right at the moment though. :) cc @sigmavirus24 as the author / maintainer of flake8. |
So, I've just now had the chance to really dig into this issue and read the thread (thanks for pinging me @IanLee1521, although I'm merely the maintainer of flake8). Flake8 has three core dependencies: pep8, pyflakes, and mccabe. In general, pep8 avoids using the AST because it can be slow and cumbersome to parse. PyFlakes, however, (cc @myint and @bitglue) is almost entirely reliant on AST parsing and traversal. Further, mccabe uses AST parsing and traversal to attempt to calculate McCabe complexity values for functions and other blocks of code. I'll leave PyFlakes' decision up to them, but speaking for mccabe (as the sole active maintainer) I'm not certain I would immediately have time to switch to a new engine. The other thing is that the beauty of each of Flake8's dependencies is that none of them in turn has dependencies, so it's rather hard to break Flake8 by installing a new version of some transitive dependency. I quite like this, and adding a new dependency for two of the three could mean nightmarish complexity for me as the sole maintainer of Flake8. Speaking from a position of experience with pyflakes and mccabe, neither needs the AST to be round-trip-able; that said, it probably wouldn't hurt us if it were. Speaking as a core reviewer for a different tool, bandit, I think this might be something other cores might be interested in (cc @chair6 @tmcpeak @callidus @ericwb in no particular order). I'm not confident that Bandit doesn't need a round-trip-able AST, but again it certainly wouldn't hurt. And I know that our core team is slightly averse to adding too many dependencies to the tool. That said, I think all the projects I'm involved with would be comfortable adding a dependency on an extremely stable version of the software. What does stability mean:
All that said, I'd be happy to host this in the PyCQA organization and add as many people as are interested in working on this. All projects are welcome there and I'm happy to make the team much larger. |
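To make concrete what "AST parsing and traversal" means for a tool like mccabe, here is a minimal sketch of a branch counter built on the standard library `ast` module. It is not mccabe's actual implementation: the class name, the set of node types counted, and the scoring rule are simplifying assumptions for illustration. It also shows why such a tool has no need for formatting information, since it only looks at node types.

```python
import ast


class BranchCounter(ast.NodeVisitor):
    """Hypothetical, simplified complexity counter (not mccabe's algorithm)."""

    # Node types treated as one extra decision point each (an assumption).
    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

    def __init__(self):
        self.complexity = 1  # a straight-line block has complexity 1

    def generic_visit(self, node):
        if isinstance(node, self.BRANCH_NODES):
            self.complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths
            self.complexity += len(node.values) - 1
        super().generic_visit(node)


def complexity(source):
    """Parse `source` and return the simplified complexity score."""
    counter = BranchCounter()
    counter.visit(ast.parse(source))
    return counter.complexity


example = (
    "def f(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        pass\n"
    "    return 0\n"
)
print(complexity(example))
```

Swapping the parser underneath a visitor like this would mean rewriting every `isinstance` check against the new node types, which is the migration cost several maintainers in this thread are wary of.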
Speaking for pyflakes, it's been working for something like a decade without really any major changes except to support new language features as they are released. It does one thing and it does it very well. It's gone very far on the philosophy that it should not try to be more clever than the developer. What would be the benefit to switching the underlying parsing library? Would it be faster? More accurate? |
@bitglue I agree. It doesn't make any sense for you to switch. My idea was more in the direction of pylint. I like the simplicity of pyflakes and I would not want to complicate it. That said, somewhere in the distant future it could be interesting to switch, because once it's really mature it could provide you with partial file parsing, which would make repeated pyflakes checks much faster. @sigmavirus24 I agree with your list of requirements. However, I think that it's going to be hard to find enough people who are dedicated to a common parser. For example, I have very unique requirements in Jedi. Jedi's parser needs to be able to do error recovery and partial file parsing (caching parts of the file and not reparsing them), and round-tripping needs to be possible. My idea was never to combine all those tools at the moment. It would be nice, but it would probably be a huge amount of work with a lot of annoyed people. For now I would mostly keep things as they are and try to start combining one or two projects that really need a new parser. I think the Jedi parser fits some projects very well, but for others it's just too complicated. |
I'd also be interested in this for PyScript (a Python to JS transpiler). I actually just implemented a module that generates a consistent AST for different Python versions by making use of Python's ast module and converting it to a common format. Some info on that here: https://github.com/almarklein/commonast My needs are mostly consistency and performance, though having something that works in pure Python could open some awesome doors. What I have now serves my needs, and I won't have time to work on something like what is proposed here, though I would be interested in adopting it if it happens. |
python-modernize is another project which may be able to benefit from this. It needs roundtripping and the ability to parse code from other Python versions. It's currently built on top of lib2to3, but there are various annoyances that are harder than they should be to fix because the code responsible is in 2to3. However, as some other people have mentioned, I'm not sure that we have the time/energy to move it to a new parser (I'm not volunteering; cc @daira). By its nature, modernize doesn't really have many regular users willing to invest time in significant changes to it. I think the most plausible route for python-modernize would be a fork of 2to3 which could be gradually improved. @edschofield's Python-Future includes the futurize and pasteurize code-rewriting tools, which are also based on 2to3, so he may also be interested in this. If we can get a concrete proposal together (~what we want to build, what projects can use it), I think this is exactly the kind of infrastructure that the PSF might sponsor someone to work on. That might make life easier for busy maintainers. |
I know it's been a while since I raised this issue. I have attempted to compile all the concrete feature requirements for a common syntax tree that have been listed by the various parties, and I have tried to attribute respective parties to each feature requirement. If I have omitted a requirement or notation of the party affiliated with that requirement, my apologies; please note the omission and I will update the list. My hope is that we can look at this list of requirements and see the common threads, or reason if any of them are mutually exclusive and what would be a compromise. I'm also hoping this will rekindle interest.
|
For use cases where it's important to parse Python code (a) across different Python versions and (b) quickly, the typed_ast project may be a good choice. This is work that @ddfisher did quite recently -- it's new since this thread started -- for Mypy, which is a typechecker for PEP 484 static types. The standard library's
The trouble with it from our perspective was twofold:
So we borrowed the CPython parser -- forked it, effectively -- and fixed those issues:
Because Python's syntax is quite stable and so is CPython's parser -- a new version every year or two, generally with modest changes -- we're quite comfortable with maintaining this "fork" as Python 3.6, 3.7, and so on come out with new syntactical features. At present It's unlikely Although |
Side note: I saw recently that the PyCQA (Python Code Quality Authority) was created, and a few projects cited above have moved there. |
@gnprice Thanks for contributing that information about typed_ast to the discussion; that's very helpful as I was unaware of that project. Sorry for the omission. @Carreau, Good call! I also recently stumbled upon the PyCQA. I would think this discussion aligns very closely with their goals. @IanLee1521, apparently you are a member of this organization. Have you raised this to PyCQA's attention? |
@gotgenes I'm the founder of the PyCQA and on this thread and watching. This is something interesting to me also as the maintainer of Flake8 and a core contributor to pyflakes. typed_ast is something I had heard about happening and am excited to experiment with. Specifically it will allow flake8 to silence warnings when someone is using the typing module on Python 2.7 with the magic |
@gotgenes Nothing to apologize for! |
I'm wondering if with the number of people involved we shouldn't try to literally find some time at a conference where most of us are there and make a BOF / sprint / something similar. |
@Carreau I'd much rather a virtual sprint/BOF instead. I don't go to many conferences. |
@gotgenes -- I am a member (pycodestyle), and I think that if you're asking the question of "could such a package live in that organization" I suspect @sigmavirus24 would agree and we could host it there. @Carreau -- We in the PyCQA did a couple of smaller open space gatherings at PyCon last month that were fairly successful, I agree that it would be nice to at least not limit to only in person though, as I also probably won't be at any other Python conferences this year. :) |
I'm not sure which package we're talking about, but the PyCQA aims to be welcoming to new projects and members. I made it so we could work towards reducing the bus-factor on the projects involved because most of them are single-person operations (with the exception of Pylint). |
@sigmavirus24 Sorry for my oversight, I missed that you were also on this thread. It sounds from your comments and @IanLee1521 that we could (or even should) move this to something under PyCQA. Let me know if I can help with that, including consolidating what's been said here into another location. |
@gotgenes no worries. I'd still like clarification around what we're moving to the PyCQA, but yeah. I think we can move some kind of unified AST under-library for these efforts there if the authors/contributors/maintainers are cool with it. |
Just to update you all: I'm currently fixing a lot of small issues that I've had with the Jedi parser. I might be able to publish it in two months or so. I definitely think that this parser would provide a lot of value to certain Python projects. |
Hi all, I'm the owner of I'm not sure if I can commit to maintain a parsing library (my wife is expected to be giving birth any day now, so I'll have very little time), but I am definitely going to spend the time and replace |
Hello @Nurdok, Yes, (red)baron's performance is not good, mostly because I've been focusing my efforts on making it easy, with a nice API, to do things that were (very) hard/way too annoying to do before. I haven't done any work on performance and that's for later on my todo list (but I won't block contributions in that direction as long as they don't reduce code maintainability too much). For your situation I would rather look at either the jedi or the lib2to3 parser, since they keep formatting information (but not in the same way as (red)baron: there a token is responsible for the formatting before it (or after, I have a doubt)) and are very fast. Be aware that lib2to3 is known to have some bugs: PyCQA/baron#61 (comment) (and there is a bug report about that somewhere on some tool that uses it, but I can't find it anymore). I don't have information regarding the state of jedi. Cheers, |
Just let me mention here https://github.com/Microsoft/language-server-protocol ... it seems that big boys are now uniting under it. |
This thread is pretty powerful stuff. Just wanted to check in and see what the current status of de-duping parser/ast backends is. Is Jedi the project with the most active development in making lib2to3 palatable? |
Probably. Jedi also adds some other stuff to it. The problem is at the moment that I need to figure out a good way to decouple that stuff. But I'm pretty far, I won't make any promises again, but it's getting better and better. I'd like to keep the parser as generalized as possible to allow parsing other languages as well. |
Also: the API is just not where it will be. The API is quite weird and complicated and will be way easier. |
I know it's tough to give time estimates, but do you have any idea when you
expect the API to stabilize? Trying to plan for my project. A roadmap would
be helpful, because I could possibly help contribute if I know what work is
left to be done.
|
I have started a separate discussion about the progress of the parser: #895. I think this is a first working prototype that should still be expanded and better documented. But for most use cases it's already better than what you can get elsewhere. Let's also discuss your concerns about a roadmap there. I would also need a timeframe, such as when you would need which feature. |
I am working on using Jedi's parser as the underlying parser for pydocstyle in PyCQA/pydocstyle#240. It seems like the API is currently undocumented and #895 might change this. Should I wait until the parser is released separately? |
I'm currently creating an API that all of you guys can use, it's in progress and the API will not change a lot anymore. I would actually appreciate if you play with it and tell me what's wrong. Most things are fixed anyway, because Jedi uses it. Just use #895 for feedback. But it will take some time until it is released. Probably this summer. |
Hi everyone It's finally done. https://github.com/davidhalter/parso Features include:
~ Dave |
I’ve moved my mutation tester mutmut to parso from baron. It was nontrivial but more boring and annoying than hard. In fact the resultant code is actually simpler! And now I have Python 3 support which I am very happy about. Thanks for your work on this lib! |
Currently there exists duplication of effort around tooling for parsing, automated manipulation, and analyzing Python code. Each project in this space faces similar fundamental problems of requiring a data representation of the Python code of interest, most commonly represented as a syntax tree. Here is a list, by no means exhaustive, of tools and libraries that face the problem of needing a representation of Python code capable of inspection and possibly even manipulation:
ast
lib2to3
2to3 tool
The overlap between any of these projects is by no means complete, but the overlap between all of them is still significant, especially given they solve the same underlying problem of Python code representation. My hope here is we can examine whether the Python ecosystem really needs this many implementations of the Python syntax tree, and whether we can arrive at an ecosystem that has a unified base, with the diversity of effort occurring on more interesting problem spaces such as advanced refactoring and static analyses.
I am hoping to start the discussion here in jedi's GitHub Issues because of the project's recent announcement of a representation of Python code based upon an (improved?) fork of lib2to3, which, from my understanding, should provide a usable syntax tree representation for both Python 2 and Python 3.
One question is whether the jedi library's API is wide enough to support projects beyond jedi, and, if it's not, what additions must be made to make it more universal? We must acknowledge that each tool and library mentioned in the list above may only need some portion of information from the syntax tree, but that a good underlying representation would have the union of such information required by all client tools and libraries. Clearly some requirements gathering must be done, but my hope is there is a strong and reconcilable overlap in requirements between the projects.
Another question is, should this library be broken out into its own Python distribution package? This would allow the fundamental underlying library to move at its own pace independent of jedi's release schedule. This means jedi would become a client of the library, as I hope other libraries and tools could.
It also seems that the base representation, in being an accurate representation of Python code, may not be the most convenient or appropriate interface through which a client should interact with the code, suggesting room for higher level APIs, provided by separate libraries that wrap the base representation, for example, how pandas is to numpy. I think there exists the opportunity to create a library stack, in which the tools mentioned above become clients of the high-level libraries mentioned above, which in turn are clients of the base representation library. In fact, we can see discussion of these very ideas already within and between multiple projects, e.g., python-rope/rope#57, and PyCQA/baron#61. I think now is the time to get serious about these efforts.
Finally, I'd like to point to two inspirations that this collaboration of effort and unification can happen. The first inspiration is the scientific Python community's unification around the HDF5 Python stack, which I encourage you to read about in @scopatz's excellent blog post about that effort. The second inspiration comes from @royrapoport's description of the way Netflix works in his recent interview on Talk Python To Me. There, teams are permitted to make their own engineering decisions, but this inevitably creates duplication of effort. To counteract that duplication, Netflix periodically evaluates overlapping projects, and if their duplication is found technically unjustifiable, the project's maintainers must sit down with each other and figure out how to merge the code bases. While there's no fiduciary incentive in the case of the projects I've mentioned above here, there is still the incentive of reducing the amount of time, personally and collectively, spent solving very similar problems.
I think a similar effort here to unify would be a great win for Python tooling and the Python community. My hope is this opens up the conversation between these various projects.
cc @davidhalter @aligrudi @mcepl @Psycojoker @gwelymernans @hayd @jcrocholl @florentx @IanLee1521 @PCManticore
I will also try to reach out to maintainers of other projects whom I could not find on GitHub, but if anyone here has contacts mentioned or omitted but interested, please reach out to them as well.
@davidhalter, forgive me for hijacking your project's Issues for this.
Apologies also for anyone who is offended this discussion would happen on GitHub or in a GitHub issue. To me this seems the most open, publicly visible, conducive space to begin the conversation, though, as I suspect few people are members of the Python code-quality mailing list when compared to the number of people impacted and interested.