Improve importlib backports-upstream integration #129307
I would like to propose officially defining a development upstream, and enforcing it. The solution that I think would more cleanly handle fragmentation, history, authorship, and CLA issues is to select CPython as the upstream. One way to implement this would be to track the backport version here and, when it is updated, have CI automation update the backport repos, just as we already do when backporting to older Python versions. While I think that's cleaner, it is a major change to how these modules are currently developed, and the implementation might be too complex, so I think it's more likely that we go with the backports as the upstream. If so, there are a couple of things I think we should do:
I think that this makes sense, especially as both of the importlib modules are no longer "provisional". Useful parallels can be drawn with PEP 360, which used to record "externally maintained" packages, and was updated in 2006 to say:
Another parallel is the set of changes to pathlib that Barney has recently been making. He published the
Whilst having a brief look at the history, I found that Jason noted in a comment from a few years ago that:
The other two recently externally developed modules seem to be
Three of the four PRs to
There have also been some problems with synchronising the documentation, as the docs.python.org documentation used to point to the backport (python/importlib_metadata#485), with one user going so far as to manipulate Sphinx internals (stefan6419846/license_tools#63) to solve this problem. Ultimately, documentation was removed from the backport package (python/importlib_metadata#466). Regarding Jason's quoted comment above about the pace of development eventually slowing, I wonder if at some point we should seek to update the backport packages less frequently, mirroring Python releases. There is prior art for this with
As such, I would be in favour of this A
Thanks @FFY00 for raising this issue. It's been a lingering concern of mine as well, and I've given the matter a lot of thought. My instinct is the same: ideally, the stdlib should be the canonical implementation and upstream. That's the case for several other backports I maintain (backports.tarfile, singledispatch, configparser, ...). The main reason I haven't taken the packages in this direction is that the third-party libraries are more capable and thus drastically easier to develop. I have in fact documented the methodology. I should probably link that document from the READMEs of the third-party projects for increased visibility. In general, the third-party packages get a much more modern, complete, and sophisticated treatment. It's these documented advantages that have kept me reluctant to move the upstream to the stdlib. I've been thinking about ways to make the integration (and attribution) better. There are some factors that make the integration more difficult.
In an ideal world, the canonical source for something like "importlib metadata" would exist somewhat independently, be linked into the various target projects, and have customizations overlay and extend the canonical source. I can imagine a couple of ways to model these concerns using VCS tools.
A branch per project
Imagine having a separate branch in CPython for each project, with its history rooted independently of the CPython history. This branch would contain either the raw source or possibly the full third-party package, but when merged into CPython, it would track the new location and CPython-specific requirements. This approach doesn't work in the CPython pull request model because of squashed merges (the tracking is lost). That's why, instead, each of these projects carries its own cpython branch to track those concerns.
Submodules
Another way to model subsets of an implementation is through Git submodules. Some companies and projects use submodules as a way to compose larger systems from smaller components. You could imagine the importlib subprojects each being a submodule attached at
This approach is fraught with problems:
Last year, I kicked off work on the essential layout, which aims to solve some of these problems and empower projects to be composable in this way, but it has already had to concede some of the purity of the design (pyproject.toml and .github) and still has some unsolved problems (it's incompatible with RTD). Ultimately, I don't feel these options are very attractive, so I'm left limping along with the current methodology. I quite like the suggestions Filipe has raised. They all sound reasonable; let's revisit them in light of the documented methodology. One last thing I wanted to mention: although I dislike it, I sometimes batch several changes from the third-party packages into CPython, mainly because it's a bit of work to get everything synchronized, and the toil of re-submitting each contribution in multiple places would be impractical. If we had automation to mechanically apply changes to both projects together, that would be ideal.
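Mechanically applying one change to both projects largely comes down to translating file paths between the two repository layouts. The following is a minimal, naive sketch of that single step, not a description of any existing tooling. The PATH_MAP prefixes (importlib_metadata/ to Lib/importlib/metadata/, tests/ to Lib/test/test_importlib/metadata/) are assumptions for illustration, and remap_path and remap_patch are hypothetical helpers. The sketch only rewrites the "diff --git", "--- a/" and "+++ b/" header lines of a git-style diff, and it assumes paths contain no whitespace.

```python
import re

# Hypothetical mapping from backport-repo path prefixes to CPython path
# prefixes. These directories are assumptions for illustration only.
# The first matching prefix wins, so list more specific prefixes first.
PATH_MAP = {
    "importlib_metadata/": "Lib/importlib/metadata/",
    "tests/": "Lib/test/test_importlib/metadata/",
}

# "diff --git a/<old> b/<new>" header; assumes paths contain no whitespace.
_GIT_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)(.*)$", re.DOTALL)
# "--- a/<path>" and "+++ b/<path>" file headers.
_FILE_HEADER = re.compile(r"^(--- a/|\+\+\+ b/)(\S+)(.*)$", re.DOTALL)


def remap_path(path, path_map=PATH_MAP):
    """Rewrite the first matching prefix; leave unmapped paths unchanged."""
    for src, dst in path_map.items():
        if path.startswith(src):
            return dst + path[len(src):]
    return path


def remap_patch(patch_text, path_map=PATH_MAP):
    """Naively rewrite file paths in the headers of a git-style unified diff.

    Only header lines are touched; hunk bodies are copied verbatim. A hunk
    line whose content happens to start with "-- a/" or "++ b/" would be
    misidentified as a header, which a real tool would need to guard against.
    """
    out = []
    for line in patch_text.splitlines(keepends=True):
        m = _GIT_HEADER.match(line)
        if m:
            old, new, rest = m.groups()
            line = (
                f"diff --git a/{remap_path(old, path_map)}"
                f" b/{remap_path(new, path_map)}{rest}"
            )
        else:
            m = _FILE_HEADER.match(line)
            if m:
                prefix, path, rest = m.groups()
                line = f"{prefix}{remap_path(path, path_map)}{rest}"
        out.append(line)
    return "".join(out)
```

The rewritten patch could then be tried with git apply in the other checkout; wherever CPython's copy carries its own customizations, the patch would still conflict and need manual attention, which is part of why this kind of automation is harder than it looks.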
The current status quo when it comes to the development integration and synchronization between the importlib backports and the CPython upstream isn't optimal.
Before anything else, I must properly acknowledge @jaraco's monumental and tireless effort in maintaining the importlib backports and handling the complex synchronization with the CPython upstream, not to mention the continued development of these modules. It has been instrumental in getting things to the state they are in today, and none of the issues discussed in this thread should reflect negatively on him, but rather on our failure to ensure these projects got the resources they need: a far too common tale in open source.
Here are some issues I think we should improve:
importlib.metadata.Distribution, and other classes, do not document their attributes (I regularly have to resort to the source code).
cc @python/importlib-team