Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor repositories pool #6669

Merged
merged 6 commits into from
Oct 10, 2022
Merged

Refactor repositories pool #6669

merged 6 commits into from
Oct 10, 2022

Conversation

b-kamphorst
Copy link
Contributor

@b-kamphorst b-kamphorst commented Oct 1, 2022

Pull Request Check List

Relates-to: #3155

  • Added tests for changed code.
  • Updated documentation for changed code.

I refactored Pool in preparation of tackling #6713. Most notably, this PR uses an enum to significantly reduce the bookkeeping effort for the various types of repositories (primary / default / secondary).

Two considerations:

  • One may argue whether the RepositoryPriority (name open for suggestions) should be an attribute of a Repository instead. I'm willing to have a look at that, but as that change will scatter all over the (test) code I decided to present this suggestion first.
  • OrderedDict is used just to satisfy the test test_repository_ordering in tests/repositories/test_pool.py. Although I appreciate the deterministic approach, I would like to verify whether this ordering is actually relevant. In particular, was the order only verified to ensure proper ordering + clustering between default/primary/secondary, or also proper ordering within each cluster? Phrased differently, does the user expect that the order in which primary sources A and B are added through the CLI affects the resolution order? They might and I'm happy to leave it as is, but otherwise we can use normal dicts.

@b-kamphorst
Copy link
Contributor Author

I also removed the inheritance from Repository without removing any functionality. The only thing that Pool actually used was the name property. As it appears that this is not used anywhere, I made Pool nameless.

@radoering
Copy link
Member

Phrased differently, does the user expect that the order in which primary sources A and B are added through the CLI affects the resolution order?

I expect that the order of primary sources in pyproject.toml is relevant. I didn't find anything in the docs, but I think we should keep it deterministic.

@b-kamphorst
Copy link
Contributor Author

Hi @radoering, thank you for your swift consideration.

Phrased differently, does the user expect that the order in which primary sources A and B are added through the CLI affects the resolution order?

I expect that the order of primary sources in pyproject.toml is relevant. I didn't find anything in the docs, but I think we should keep it deterministic.

That is very reasonable. I've clarified the resolution order in the docs. Please have a look at the proposed change. I'm also looking forward to your other thoughts on the PR.

@radoering
Copy link
Member

I just ran test_pool.py with coverage. It seems there is no coverage for has_primary_repositories, find_packages and search and package is only tested superficially. It would be nice to add some tests to gain more confidence in the refactoring.

I still need to take a closer look, but at first glance it looks like a neat simplification. 👍

However, I noticed that without _lookup and the Boolean flags there is more iterating in some methods/properties. I don't know how often this code is called during dependency resolution and if this could be a performance bottleneck.

src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
@b-kamphorst
Copy link
Contributor Author

b-kamphorst commented Oct 2, 2022

I just ran test_pool.py with coverage. It seems there is no coverage for has_primary_repositories, find_packages and search and package is only tested superficially. It would be nice to add some tests to gain more confidence in the refactoring.

We now have 100% test coverage for test_repository_pool.py 😃 I noted that LegacyRepository are skipped in Pool.search. I understand the rationale, but is it also expected / documented for users (just checking, I don't know)?

However, I noticed that without _lookup and the Boolean flags there is more iterating in some methods/properties. I don't know how often this code is called during dependency resolution and if this could be a performance bottleneck.

I assumed that we wouldn't do this too often, but it was quite easy to make this constant-time lookup (at the slight expense of clarity).

@b-kamphorst
Copy link
Contributor Author

b-kamphorst commented Oct 2, 2022

Apparently all tests failed due to an external import by poetry_plugin_export. I temporarily fixed it in ea72f31 and 9aec915, but I can imagine you want to revert the whole renaming of repositories/http.py and repositories/pool.py.

@b-kamphorst
Copy link
Contributor Author

Apparently all tests failed due to an external import by poetry_plugin_export. I temporarily fixed it in ea72f31 and 9aec915, but I can imagine you want to revert the whole renaming of repositories/http.py and repositories/pool.py.

This is clearly breaking, I'll revert the changed names tomorrow.

@neersighted
Copy link
Member

@b-kamphorst I'm okay with the rename personally -- the plugin is coupled to implementation details and we can work around that with a 3-PR-dance or just some maintainer magic (but we should break the rename out into a separate PR).

Regarding search, it's because only PyPI supports search as of now via the deprecated XML-RPC API, and not via PEP 508. Any vestiges of search on non-PyPI repositories is a holdover from when PyPI was not special-cased to use the JSON API: https://warehouse.pypa.io/api-reference/xml-rpc.html

@radoering
Copy link
Member

I'm strongly for breaking all the renaming out in a separate PR with renaming only and no other changes.

We now have 100% test coverage for test_repository_pool.py 😃

Sounds great. 🎉

I assumed that we wouldn't do this too often, but it was quite easy to make this constant-time lookup (at the slight expense of clarity).

The only way to be sure is doing some measurements. (On the one hand I don't assume it is called too often or has a significant influence, but on the other hand I can imagine that some of the methods are called for each package lookup.) If others don't mind we can also take the risk.

Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot more promising overall -- I haven't had time to look at everything exhaustively, but I left some feedback.

On the rename, I suggest we either to that in a separate PR, or you raise some DeprecationWarnings when the old names are imported.

I'd also like to see the kwargs change -- @dimbleby described it accurately, and it would remove the need for type: ignore[override] while better encapsulating how these classes are used.

docs/repositories.md Outdated Show resolved Hide resolved
src/poetry/console/commands/show.py Outdated Show resolved Hide resolved
src/poetry/repositories/repository_pool.py Outdated Show resolved Hide resolved
@radoering
Copy link
Member

On the rename, I suggest we either to that in a separate PR, or you raise some DeprecationWarnings when the old names are imported.

When renaming you have to cut the commits carefully in order to avoid "breaking" git history. We shouldn't just squash the commits. See python-poetry/poetry-core#482 for example. That's why I suggest doing it in a separate PR.

@b-kamphorst
Copy link
Contributor Author

It appears that master (44a89cb) breaks all tests?

@neersighted
Copy link
Member

It appears that master (44a89cb) breaks all tests?

You've got a typing issue.

@b-kamphorst
Copy link
Contributor Author

b-kamphorst commented Oct 3, 2022

It appears that master (44a89cb) breaks all tests?

You've got a typing issue.

Fixed that. Locally the tests also failed in some places but apparently that is resolved :-)

I prepared another PR with the name changes. I'll present it when this PR is closed.

Thanks for your time, I enjoy the collaboration!

@b-kamphorst
Copy link
Contributor Author

Hi @radoering, @neersighted, are there any other changes that need to be processed here (at some point @neersighted mentioned that the RepositoryPriorities needs to change, but that discussion was dropped for this PR) or can the PR be merged?

Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks very promising. I have some concerns about the specific implementation, but otherwise the new tests look good and the architecture seems sound. Good work cleaning this up!

Once review is addressed, I'd like to see cleaner history -- it would be nice to introduce the new tests and then the refactor (assuming the tests pass without the refactor -- I didn't look closely to see if they would run against the old implementation).

src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
@b-kamphorst
Copy link
Contributor Author

Once review is addressed, I'd like to see cleaner history -- it would be nice to introduce the new tests and then the refactor (assuming the tests pass without the refactor -- I didn't look closely to see if they would run against the old implementation).

This took me a lot of time, but at least I learned a bit more about git (plenty to learn left). The added tests are now in the first commit (not 100% at that commit just yet).

@neersighted
Copy link
Member

I personally would drop the RepositoryName commits from this PR and move them to a subsequent PR -- I think that we might want to think a little more about the design/implementation and it clutters this one too much.

@neersighted
Copy link
Member

neersighted commented Oct 10, 2022

@b-kamphorst I've pushed up a rebase of your branch that squashes the changes down into mergable commits -- I think this is good to go once we address the final review comments, as long as we keep the history clean.

Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final batch of requested changes -- should be good to go after we discuss/address these.

src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
src/poetry/repositories/pool.py Outdated Show resolved Hide resolved
@b-kamphorst
Copy link
Contributor Author

@b-kamphorst I've pushed up a rebase of your branch that squashes the changes down into mergable commits -- I think this is good to go once we address the final review comments, as long as we keep the history clean.

I have processed the suggested changes (in a separate commit for your convenience). If they are in line with your expectations, feel free to squash the final commit into another commit and make this fly :)

b-kamphorst and others added 4 commits October 10, 2022 03:16
Introduces Priority enum to help in bookkeeping of repository types.
Additionally specifies parameter 'repository' to 'repository_name' where
relevant.
Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. FWIW, I would suggest getting familiar with interactive rebases and how to leverage the edit rebase command (as well as the git reset CLI command) to manipulate the history of branches. When you're doing larger, more ambitious PRs like this, you generally want to take the approach you took and make the history as atomic/easy to follow for reviewers as possible.

Then, when it's time for merge, you'll want to squash your commits down. In the simple/smaller PR case a squash of everything is just fine. But in a larger PR like this it's helpful to capture some of the evolutionary steps as commits in history -- which is when you want to master that process of munging history during an interactive rebase.

@neersighted neersighted enabled auto-merge (rebase) October 10, 2022 09:19
@neersighted neersighted added kind/refactor Pulls that refactor, or clean-up code area/sources Releated to package sources/indexes/repositories labels Oct 10, 2022
@neersighted neersighted added this to the 1.3 milestone Oct 10, 2022
@neersighted neersighted merged commit 94a5ce4 into python-poetry:master Oct 10, 2022
@b-kamphorst b-kamphorst deleted the refactor-repositories-pool branch October 11, 2022 10:34
@b-kamphorst b-kamphorst mentioned this pull request Oct 14, 2022
2 tasks
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/sources Releated to package sources/indexes/repositories kind/refactor Pulls that refactor, or clean-up code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants