Don't search secondary repositories if not required #5984
If I'm not wrong it is controversial. What if two sources provide different versions of the same package? That's a common use case.
Per the new testcase: if the primary can meet the requirement, then only versions from the primary are considered. What does "secondary" even mean, if packages from secondary sources are considered although a primary source has met a requirement already? The implication of this MR would be that if they care about getting a particular version of a package, folk should say so in their |
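The rule described in this comment can be sketched roughly as below. This is only an illustration of the lookup order, not Poetry's implementation: `Repo`, `find_candidates`, the package names and the version strings are all hypothetical, and a requirement is modelled as a plain predicate on a version string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Repo:
    """Hypothetical stand-in for a package source (not a Poetry class)."""

    name: str
    secondary: bool = False
    packages: Dict[str, List[str]] = field(default_factory=dict)

    def find(self, package: str, accept: Callable[[str], bool]) -> List[str]:
        # Return the versions of `package` that satisfy the requirement.
        return [v for v in self.packages.get(package, []) if accept(v)]


def find_candidates(
    repos: List[Repo], package: str, accept: Callable[[str], bool]
) -> List[str]:
    """Consult secondary repos only when no primary repo has a match."""
    primaries = [r for r in repos if not r.secondary]
    secondaries = [r for r in repos if r.secondary]

    found = [v for r in primaries for v in r.find(package, accept)]
    if found:
        # A primary source met the requirement, so secondaries are skipped.
        return found
    return [v for r in secondaries for v in r.find(package, accept)]


# Made-up data for the example.
pypi = Repo("pypi", packages={"requests": ["2.27.0", "2.28.1"]})
private = Repo(
    "private",
    secondary=True,
    packages={"requests": ["2.99.0"], "internal-lib": ["1.0.0"]},
)


def any_version(version: str) -> bool:
    return True


print(find_candidates([pypi, private], "requests", any_version))
print(find_candidates([pypi, private], "internal-lib", any_version))
```

In this sketch the private copy of `requests` is never considered because the primary already has a match, while `internal-lib` is still found through the fallback.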
To expand on the trade that is being made here. This MR can introduce failures to find a solution in a case like this:
The implicit claim of this MR is that this is an unusual case, which anyway can easily be solved eg by declaring that the requirement on Most of the time when people are using secondary sources, surely it's because they want to get most of their packages from one place (probably PyPI), but they have a small number of private or otherwise special packages in some secondary place. This MR prioritises that use case. |
While I don't have data to back that, I would tend to agree with @dimbleby, as I believe that the main usage of If people still want the custom index to be searched for all dependencies, as you report @radoering, wouldn't not setting Personally, before needing to use a custom index as Even if there is a need to keep the existing behavior for some users, I still think that the changes in this PR are a more sensible default, since this is what, IMO, most people would expect.
I agree about the main use case. However, another common use case is to have patched/modified versions of publicly available packages in your private repository. Especially if these patches/modifications are not required for the latest version but only for some older ones, and these are transitive dependencies, this change might make things more complicated. Probably, this use case can still be covered by being more explicit.
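To make the concern concrete, here is a small sketch comparing the two lookup strategies when a patched build of an older release exists only in a private index. Everything in it is an assumption made for illustration: the function names, the version tuples and the idea of picking the highest match are not taken from Poetry's code.

```python
def candidates_all_sources(primary, secondary, accept):
    """Behaviour before this change, as described in the thread: search everywhere."""
    return sorted(v for v in primary + secondary if accept(v))


def candidates_primary_first(primary, secondary, accept):
    """Proposed behaviour: look at the secondary only if the primary has no match."""
    matches = sorted(v for v in primary if accept(v))
    return matches or sorted(v for v in secondary if accept(v))


# Hypothetical versions, written as tuples so they compare numerically.
public_index = [(1, 4, 0), (2, 0, 0)]
private_index = [(1, 4, 1)]  # imagined patched build of the 1.4 line


def below_two(version):
    # Stand-in for a transitive requirement such as "<2.0".
    return version < (2, 0, 0)


# Highest matching version under each strategy.
print(candidates_all_sources(public_index, private_index, below_two)[-1])
print(candidates_primary_first(public_index, private_index, below_two)[-1])
```

With primary-first lookup the patched `(1, 4, 1)` build is never seen, because the public index already satisfies the requirement. Getting it back would mean being explicit about where that package comes from.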
I don't think so. That's exactly what is changed here. Secondary sources are not searched anymore if a dependency is found in the primary source. With the change you can't achieve searching all sources anymore. You have to be more explicit (which might be good or not) and pin a dependency to a specific source. That said, I'm not completely against this change. I just have concerns that it's a breaking change. If we decide to do this, we should at least consider having a setting that restores the old behavior.
What about an There are some private packages that I only ever want installed from a private repository, never from PyPI - everything else can come from PyPI. After all, someone could register some package with the same name as one of my private packages in PyPI and a version of Semi-related, I had an issue recently where I have three packages referencing a private repository via a secondary source, everything else doesn't specify the source and I'd expect them to use the default, implicit |
I would like to add that such a feature is important, specifically as a PyTorch user. For a quick demo, I have some packages that need to be retrieved from pypi and some others (
[tool.poetry]
name = "test-package"
version = "0.1.0"
description = ""
authors = ["test@gmail.com"]
readme = "README.md"
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
numpy = { version = "~1.23", source = "pypi" }
rich = { version = "12.5.1", source = "pypi" }
torchvision = { version = "~0.13.0+cu116", source = "pytorch"}
[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu116/"
secondary = true
[build-system]
requires = ["poetry-core>=1.1.0.b3"]
build-backend = "poetry.core.masonry.api"
It results in the following output:
For torchvision dependencies, it will also try to use the secondary source. Having the proposed
Poetry version : 1.2.0.b3
Indeed, this is very important to us. As mentioned, on our side we want to use this for private packages, and performance is really bad with secondaries, because resolution against them goes through different APIs than the ones PyPI offers. As long as a secondary is present somewhere, performance drops completely.

We patched Poetry to have this behavior in #5472. Here are our results:

- Poetry 1.1.13 patched with #5472: poetry update -> ~30s
- Poetry 1.1.14 (not patched): poetry update -> ~1 min 30s
- Poetry 1.2.0b3 (not patched): poetry update -> ~1 min
This sounds like a reasonable proposal. Regardless of what we decide, it is important to keep backward compatibility, so we could either:
Both proposals make sense, as long as users can in the end either opt in to this new feature or go back to the current behavior. That matters especially because right now, if users have no choice but to rely on a secondary source and that source is slow network-wise, dependency resolution can slow down drastically. Personally, leaving breaking-change concerns aside, I still think that not searching in secondary sources is a more sensible default, because most of the time people use secondary sources for specific packages (whether they are private, or because some libraries are not published to PyPI), and may even assume that this is what Poetry already does.
Just noting that I'm not personally motivated to drive this one further - I never use secondary repositories myself, so taking this from a quick-n-dirty MR to a redesigned set of configuration options doesn't much interest me. I think this has been useful in kicking the discussion and establishing that there is interest: but it will likely now need someone who cares more about this to step up and do some work if it is to go further.
A related problem with this: if someone doesn't have access to the secondary source, it doesn't work at all, because it always needs to try all the sources. In this example I didn't turn on the VPN so I didn't have access to the secondary source, but I didn't need it. And when I tried to force apache-airflow to PyPI, with pyproject.toml:
[tool.poetry.dependencies]
python = "~3.7.13"
apache-airflow = { version = "2.2.5", platform = "linux" }
[[tool.poetry.source]]
name = "external_source"
url = "https://nexus.___.io/repository/___-pypi/simple"
secondary = true
poetry lock:
Updating dependencies
Resolving dependencies... (0.7s)
RepositoryError
403 Client Error: Forbidden for url: https://nexus.___.io/repository/___-pypi/simple/apache-airflow/
at ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/legacy_repository.py:393 in _get
389│ if response.status_code == 404:
390│ return
391│ response.raise_for_status()
392│ except requests.HTTPError as e:
→ 393│ raise RepositoryError(e)
394│
395│ if response.status_code in (401, 403):
396│ self._log(
397│ "Authorization error accessing {url}".format(url=response.url),
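The failure mode reported here can be reproduced in miniature. The sketch below is an assumption-laden illustration, not Poetry code: `SourceUnavailable`, the two lookup functions and both search strategies are hypothetical, and the unreachable source simply raises on every request to mimic the 403 above.

```python
class SourceUnavailable(Exception):
    """Hypothetical error for a source that cannot be reached."""


def public_lookup(package):
    # Pretend public index that knows about one package.
    return {"apache-airflow": ["2.2.5"]}.get(package, [])


def vpn_only_lookup(package):
    # Simulates the 403 from the report: every request fails without the VPN.
    raise SourceUnavailable(f"403 Forbidden while looking up {package}")


def search_all(package, primary, secondary):
    # Every source is queried, so an unreachable secondary breaks the lookup.
    return primary(package) + secondary(package)


def search_primary_first(package, primary, secondary):
    # `or` short-circuits: the secondary is only called when primary is empty.
    return primary(package) or secondary(package)


try:
    search_all("apache-airflow", public_lookup, vpn_only_lookup)
except SourceUnavailable as exc:
    print("all sources:", exc)

print(
    "primary first:",
    search_primary_first("apache-airflow", public_lookup, vpn_only_lookup),
)
```

Under the primary-first strategy the unreachable source is never contacted for a package the public index already provides. A package that exists only in the private source would of course still fail without access.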
4db3ca1 to ac80f1d
I think, to preserve backwards compatibility, the long-term path forward is probably two different options (in my example,
What is the difference between that new |
@neersighted great proposal, that would be a huge progress! Is there a PR to watch/contribute to for these changes?
@neersighted @graipher this is a very good proposal and would save me lots of headache. What is the progress on this?
It is a fairly hairy refactor, in order to introduce these new behaviors while permitting backwards compatibility. It likely wouldn't be looked at until 1.4, and no one has expressed interest in implementing it yet.
@neersighted I understand. Thanks for the update!
Closing in favor of #6713
I've been doing some preparatory work to implement this. I'm now at the point where I write tests that express the desired behavior and found out that the Currently, such dependencies are not bound to the source, meaning that they may be retrieved from any configured repo including the source repo. Perhaps the most natural option is to, when we introduce The first suggestion seems most appropriate. My private packages have private dependencies, so I'd expect that to work just fine. Still, I'd like to hear your thoughts before I make assumptions.
@b-kamphorst would you mind discussing this on #6713 instead? I'd like to centralize the discussion there.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Not sure whether this is controversial or just obviously sensible...
Always searching secondary sources is bad for performance, and generally isn't what most people expect.
Here's a behaviour change that skips secondary sources altogether, for packages that were provided by primary sources.
It's possible that this was the intended behaviour all along - the testcase test_solver_does_not_choose_from_secondary_repository_by_default suggests as much. I'm not sure where the code that makes that test pass is! But anyway this is surely a more efficient way of achieving that goal.
This would fix eg #5959, #5096