Package source constraints are silently ignored when extras and transitive dependencies are involved #8614
Comments
I've encountered the same issue and I'm actively working on a solution. @latk's suggestion of implementing caching within the RepositoryPool could help, but relying solely on execution order may not be safe. In my view, a more robust approach would be to pass the `local_config` to the `RepositoryPool`. That way, the RepositoryPool knows which packages have a specific source explicitly set in the pyproject.toml.

Lines 42 to 46 in 2c3d488
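A minimal sketch of the mapping idea, under loudly stated assumptions: `Repo`, `MappedPool` and `explicit_sources` are hypothetical names, not Poetry's API. The mapping is assumed to be built from the `source` keys of the top-level dependencies in pyproject.toml, versions are made up, and version constraints are ignored entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical repository: package name -> available versions."""

    name: str
    packages: dict[str, list[str]] = field(default_factory=dict)

    def find(self, package: str) -> list[str]:
        return self.packages.get(package, [])


@dataclass
class MappedPool:
    """Hypothetical pool that is told which packages are pinned to which source."""

    default_repos: list[Repo]
    explicit_repos: dict[str, Repo]
    # Assumption: derived from `source = "..."` entries in pyproject.toml.
    explicit_sources: dict[str, str]

    def find(self, package: str, source: str | None = None) -> list[tuple[str, str]]:
        # Transitive dependencies carry no source, so fall back to the mapping.
        source = source or self.explicit_sources.get(package)
        if source is not None:
            repo = self.explicit_repos[source]
            return [(repo.name, version) for version in repo.find(package)]
        # Packages without a pinned source only see the default repositories.
        return [(r.name, v) for r in self.default_repos for v in r.find(package)]


pypi = Repo("PyPI", {"requests": ["2.31.0"]})
myrepo = Repo("myrepo", {"private-lib": ["1.2.0"]})
pool = MappedPool([pypi], {"myrepo": myrepo}, {"private-lib": "myrepo"})

# The edge private-other -> private-lib has no source, yet it is routed to myrepo.
print(pool.find("private-lib"))  # [('myrepo', '1.2.0')]
print(pool.find("requests"))  # [('PyPI', '2.31.0')]
```

In this model the routing no longer depends on which edge is resolved first. Its weak spot is that one package name can map to only one source.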
It is kind of necessary, but actually I think it is not really necessary (at least not the way we do it now) because if the only difference between
It's not necessary to resolve it, only to cache it immediately afterwards. However, that's still not enough if the transitive dependency requires a (different) extra, so we actually have to make the cache more robust with respect to extras. In #8835 I adapted the dependency cache so that the packages of dependencies are cached independently of the requested extras. That should be quite robust.
Multiple constraints with different sources seem to be a fairly strong argument against the mapping approach in the repository pool.
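To see why, here is a sketch that tries to collapse top-level dependency declarations into a single name -> source mapping. The declarations are written as the Python dicts a TOML parser would produce for Poetry's multiple-constraints form; `build_source_map` and the source name `mirror` are hypothetical.

```python
def build_source_map(dependencies: dict[str, object]) -> dict[str, str]:
    """Hypothetical helper: collapse dependency declarations into name -> source.

    Raises ValueError if one package is pinned to different sources by different
    constraints, since a single mapping entry cannot express that.
    """
    mapping: dict[str, str] = {}
    for name, spec in dependencies.items():
        # A declaration may be a version string, one table, or a list of tables.
        constraints = spec if isinstance(spec, list) else [spec]
        sources = {c["source"] for c in constraints if isinstance(c, dict) and "source" in c}
        if len(sources) > 1:
            raise ValueError(f"{name} uses several sources: {sorted(sources)}")
        if sources:
            mapping[name] = sources.pop()
    return mapping


simple = {
    "python": "^3.11",
    "private-other": {"version": "^1.0", "source": "myrepo"},
}
print(build_source_map(simple))  # {'private-other': 'myrepo'}

multi = {
    "private-lib": [
        {"version": "^1.2", "markers": "sys_platform == 'linux'", "source": "myrepo"},
        {"version": "^1.2", "markers": "sys_platform != 'linux'", "source": "mirror"},
    ],
}
try:
    build_source_map(multi)
except ValueError as exc:
    print(exc)  # private-lib uses several sources: ['mirror', 'myrepo']
```

Which source applies to a transitive `private-lib` edge would depend on the markers of the environment, which a flat name -> source mapping cannot capture.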
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
`-vvv` option) and have included the output below.

Issue
We use Poetry to manage multiple private packages and have recently started migrating our sources to `priority = "explicit"`, as encouraged by the Poetry docs. This worked fine until Poetry silently stopped installing one of our internal dependencies. I have not been able to pinpoint the issue in Poetry's puzzle/solver, but this bug seems to depend on three factors:

- sources set to `priority = "explicit"` (but otherwise leaving the default PyPI source)

I don't know of a way to quickly spin up a private package repository, so I wasn't able to create a demo/test case, but I will walk through all relevant details of the scenario:
We have three packages:

- `private-app` depends on `private-other` and `private-lib[extra]`
- `private-other` depends on `private-lib` (no extras!)
- `private-lib` is a library with extras. This is the library that goes missing during the solving process.

The `private-app` pyproject.toml has the following dependency specification:

This worked fine while the source was configured as `secondary = true`, but setting it to `priority = "explicit"` and adding the `source` constraints to the dependencies started causing breakage.

While trying to debug this without having created a venv via `poetry install`, Poetry instead reported an error that no matching version for `private-lib` could be found. Unfortunately, I can no longer reproduce this because we have already mitigated the problem, and this seems to depend on the unavailability of a suitable version in the private package sources? Sorry.

When running `poetry update` or `poetry lock` on an existing venv installed by Poetry, things would "succeed", but produce a poetry.lock that contained the following entry for `private-lib`:

Nothing other than the name has been redacted here. There is no `source`, and `files` is empty. So Poetry knows about the dependency but finds no installation candidates, and thus silently skips installation. This then broke our CI systems when they tried to install dependencies from this lockfile.

I have captured a log for this `poetry -vvv update` session, minus all PyPI-based dependencies since they distract from the bug: https://gist.github.com/latk/958e918b2ce4d96956a901263d38e955#file-log-txt

Here is an excerpt of that already-condensed log highlighting how `private-lib` is resolved:
What we are seeing here is that when the `private-other` dependencies are resolved, `private-lib` is looked up in PyPI rather than through its `explicit` source. Nothing is found there, so Poetry falls back to looking at locally installed packages. Since the venv already exists, the package is found, albeit without any installation candidates (since it's already installed). In the end, one `private-lib` package was found each for `myrepo` (correct) and `PyPI` (incorrect; this is the installed package). When writing the lockfile, Poetry seems to use the PyPI/local info, which leads to the empty `files = []` array.

Regardless of the problematic poetry.lock file being generated, this violates the expectations about top-level dependencies with a `source` option: if I specify a `source`, then I expect that metadata about that package will only ever be retrieved from that source. That this log shows PyPI being contacted is arguably a security issue (but not actively exploitable, so whatever…).

Attempted workarounds
I have found some techniques to mitigate the observed problems, which might aid with debugging.

- Explicitly updating that dependency via `poetry update private-lib`. This leads to a working poetry.lock file, but things break again with the next `poetry update`.
- Setting the private repository to `priority = "supplemental"`. This "works" because when the PyPI lookup for `private-lib` fails, `myrepo` is checked next, finding the correct package data. This avoids the problematic fallback to the installed version of the package. However, this only masks the bug, and despite the explicit `source = "myrepo"` it would not protect against dependency confusion attacks!
- Stop using `extras` in Python packages. This is what we are doing, at least until this Poetry bug is fixed.
- Include the top-level dependency on `private-lib` both with and without extras. I believe (see discussion below) that this primes Poetry's dependency solver cache to avoid the PyPI lookup. This doesn't seem to depend on the order of entries.
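The reasoning behind the `supplemental` workaround can be made concrete with a deliberately simplified model of source priorities. It assumes the documented semantics: a dependency with an explicit `source` is looked up only there, explicit sources are otherwise never consulted, and supplemental sources are consulted only when no primary source has the package. Versions, the fallback to installed packages, and the real solver are ignored; `pick_source` is a hypothetical helper.

```python
from __future__ import annotations

# source name -> (priority, packages hosted there)
Sources = dict[str, tuple[str, set[str]]]


def pick_source(package: str, requested: str | None, sources: Sources) -> str | None:
    """Return the source that would provide `package`, or None if nothing matches."""
    if requested is not None:
        # Pinned dependency: only the named source is consulted.
        return requested if package in sources[requested][1] else None
    # Unpinned dependency: primary sources first, then supplemental ones.
    # Explicit sources are never consulted here.
    for wanted in ("primary", "supplemental"):
        for name, (priority, hosted) in sources.items():
            if priority == wanted and package in hosted:
                return name
    return None


explicit = {"PyPI": ("primary", set()), "myrepo": ("explicit", {"private-lib"})}
supplemental = {"PyPI": ("primary", set()), "myrepo": ("supplemental", {"private-lib"})}
# Same as above, but someone has published a package called private-lib on PyPI.
squatted = {"PyPI": ("primary", {"private-lib"}), "myrepo": ("supplemental", {"private-lib"})}

print(pick_source("private-lib", "myrepo", explicit))  # myrepo (pinned edge from private-app)
print(pick_source("private-lib", None, explicit))  # None (unpinned edge from private-other)
print(pick_source("private-lib", None, supplemental))  # myrepo (bug masked)
print(pick_source("private-lib", None, squatted))  # PyPI (dependency confusion)
```

In the last case the unpinned edge from `private-other` picks the PyPI package even though `private-app` declares `source = "myrepo"`, which is the dependency-confusion risk described in the workaround.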
Interpretation and preliminary analysis
I've been poking a bit at the Poetry source code and now have a basic understanding of Poetry's dependency solver. I think the following is happening, but am not entirely sure:
When a dependency has extras, a virtual dependency is created which inherits the correct source:
poetry/src/poetry/puzzle/provider.py, lines 532 to 536 in a029b36
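A minimal model of that step, assuming a simplified record rather than Poetry's real dependency class (`Dependency` and `without_extras` here are hypothetical). The virtual edge keeps the name and the source but drops the extras, so its complete name coincides with that of a plain, source-less `private-lib` requirement.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Dependency:
    """Hypothetical, simplified dependency record."""

    name: str
    extras: frozenset[str] = frozenset()
    source: str | None = None  # only top-level deps can carry `source = "..."`

    @property
    def complete_name(self) -> str:
        # Name plus sorted extras, e.g. "private-lib[extra]".
        if not self.extras:
            return self.name
        return f"{self.name}[{','.join(sorted(self.extras))}]"


def without_extras(dep: Dependency) -> Dependency:
    """Virtual edge dep[extras] -> dep: same name and source, no extras."""
    return replace(dep, extras=frozenset())


top_level = Dependency("private-lib", frozenset({"extra"}), source="myrepo")
virtual = without_extras(top_level)
transitive = Dependency("private-lib")  # from private-other's package metadata

print(top_level.complete_name)  # private-lib[extra]
print(virtual.complete_name, virtual.source)  # private-lib myrepo
print(transitive.complete_name, transitive.source)  # private-lib None
```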
Thus, we actually have the following facts in our dependency graph:

1. `private-app` depends on `private-other` with `source = "myrepo"`
2. `private-app` depends on `private-lib[extra]` with `source = "myrepo"`
3. `private-other` depends on `private-lib`
4. `private-lib[extra]` depends on `private-lib` with `source = "myrepo"`
The 3rd fact cannot have a `source`, because sources are purely a Poetry feature and not part of Python package metadata. Instead, the source for a transitive dependency can be set by specifying it as a top-level dependency in the `pyproject.toml`. Later, dependencies are looked up by name in the dependency cache. I think that happens here:

poetry/src/poetry/mixology/version_solver.py, lines 486 to 488 in a029b36
poetry/src/poetry/mixology/version_solver.py, lines 96 to 114 in a029b36
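The name-keyed lookup referenced above can be modeled with a toy cache keyed only by the complete name. This is a hypothetical model, not Poetry's `DependencyCache`: each edge is a `(complete_name, source)` pair, a missing source is assumed to mean the default source (PyPI here), and whichever edge is resolved first decides the cached result.

```python
from __future__ import annotations


def resolve(edges: list[tuple[str, str | None]]) -> dict[str, str]:
    """Toy resolver: cache keyed only by complete_name (hypothetical model).

    A missing source stands for "default sources", modeled here as PyPI.
    """
    cache: dict[str, str] = {}
    for complete_name, source in edges:
        if complete_name not in cache:
            # Only the first edge for a name triggers a lookup; later edges
            # with the same name silently reuse that result, source or not.
            cache[complete_name] = source or "PyPI"
    return cache


fact3 = ("private-lib", None)  # private-other -> private-lib, no source possible
fact4 = ("private-lib", "myrepo")  # private-lib[extra] -> private-lib, inherited source

print(resolve([fact3, fact4]))  # {'private-lib': 'PyPI'}
print(resolve([fact4, fact3]))  # {'private-lib': 'myrepo'}
```

In this model the explicit source of fact 4 is silently dropped whenever fact 3 happens to be resolved first, which is consistent with the behavior described in this report.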
Since the `complete_name` is used as the key, `private-lib[extra]` has a different cache entry from `private-lib`, but that is of course necessary. What is not obvious is that the contents of the cache are sensitive to the order in which dependencies are resolved:

- `private-lib` gets wrongly associated with the PyPI source (and this is inherited while resolving fact 4).
- `private-lib` and `private-lib[extra]` in the Poetry dependencies.

I'm not sure how this could be solved. Maybe:
- When a virtual `dep[extras] -> dep` dependency is created, ensure that `dep` is resolved immediately to prime the cache with the correct source. However, this seems fragile, and wouldn't be a systematic solution to this kind of problem.
- Keep an explicit name -> source mapping, which is used to look up the appropriate source for transitive dependencies. However, this would fail for configurations with constraints like:
- Cache package data at the correct place. Currently, information about dependencies is cached in two places during resolution. First, the aforementioned `DependencyCache`. This eventually loads data from the `RepositoryPool`, which loads data from the correct repository (in particular, the explicit `myrepo` if the source name is set to it):

  poetry/src/poetry/repositories/repository_pool.py, lines 209 to 219 in a029b36

  Second, each repository in the pool might have its own cache (though this doesn't seem to prevent the repeated failing PyPI lookups visible in the log…).

  It seems to me that this problem might disappear if there were a cache around the RepositoryPool. After all, the repositories do not care about `extras`, so a top-level dependency on `private-lib[extra]` with `source = "myrepo"` would prime a repository pool cache for any subsequent `private-lib` lookup.
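A minimal sketch of the proposed cache around the pool, assuming a hypothetical `CachingPool` wrapper and a stand-in lookup function in place of the real `RepositoryPool` (the version number is made up). Keys are PEP 503-normalized names with extras stripped, since repositories return the same distributions regardless of extras.

```python
from __future__ import annotations

import re
from collections.abc import Callable


def canonicalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." into "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


class CachingPool:
    """Hypothetical cache in front of a pool-like lookup(name, source) function."""

    def __init__(self, lookup: Callable[[str, str | None], list[str]]) -> None:
        self._lookup = lookup
        self._cache: dict[str, list[str]] = {}
        self.calls: list[tuple[str, str | None]] = []  # repository lookups performed

    def find(self, requirement: str, source: str | None = None) -> list[str]:
        # Drop any "[extras]" suffix: it does not change what a repository returns.
        name = canonicalize(requirement.split("[", 1)[0])
        if name not in self._cache:
            self.calls.append((name, source))
            self._cache[name] = self._lookup(name, source)
        return self._cache[name]


REPOS = {"PyPI": {}, "myrepo": {"private-lib": ["1.2.0"]}}


def lookup(name: str, source: str | None) -> list[str]:
    # Stand-in for RepositoryPool: unpinned lookups go to PyPI only.
    return REPOS[source or "PyPI"].get(name, [])


pool = CachingPool(lookup)
pool.find("private-lib[extra]", source="myrepo")  # top-level dependency primes the cache
print(pool.find("private-lib"))  # ['1.2.0'], served from the cache; PyPI is never asked
print(pool.calls)  # [('private-lib', 'myrepo')]
```

As the comments above point out, such a cache still relies on execution order: if the unpinned `private-lib` lookup happens first, the empty PyPI result is what gets cached.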