Add support for pinning a package to a specific index #171

Closed
charliermarsh opened this issue Oct 23, 2023 · 46 comments · Fixed by #7481
Labels
enhancement New feature or improvement to existing functionality projects Related to project management capabilities

Comments

@charliermarsh
Member

Discussed this with Armin -- pip doesn't support it, and it seems like a big problem? If you have an internal index, but also want to get some packages from PyPI, there's no way to ensure that your internal packages come from your internal index. Packages on PyPI could even shadow them.
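
A minimal sketch of the shadowing problem, assuming a resolver that merges every configured index and takes the highest version. The index contents and the package name `acme-utils` are hypothetical, and versions are modeled as plain tuples rather than real version objects.

```python
# Hypothetical index contents: package name -> available versions (as tuples).
INTERNAL = {"acme-utils": [(1, 2, 0)]}
PUBLIC = {
    "requests": [(2, 31, 0)],
    "acme-utils": [(99, 0, 0)],  # uploaded by an attacker under the same name
}


def resolve_merged(name, indexes):
    """Pick the highest version across all indexes, remembering its origin."""
    candidates = [
        (version, label)
        for label, index in indexes
        for version in index.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found")
    return max(candidates)


version, origin = resolve_merged(
    "acme-utils", [("internal", INTERNAL), ("public", PUBLIC)]
)
```

Because nothing ties `acme-utils` to the internal index, the public upload wins purely on version number.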

@charliermarsh charliermarsh added this to the Initial release milestone Oct 24, 2023
@charliermarsh charliermarsh added enhancement New feature or improvement to existing functionality wish Not on the immediate roadmap labels Oct 24, 2023
@charliermarsh
Member Author

pypa/pip#8606

@charliermarsh
Member Author

I don't quite understand why this is so hard (as per the pip issue). I can't tell if it's hard because it's a large conceptual change for pip specifically, because pip is large and complicated and any change is hard, or because there's inherent complexity.

@charliermarsh
Member Author

Poetry's design is interesting: https://python-poetry.org/docs/repositories/. It feels a bit more complex than is necessary though.

@groodt

groodt commented Feb 19, 2024

Please consider dependency confusion attacks: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610

Using --extra-index-url the way it is commonly used today is a security vulnerability.

PEP 708 is a yet-to-be-implemented approach to improving the security posture.

@charliermarsh
Member Author

I would love to implement improvements to this (like the ability to pin a dependency to a specific index)… We specifically held off and implemented indexes as-is to be spec-compliant. I’ll implement this as soon as it’s supported and there’s clarity on how installers should handle it!

@groodt

groodt commented Feb 19, 2024

Makes sense.

I think you may receive a lot of duplicate feature requests from folks who misuse --extra-index-url and aren't aware that it is not currently intended to append additional sources of dependencies. Its purpose is to provide a set of fallback mirrors of the primary index (--index-url), which is PyPI in the general case.

We may need to consider offering some help as mentioned in the PEP to move this along.

This PEP has been provisionally accepted, with the following required conditions before the PEP is made Final:

An implementation of the PEP in PyPI (Warehouse) including any necessary UI elements to allow project owners to set the tracking data.
An implementation of the PEP in at least one repository other than PyPI, as you can’t really test merging indexes without at least two indexes.
An implementation of the PEP in pip, which supports the intended semantics and can be used to demonstrate that the expected security benefits are achieved. This implementation will need to be “off by default” initially, which means that users will have to opt in to testing it. Ideally, we should collect explicit positive reports from users (both project owners and project users) who have successfully tried out the new feature, rather than just assuming that “no news is good news”.

In the short-term, if you don't want to bug-for-bug implement pip, we may need to point people at alternatives like https://github.com/uranusjr/simpleindex to help them merge indexes behind the scenes on localhost. I don't think many will like it 😂

@pawamoy

pawamoy commented Feb 19, 2024

@groodt can you point at official docs that confirm this use of extra index is indeed misuse? To me it fits well with local versions for example?

In any case, thanks for the link to simpleindex, this will be useful in case you're right and uv does not allow this use of extra index :) A bit sad to have to run two local servers instead of one, but that's not that bad.

@pfmoore
Contributor

pfmoore commented Feb 19, 2024

I don't quite understand why this is so hard

From what I recall (I quickly refreshed my memory on the issue but didn’t review the details) it’s a “design decision that was made a long time ago” type of complexity. Pip has built a bunch of choices on top of the “all indexes are equal and get merged” model, and it’s hard to know what we’d need to change if we were to revisit that decision. Add to that the need for backward compatibility and it’s a much bigger question than people saying “just pick the index I ask for” will accept.

For uv, I foresee the following issues:

  1. Compatibility - you’ll diverge from pip in both subtle and not-so-subtle ways. This isn’t a bad thing, but your stated intention is to be compatible with pip, so you need to make choices here.
  2. Being an outlier - the “all indexes are equal” model is not just something pip does. Standards like PEP 708 are built on the idea, so you may have trouble fitting that standard into your implementation. More generally, you’ll have to make a bunch of UI and design decisions where there is less pre-existing experience to draw on.
  3. Scope creep - people start wanting things like index priorities (“get these from piwheels when you can, but fall back to PyPI if there’s nothing at all published on piwheels”) which have their own issues.
  4. You open yourself up to the idea that a package is no longer identified by just name and version - where it came from becomes important as well. This may have wider implications - for example, SBOM data might need extra information like source URL.

On the plus side, you’ll be offering a solution to something users have been wanting for a long time, and which is often characterised as a security issue. And practicality may well beat purity here.

can you point at official docs that confirm this use of extra index is indeed misuse

It doesn’t describe it as “misuse”, but the pip docs are clear that we treat all indexes as equal: “There is no ordering in the locations that are searched” from here. We (pip) could add a guide-type article discussing this in more depth, but we don’t have one currently.

@groodt

groodt commented Feb 19, 2024

There’s a small security warning in the pip docs here

Using this option to search for packages which are not in the main repository (such as private packages) is unsafe, per a security vulnerability called [dependency confusion](https://azure.microsoft.com/en-us/resources/3-ways-to-mitigate-risk-using-private-package-feeds/): an attacker can claim the package on the public repository in a way that will ensure it gets chosen over the private package.

There is also an in progress pip PR to make this more explicit here pypa/pip#11694

Here’s a major recent dependency confusion attack that impacted PyTorch (caused by instructions to use --extra-index-url): https://news.ycombinator.com/item?id=34202662

@pawamoy

pawamoy commented Feb 19, 2024

@groodt I'm not sure this is meant to answer my previous question ("can you point at official docs that confirm this use of extra index is indeed misuse"), but I was especially referring to "its purpose is to provide a set of fallback mirrors of the primary index".

I distinguish three use-cases for extra indexes:

  1. extend main index with additional projects (subject to dependency confusion)
  2. extend specific projects in main index with additional versions (not subject to dependency confusion)
  3. mirror PyPI.org

My understanding of your "append additional sources of dependencies" was that it referred to case 2, but now I think you were speaking of case 1.

So, to rephrase, and given pip currently considers all indexes to be equal, is case 2 a misuse too?

(Anyway, after reading PEP 708, I also agree it's the way forward and not index ordering, as I commented here 👍 )

@pfmoore
Contributor

pfmoore commented Feb 19, 2024

Stepping back from the question of "why is this so hard", @groodt is correct that PEP 708 is the better solution here. Otherwise, does the user need to specify an explicit index for the whole of their internal package's dependency tree? What if someone adds a new internal module, and forgets to add it to the "must come from the internal index" list in all of their install jobs? Are we going to consider that as "user error"? The torchtriton attack took exactly this form. Or an attacker could compromise a public dependency of your internal project.

Having an index pin option doesn't prevent you from needing to handle the consequences of an "all distributions with the same name and version are interchangeable" model. It just gives users a manual way of firefighting issues with that model.

@pfmoore
Contributor

pfmoore commented Feb 19, 2024

So, to rephrase, and given pip currently considers all indexes to be equal, is case 2 a misuse too?

It's not that "pip considers all indexes to be equal" but rather that "pip considers all distributions with the same name and version to be interchangeable regardless of which index they came from". The difference is subtle but important. Whether case 2 is a problem depends on whether you trust the "main index", just the same as with case 1. The trust issue is what's important here rather than what is "considered equal" [1].
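
To make the "interchangeable" model concrete, here is a small sketch in which the identity of a candidate is only its name and version. The `Candidate` type, the package name, and the index URLs are hypothetical, and which duplicate survives is an artifact of this sketch rather than a statement about pip's behavior.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    version: str
    index: str  # where the file was found


def merge_interchangeable(candidates):
    """Collapse candidates by (name, version), ignoring the source index."""
    merged = {}
    for candidate in candidates:
        # The key carries no index information, so the second candidate
        # with the same name and version is treated as a duplicate.
        merged.setdefault((candidate.name, candidate.version), candidate)
    return merged


found = [
    Candidate("example-pkg", "2.1.0", "https://pypi.org/simple"),
    Candidate("example-pkg", "2.1.0", "https://internal.example/simple"),
]
merged = merge_interchangeable(found)
```

Two files from indexes with different trust levels end up as a single entry, which is why the trust question cannot be answered from name and version alone.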

The --extra-index-url argument for pip was added long ago, in simpler times when people weren't anywhere near as worried about attackers targeting PyPI, and nobody was running multinational businesses based on Python code. Trust was generally assumed to be present, and in particular no-one was worrying about relative trust between indexes. Things have changed, for better or worse, and --extra-index-url doesn't match the new reality. This is why one of the options pip is considering is simply removing it, and demanding that users pick a single index and ensure that they trust that one index. That's unlikely to ever happen because the breakage would be significant, but it's absolutely an option that a new project like uv could (and should) consider.

To answer your question more explicitly, your case (2) is a "misuse" in the sense that it has risks (the same as case 1). The risks may not be a dependency confusion attack, but they do include compromise of the PyPI account owning the code you're extending. Is that a more acceptable risk? Only you can decide that.

The point is that mixing indexes with different trust levels [2] is the problem here. Even the "mirror PyPI" case involves a risk if the mirror is compromised.

Footnotes

  1. You could argue that if you don't trust two indexes equally, you don't "consider them equal", I guess...

  2. Within the same install command, which is why pinning for a set of named packages isn't a sufficient solution.

@pawamoy

pawamoy commented Feb 19, 2024

Thanks @pfmoore!

(Side note: I just discovered that PDM supports respecting the order of indexes: https://pdm-project.org/latest/usage/config/#respect-the-order-of-the-sources.)

@pfmoore
Contributor

pfmoore commented Feb 19, 2024

Just as an example (and yes, it's a pathological case) if you have ordering, suppose you have index1 and index2 (ordered with 1 having priority over 2). Index 1 contains A 1.0 and B 1.0, with A 1.0 depending on B. Index 2 has B 2.0. If you install A, do you get B 1.0 or B 2.0? If the answer is 1.0, why did you bother specifying index 2? If you get B 2.0, and someone now adds A 2.0 to index 2, and changes B 2.0 to depend on A > 1.0, is the correct thing to upgrade A to 2.0, or downgrade B to 1.0?

The point here is that there are multiple possible choices, and if you don't factor index priority into the core of your resolution algorithm, you end up with a system that users won't have a good intuition about, and which might depend on implementation details. Both of which can lead to security issues. I'm not saying that PDM has such a problem - they may well have considered all of this. Just that "which index did this come from" is an extra axis you have to consider as part of resolution, not just something you can keep separate from the resolver.
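
The first half of that example can be sketched as follows. Both policies are simplified assumptions made for illustration (neither is taken from pip, PDM, or uv), and versions are compared as dotted integers.

```python
# Index contents from the example: index 1 has A 1.0 and B 1.0, index 2 has B 2.0.
INDEX1 = {"A": ["1.0"], "B": ["1.0"]}
INDEX2 = {"B": ["2.0"]}
ORDERED = [INDEX1, INDEX2]  # index 1 has priority over index 2


def version_key(version):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def first_index_wins(name, indexes):
    """Use only the first index that lists the package at all."""
    for index in indexes:
        if name in index:
            return max(index[name], key=version_key)
    raise LookupError(name)


def highest_version_wins(name, indexes):
    """Merge all indexes and take the highest version found anywhere."""
    versions = [v for index in indexes for v in index.get(name, [])]
    if not versions:
        raise LookupError(name)
    return max(versions, key=version_key)


b_with_priority = first_index_wins("B", ORDERED)
b_with_merging = highest_version_wins("B", ORDERED)
```

The same inputs give B 1.0 under one policy and B 2.0 under the other, so the choice of policy has to be part of the resolver's design rather than an afterthought.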

Anyway, the key point for uv, and in answer to the question @charliermarsh asked above, is that this is why pip is still trying to work out a good answer to how we could allow users to choose which index to use at a more fine grained level than "per install".

@pawamoy

pawamoy commented Feb 19, 2024

If the answer is 1.0, why did you bother specifying index 2?

To be able to find package C, D, E, etc. 😄 In my case, all packages from index 1 take precedence, even if higher versions are available in index 2. Index 1 contains just a tiny few packages, index 2 contains all the rest (PyPI.org in my case). Well, pypiserver can redirect to a fall back index when a package isn't found, so a single index (the local pypiserver one) is enough for me, but if it didn't have this feature, I'd have to continue relying on index-url + extra-index-url, with the limitation that I cannot enforce packages to be fetched from one of the two.

PEP 708 will bring the same ability with finer-grain control (per-project fallback). I'll see if pypiserver maintainers are interested in supporting it 🙂

Could also be interesting to know how PDM handles ordering. If @frostming wants to chime in 😄

@pfmoore
Contributor

pfmoore commented Feb 19, 2024

In my case, all packages from index 1 take precedence, even if higher versions are available in index 2.

Cool, so your approach to priority is like pinning, but with the decision on whether to pin being "if package A is in index 1, then pin to index 1, else fall through". That works, but it doesn't support the piwheels case where they supplement PyPI with wheels for the Raspberry Pi architecture, letting installers fall back to PyPI if the wheel isn't valid for the user's architecture (at least that's how I understand what they do...)
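
A rough sketch of why "pin if present" and the piwheels-style fallback differ. The index contents, the package name `example`, and the simplified platform tags are all hypothetical; real wheel compatibility checks are considerably more involved.

```python
# Hypothetical data: package -> list of (version, platform_tag) files per index.
PIWHEELS_LIKE = {"example": [("1.0", "linux_armv7l")]}
PYPI_LIKE = {"example": [("1.0", "manylinux_x86_64"), ("1.0", "sdist")]}
ORDERED = [PIWHEELS_LIKE, PYPI_LIKE]


def usable(files, platform):
    """Keep files built for this platform, plus source distributions."""
    return [f for f in files if f[1] in (platform, "sdist")]


def pin_if_present(name, platform, indexes):
    """Commit to the first index listing the package, even if nothing fits."""
    for index in indexes:
        if name in index:
            return usable(index[name], platform)
    return []


def fall_back_per_file(name, platform, indexes):
    """Move on to later indexes until at least one usable file is found."""
    for index in indexes:
        files = usable(index.get(name, []), platform)
        if files:
            return files
    return []


pinned = pin_if_present("example", "manylinux_x86_64", ORDERED)
fallback = fall_back_per_file("example", "manylinux_x86_64", ORDERED)
```

On an x86_64 machine the pinned lookup finds nothing usable, while the per-file fallback reaches the second index.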

Getting into this much detail may be more than the uv maintainers want, though. Let's see if this is useful to them. I don't want to come across as saying that this is too hard to do - it's too hard for pip to easily do (which is what triggered my original comment) but uv may well have different priorities and choose different trade-offs. They also don't have backward compatibility to deal with (unless they want to match pip feature for feature).

@BurntSushi
Member

@pfmoore Thank you very much for your detailed comments here. They've been super helpful.

I'd like to make a small tweak to how uv behaves today that will hopefully resolve at least some of the issues users are hitting (#1377, #1600, #1451) in practice, but do it in a way that doesn't make uv behave like pip (which, AIUI, is to consider all available packages from all indexes, without any guaranteed priority order).

So today, our implementation works by giving a preference order to the indexes made available to uv:

// Try each configured index in order and return on the first hit.
for index in self.index_urls.indexes() {
    let result = self.simple_single_index(package_name, index).await?;
    return match result {
        Ok(metadata) => Ok((index.clone(), metadata)),
        Err(CachedClientError::Client(err)) => match err.into_kind() {
            // Offline: move on to the next index.
            ErrorKind::Offline(_) => continue,
            ErrorKind::RequestError(err) => {
                // 404 or 403: the package is not on this index, try the next one.
                if err.status() == Some(StatusCode::NOT_FOUND)
                    || err.status() == Some(StatusCode::FORBIDDEN)
                {
                    continue;
                }
                Err(ErrorKind::RequestError(err).into())
            }
            other => Err(other.into()),
        },
        Err(CachedClientError::Callback(err)) => Err(err),
    };
}

That is, given uv pip install --index-url foo --extra-index-url bar --extra-index-url quux package, it will first try foo, then bar and then quux. Once a package is found, uv will stop looking for it in any other indexes. So for example, if package is found in bar, then that means it definitely wasn't found in foo, and it may or may not be in quux. We never check.
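
The lookup described above can be restated as a small Python sketch. `HttpError`, `fake_fetch`, and the in-memory `FAKE_INDEXES` are hypothetical stand-ins for real HTTP requests, and the offline case from the Rust snippet is left out for brevity.

```python
class HttpError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_from_first_index(package, index_urls, fetch):
    """Return (index_url, metadata) from the first index that has the package."""
    for url in index_urls:
        try:
            return url, fetch(url, package)
        except HttpError as err:
            if err.status in (404, 403):
                continue  # not available here: try the next index
            raise  # any other failure aborts the whole lookup
    raise LookupError(f"{package} not found in any index")


# Hypothetical index contents keyed by index name.
FAKE_INDEXES = {"foo": {}, "bar": {"package": ["1.0"]}, "quux": {"package": ["9.0"]}}


def fake_fetch(url, package):
    try:
        return FAKE_INDEXES[url][package]
    except KeyError:
        raise HttpError(404) from None


hit = fetch_from_first_index("package", ["foo", "bar", "quux"], fake_fetch)
```

Here `quux` is never consulted, even though it holds a higher version, because the search stops at `bar`.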

Since --index-url defaults to PyPI, that means something like uv pip install --extra-index-url bar package will check PyPI first for package, and if it's found, stop. I think this turns out to be the reverse of what the commonly desired behavior is. And while it won't address every use case, I think a nice stop-gap solution here would be to flip the preference order with respect to --index-url and --extra-index-url. That is, we check PyPI after all other extra index URLs given.

This will not match pip's behavior in every case, but I think it does help address some of the common cases and I think also helps to mitigate the dependency confusion concerns. That is, if package is in bar, then uv would completely ignore any package of the same name on PyPI. (Today, that's flipped. If package is on PyPI, then uv will completely ignore any package of that name on bar.)
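
As a sketch of the proposed tweak, the search order can be built like this. The function name `search_order` and its `extras_first` parameter are invented for illustration and do not correspond to uv's internals.

```python
PYPI = "https://pypi.org/simple"


def search_order(extra_index_urls, index_url=PYPI, extras_first=True):
    """Build the ordered list of indexes to try for each package."""
    extras = list(extra_index_urls)
    if extras_first:
        # Proposed behavior: --index-url becomes the fallback.
        return extras + [index_url]
    # Behavior at the time of the comment: --index-url is tried first.
    return [index_url] + extras


proposed = search_order(["bar"])
current = search_order(["bar"], extras_first=False)
```

With the proposed order, a package found on `bar` is never looked up on PyPI at all.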

Otherwise, I do agree that if we can get away with it, we should probably avoid encoding pip's behavior of including packages from all indexes without regard to priority into uv. But we can certainly revisit this if my tweak above doesn't pan out.

And popping up a level, I do think we'll want to absolutely address the multi-registry issue by giving users more control when we build out more project management features. But I think until then, we'll probably want to avoid adding too many additional abstractions into uv pip install for dealing with multiple registries. And of course, I suspect we will want PEP 708 support eventually too.

@pawamoy

pawamoy commented Feb 29, 2024

Nice! Worth noting is users who want the flipped behavior can do uv pip install --index-url https://private-index.com/simple --extra-index-url https://pypi.org/simple, right?

@BurntSushi
Member

Nice! Worth noting is users who want the flipped behavior can do uv pip install --index-url https://private-index.com/simple --extra-index-url https://pypi.org/simple, right?

Ah yes! I meant to call that out, but yes indeed.

BurntSushi added a commit that referenced this issue Feb 29, 2024
Previously, we would prioritize `--index-url` over all
`--extra-index-url` values. But now, we prioritize all
`--extra-index-url` values over `--index-url`. That is, `--index-url`
has gone from the "primary" index to the "fallback" index. In most
setups, `--index-url` is left as its default value, which is PyPI.

The ordering of `--extra-index-url` with respect to one another remains
the same. That is, in `--extra-index-url foo --extra-index-url bar`,
`foo` will be tried before `bar`.

Finally, note that this specifically does not match `pip`'s behavior.
`pip` will attempt to look at versions of a package from all indexes in
which it occurs. `uv` will stop looking for versions of a package once
it finds it in an index. That is, for any given package, `uv` will only
utilize versions of it from a single index.

Ref #171, Fixes #1377, Fixes #1451, Fixes #1600
@zanieb
Member

zanieb commented Aug 23, 2024

Removing the wish label here, as we expect to implement this in tool.uv.sources

@colinjc

colinjc commented Sep 5, 2024

Current uv docs for dependencies make it sound like this feature is already available. Led to a bit of churn trying to figure out how to use it 😅

https://docs.astral.sh/uv/concepts/dependencies/

tool.uv.sources enriches the dependency metadata with additional sources, incorporated during development. A dependency source can be a Git repository, a URL, a local path, or an alternative registry.

@msabramo

msabramo commented Sep 13, 2024

Does uv not expand environment variables in the extra-index-url setting in pyproject.toml? I couldn't get this to work:

[tool.uv]
# uv doesn't expand ARTIFACTORY_USER and ARTIFACTORY_API_TOKEN?
extra-index-url = ["https://${ARTIFACTORY_USER}:${ARTIFACTORY_API_TOKEN}@pythonpackages.corp.example.com/artifactory/api/pypi/pypi-colorado-tools-snapshot/simple"]

It does work if I set UV_EXTRA_INDEX_URL in the shell from which I invoke uv, presumably because the shell is expanding the environment variables.
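
The difference between the literal string in the config file and what the shell produces can be illustrated with the standard library. The shortened URL and the credential values below are made up, and this is not a description of how uv or pdm process their settings.

```python
from string import Template

# The URL as written in the config file: the placeholders are plain text.
raw_url = "https://${ARTIFACTORY_USER}:${ARTIFACTORY_API_TOKEN}@pythonpackages.corp.example.com/simple"

# Hypothetical environment; a shell would read these from its own variables.
env = {"ARTIFACTORY_USER": "alice", "ARTIFACTORY_API_TOKEN": "s3cret"}

# Expansion is an explicit step; without it the placeholders stay as they are.
expanded_url = Template(raw_url).substitute(env)
```

A tool that reads the TOML value without a step like this would send the placeholders to the server verbatim.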

And as a point of reference, we are currently using pdm and this works:

[[tool.pdm.source]]
name = "colorado-tools-snapshot"
url = "https://${ARTIFACTORY_USER}:${ARTIFACTORY_API_TOKEN}@pythonpackages.corp.example.com/artifactory/api/pypi/pypi-colorado-tools-snapshot/simple"
include_packages = ["venice", "venice-*"]

The packages in include_packages are proprietary packages, so they only exist in this internal Artifactory and not on PyPI. There's no mirroring and no choice of where to get them from; they have to come from a specific Artifactory package index.
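
The routing that `include_packages` expresses can be sketched with shell-style patterns. This is only an illustration of the idea, assuming simple glob matching. It is not PDM's implementation and it ignores package name normalization. The default source name `pypi` is an assumption.

```python
from fnmatch import fnmatchcase

# Source name -> patterns of packages that must come from that source.
PINNED_SOURCES = [("colorado-tools-snapshot", ["venice", "venice-*"])]
DEFAULT_SOURCE = "pypi"


def source_for(package):
    """Return the only source a package is allowed to come from."""
    for source, patterns in PINNED_SOURCES:
        if any(fnmatchcase(package, pattern) for pattern in patterns):
            return source
    return DEFAULT_SOURCE


venice_source = source_for("venice-core")
requests_source = source_for("requests")
```

Packages matching a pattern are tied to exactly one index, which is the property the issue asks uv to support.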

Great work on uv!

@vlad-ivanov-name

vlad-ivanov-name commented Sep 13, 2024

this is probably something that could be solved via keyring and --keyring-provider=subprocess (and is also unrelated to this issue)

@zanieb
Member

zanieb commented Sep 14, 2024

Please comment on #5734 instead for environment variable expansion.

@charliermarsh
Member Author

This is starting to come together in #7481.

@samypr100
Collaborator

samypr100 commented Sep 20, 2024

Poetry's design is interesting: https://python-poetry.org/docs/repositories/. It feels a bit more complex than is necessary though.

Here's an interesting pattern w/ poetry I've seen used to allow local dev on osx with pytorch but also enable GPU usage on linux specifically on x86_64.

[tool.poetry.dependencies]
...
torch = [
    ...
    { markers = "sys_platform == 'darwin' and platform_machine == 'arm64'", url = "https://download.pytorch.org/whl/cpu/foo_bar_cpu_wheel.whl"},
    { markers = "sys_platform == 'linux' and platform_machine == 'aarch64'", url = "https://download.pytorch.org/whl/cpu/foo_bar_cpu_wheel.whl"},
    { markers = "sys_platform == 'linux' and platform_machine == 'x86_64'", url = "https://download.pytorch.org/whl/cu121/foo_bar_gpu_wheel.whl"}
]
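
The effect of those markers can be approximated as a lookup on platform and machine. This is a simplified sketch: real environment markers support a richer expression language, and the URLs are the placeholder values from the snippet above rather than real wheels.

```python
import platform
import sys

# Placeholder URLs copied from the snippet above.
CPU_WHEEL = "https://download.pytorch.org/whl/cpu/foo_bar_cpu_wheel.whl"
GPU_WHEEL = "https://download.pytorch.org/whl/cu121/foo_bar_gpu_wheel.whl"

TORCH_SOURCES = {
    ("darwin", "arm64"): CPU_WHEEL,
    ("linux", "aarch64"): CPU_WHEEL,
    ("linux", "x86_64"): GPU_WHEEL,
}


def torch_url(sys_platform, platform_machine):
    """Return the wheel URL for an environment, or None if none is listed."""
    return TORCH_SOURCES.get((sys_platform, platform_machine))


# URL for the machine running this script (None if it is not in the table).
url_for_this_machine = torch_url(sys.platform, platform.machine())
```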

@groodt

groodt commented Sep 20, 2024

Good news! 🎉

Initial PEP 708 support has arrived on PyPI

@k0t3n

k0t3n commented Oct 21, 2024

What about CLI? Will uv support adding dependencies with custom registries via uv add?

I wish I could do

uv add --index internal somepackage

@gazpachoking

What about CLI? Will uv support adding dependencies with custom registries via uv add?

It looks like #7747 sorta does that. It doesn't allow just giving the index name though; you have to give the name and URL:

uv add --index internal=https://internalindex.url somepackage

Allowing just using an existing index name would be a nice improvement.
