Skip to content

Commit

Permalink
Recursively resolve direct URL references upfront (#2684)
Browse files Browse the repository at this point in the history
## Summary

This PR would enable us to support transitive URL requirements. The key
idea is to leverage the fact that...

- URL requirements can only come from URL requirements.
- URL requirements identify a _specific_ version, and so don't require
backtracking.

Prior to running the "real" resolver, we recursively resolve any URL
requirements, and collect all the known URLs upfront, then pass those to
the resolver as "lookahead" requirements. This means the resolver knows
upfront that if a given package is included, it _must_ use the provided
URL.

Closes #1808.
  • Loading branch information
charliermarsh authored Apr 1, 2024
1 parent f60e749 commit a48bcae
Show file tree
Hide file tree
Showing 7 changed files with 106 additions and 96 deletions.
52 changes: 17 additions & 35 deletions PIP_COMPATIBILITY.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,41 +38,6 @@ Instead, uv supports its own environment variables, like `UV_INDEX_URL`. In the
also support persistent configuration in its own configuration file format (e.g., `pyproject.toml`
or `uv.toml` or similar). For more, see [#651](https://github.com/astral-sh/uv/issues/651).

## Transitive direct URL dependencies

While uv does support direct URL dependencies (e.g., `black @ https://...`), it does not support
nested (or "transitive") direct URL dependencies, instead requiring that any direct URL dependencies
are declared upfront.

For example, if `black @ https://...` itself had a dependency on `toml @ https://...`, uv would
reject the transitive direct URL dependency on `toml` and require that `toml` be declared as a
dependency in the `pyproject.toml` file, like:

```toml
# pyproject.toml
dependencies = [
"black @ https://...",
"toml @ https://...",
]
```

This is a deliberate choice to avoid the correctness and security issues associated with allowing
transitive dependencies to introduce arbitrary URLs into the dependency graph.

For example:

- Your package depends on `package_a==1.0.0`.
- Your package depends on `package_b==1.0.0`.
- `package_b==1.0.0` depends on `package_a @ https://...`.

If `package_a @ https://...` happens to resolve to version `1.0.0`, `pip` would install `package_a`
from the direct URL. This is a security issue, since the direct URL could be controlled by an
attacker, and a correctness issue, since the direct URL could resolve to an entirely different
package with the same name and version.

In the future, uv may allow transitive URL dependencies in some form (e.g., with user opt-in).
For more, see [#1808](https://github.com/astral-sh/uv/issues/1808).

## Pre-release compatibility

By default, uv will accept pre-release versions during dependency resolution in two cases:
Expand Down Expand Up @@ -167,6 +132,23 @@ In the future, uv will support pinning packages to dedicated indexes (see: [#171
Additionally, [PEP 708](https://peps.python.org/pep-0708/) is a provisional standard that aims to
address the "dependency confusion" issue across package registries and installers.

## Transitive direct URL dependencies for constraints and overrides

While uv does support URL dependencies (e.g., `black @ https://...`), it does not support
_transitive_ (i.e., "nested") direct URL dependencies for constraints and overrides.

Specifically, if a constraint or override is defined using a direct URL dependency, and the
constrained package has a direct URL dependency of its own, uv _may_ reject that transitive direct
URL dependency during resolution.

uv also makes the assumption that non-URL dependencies won't introduce URL dependencies (i.e., that
dependencies fetched from a registry will not themselves have direct URL dependencies). If a non-URL
dependency _does_ introduce a URL dependency, uv will reject the URL dependency during resolution.

If uv rejects a transitive URL dependency in either case, the best course of action is to provide
the URL dependency as a direct dependency in the `requirements.in` file, rather than as a
constraint, override, or transitive dependency.

## Virtual environments by default

`uv pip install` and `uv pip sync` are designed to work with virtual environments by default.
Expand Down
91 changes: 46 additions & 45 deletions crates/uv-requirements/src/lookahead.rs
Original file line number Diff line number Diff line change
@@ -1,29 +1,34 @@
use std::collections::VecDeque;
use std::sync::Arc;

use anyhow::Result;
use anyhow::{Context, Result};
use futures::stream::FuturesUnordered;
use futures::StreamExt;
use rustc_hash::FxHashSet;

use distribution_types::{BuildableSource, Dist, LocalEditable};
use distribution_types::{Dist, LocalEditable};
use pep508_rs::{MarkerEnvironment, Requirement, VersionOrUrl};
use pypi_types::Metadata23;
use uv_client::RegistryClient;
use uv_distribution::{Reporter, SourceDistributionBuilder};
use uv_distribution::{DistributionDatabase, Reporter};
use uv_types::{BuildContext, Constraints, Overrides, RequestedRequirements};

/// A resolver for resolving lookahead requirements from local dependencies.
/// A resolver for resolving lookahead requirements from direct URLs.
///
/// The resolver extends certain privileges to "first-party" requirements. For example, first-party
/// requirements are allowed to contain direct URL references, local version specifiers, and more.
///
/// We make an exception for transitive requirements of _local_ dependencies. For example,
/// `pip install .` should treat the dependencies of `.` as if they were first-party dependencies.
/// This matches our treatment of editable installs (`pip install -e .`).
/// The lookahead resolver resolves requirements recursively for direct URLs, so that the resolver
/// can treat them as first-party dependencies for the purpose of analyzing their specifiers.
/// Namely, this enables transitive direct URL dependencies, since we can tell the resolver all of
/// the known URLs upfront.
///
/// The lookahead resolver resolves requirements for local dependencies, so that the resolver can
/// treat them as first-party dependencies for the purpose of analyzing their specifiers.
pub struct LookaheadResolver<'a> {
/// This strategy relies on the assumption that direct URLs are only introduced by other direct
/// URLs, and not by PyPI dependencies. (If a direct URL _is_ introduced by a PyPI dependency, then
/// the resolver will (correctly) reject it later on with a conflict error.) Further, it's only
/// possible because a direct URL points to a _specific_ version of a package, and so we know that
/// any correct resolution will _have_ to include it (unlike with PyPI dependencies, which may
/// require a range of versions and backtracking).
pub struct LookaheadResolver<'a, Context: BuildContext + Send + Sync> {
/// The direct requirements for the project.
requirements: &'a [Requirement],
/// The constraints for the project.
Expand All @@ -32,46 +37,43 @@ pub struct LookaheadResolver<'a> {
overrides: &'a Overrides,
/// The editable requirements for the project.
editables: &'a [(LocalEditable, Metadata23)],
/// The reporter to use when building source distributions.
reporter: Option<Arc<dyn Reporter>>,
/// The database for fetching and building distributions.
database: DistributionDatabase<'a, Context>,
}

impl<'a> LookaheadResolver<'a> {
impl<'a, Context: BuildContext + Send + Sync> LookaheadResolver<'a, Context> {
/// Instantiate a new [`LookaheadResolver`] for a given set of requirements.
pub fn new(
requirements: &'a [Requirement],
constraints: &'a Constraints,
overrides: &'a Overrides,
editables: &'a [(LocalEditable, Metadata23)],
context: &'a Context,
client: &'a RegistryClient,
) -> Self {
Self {
requirements,
constraints,
overrides,
editables,
reporter: None,
database: DistributionDatabase::new(client, context),
}
}

/// Set the [`Reporter`] to use for this resolver.
#[must_use]
pub fn with_reporter(self, reporter: impl Reporter + 'static) -> Self {
let reporter: Arc<dyn Reporter> = Arc::new(reporter);
Self {
reporter: Some(reporter),
database: self.database.with_reporter(reporter),
..self
}
}

/// Resolve the requirements from the provided source trees.
pub async fn resolve<T: BuildContext>(
self,
context: &T,
markers: &MarkerEnvironment,
client: &RegistryClient,
) -> Result<Vec<RequestedRequirements>> {
pub async fn resolve(self, markers: &MarkerEnvironment) -> Result<Vec<RequestedRequirements>> {
let mut results = Vec::new();
let mut futures = FuturesUnordered::new();
let mut seen = FxHashSet::default();

// Queue up the initial requirements.
let mut queue: VecDeque<Requirement> = self
Expand All @@ -88,7 +90,12 @@ impl<'a> LookaheadResolver<'a> {

while !queue.is_empty() || !futures.is_empty() {
while let Some(requirement) = queue.pop_front() {
futures.push(self.lookahead(requirement, context, client));
// Ignore duplicates. If we have conflicting URLs, we'll catch that later.
if matches!(requirement.version_or_url, Some(VersionOrUrl::Url(_))) {
if seen.insert(requirement.name.clone()) {
futures.push(self.lookahead(requirement));
}
}
}

while let Some(result) = futures.next().await {
Expand All @@ -110,12 +117,7 @@ impl<'a> LookaheadResolver<'a> {
}

/// Infer the package name for a given "unnamed" requirement.
async fn lookahead<T: BuildContext>(
&self,
requirement: Requirement,
context: &T,
client: &RegistryClient,
) -> Result<Option<RequestedRequirements>> {
async fn lookahead(&self, requirement: Requirement) -> Result<Option<RequestedRequirements>> {
// Determine whether the requirement represents a local distribution.
let Some(VersionOrUrl::Url(url)) = requirement.version_or_url.as_ref() else {
return Ok(None);
Expand All @@ -124,29 +126,28 @@ impl<'a> LookaheadResolver<'a> {
// Convert to a buildable distribution.
let dist = Dist::from_url(requirement.name, url.clone())?;

// Only support source trees (and not, e.g., wheels).
let Dist::Source(source_dist) = &dist else {
return Ok(None);
};
if !source_dist.as_path().is_some_and(std::path::Path::is_dir) {
return Ok(None);
}

// Run the PEP 517 build process to extract metadata from the source distribution.
let builder = if let Some(reporter) = self.reporter.clone() {
SourceDistributionBuilder::new(client, context).with_reporter(reporter)
let (metadata, _precise) = self
.database
.get_or_build_wheel_metadata(&dist)
.await
.with_context(|| match &dist {
Dist::Built(built) => format!("Failed to download: {built}"),
Dist::Source(source) => format!("Failed to download and build: {source}"),
})?;

// Consider the dependencies to be "direct" if the requirement is a local source tree.
let direct = if let Dist::Source(source_dist) = &dist {
source_dist.as_path().is_some_and(std::path::Path::is_dir)
} else {
SourceDistributionBuilder::new(client, context)
false
};

let metadata = builder
.download_and_build_metadata(&BuildableSource::Dist(source_dist))
.await?;

// Return the requirements from the metadata.
Ok(Some(RequestedRequirements::new(
requirement.extras,
metadata.requires_dist,
direct,
)))
}
}
1 change: 1 addition & 0 deletions crates/uv-resolver/src/manifest.rs
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,7 @@ impl Manifest {
) -> impl Iterator<Item = &PackageName> {
self.lookaheads
.iter()
.filter(|lookahead| lookahead.direct())
.flat_map(|lookahead| {
self.overrides
.apply(lookahead.requirements())
Expand Down
10 changes: 9 additions & 1 deletion crates/uv-types/src/requirements.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,17 @@ pub struct RequestedRequirements {
extras: Vec<ExtraName>,
/// The set of requirements that were requested by the originating requirement.
requirements: Vec<Requirement>,
/// Whether the dependencies were direct or transitive.
direct: bool,
}

impl RequestedRequirements {
/// Instantiate a [`RequestedRequirements`] with the given `extras` and `requirements`.
pub fn new(extras: Vec<ExtraName>, requirements: Vec<Requirement>) -> Self {
pub fn new(extras: Vec<ExtraName>, requirements: Vec<Requirement>, direct: bool) -> Self {
Self {
extras,
requirements,
direct,
}
}

Expand All @@ -32,4 +35,9 @@ impl RequestedRequirements {
pub fn requirements(&self) -> &[Requirement] {
&self.requirements
}

/// Return whether the dependencies were direct or transitive.
pub fn direct(&self) -> bool {
self.direct
}
}
15 changes: 11 additions & 4 deletions crates/uv/src/commands/pip_compile.rs
Original file line number Diff line number Diff line change
Expand Up @@ -338,10 +338,17 @@ pub(crate) async fn pip_compile(
};

// Determine any lookahead requirements.
let lookaheads = LookaheadResolver::new(&requirements, &constraints, &overrides, &editables)
.with_reporter(ResolverReporter::from(printer))
.resolve(&build_dispatch, &markers, &client)
.await?;
let lookaheads = LookaheadResolver::new(
&requirements,
&constraints,
&overrides,
&editables,
&build_dispatch,
&client,
)
.with_reporter(ResolverReporter::from(printer))
.resolve(&markers)
.await?;

// Create a manifest of the requirements.
let manifest = Manifest::new(
Expand Down
15 changes: 11 additions & 4 deletions crates/uv/src/commands/pip_install.rs
Original file line number Diff line number Diff line change
Expand Up @@ -539,10 +539,17 @@ async fn resolve(
.collect();

// Determine any lookahead requirements.
let lookaheads = LookaheadResolver::new(&requirements, &constraints, &overrides, &editables)
.with_reporter(ResolverReporter::from(printer))
.resolve(build_dispatch, markers, client)
.await?;
let lookaheads = LookaheadResolver::new(
&requirements,
&constraints,
&overrides,
&editables,
build_dispatch,
client,
)
.with_reporter(ResolverReporter::from(printer))
.resolve(markers)
.await?;

// Create a manifest of the requirements.
let manifest = Manifest::new(
Expand Down
18 changes: 11 additions & 7 deletions crates/uv/tests/pip_compile.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1699,7 +1699,7 @@ fn compatible_repeated_narrowed_url_dependency() -> Result<()> {
fn incompatible_narrowed_url_dependency() -> Result<()> {
let context = TestContext::new("3.12");
let requirements_in = context.temp_dir.child("requirements.in");
requirements_in.write_str("werkzeug @ git+https://github.com/pallets/werkzeug.git@main\nwerkzeug @ git+https://github.com/pallets/werkzeug@32e69512134c2f8183c6438b2b2e13fd24e9d19f\nwerkzeug @ git+https://github.com/pallets/werkzeug.git@v1.0.0")?;
requirements_in.write_str("werkzeug @ git+https://github.com/pallets/werkzeug.git@main\nwerkzeug @ git+https://github.com/pallets/werkzeug@32e69512134c2f8183c6438b2b2e13fd24e9d19f\nwerkzeug @ git+https://github.com/pallets/werkzeug.git@3.0.1")?;

uv_snapshot!(context.compile()
.arg("requirements.in"), @r###"
Expand All @@ -1710,31 +1710,35 @@ fn incompatible_narrowed_url_dependency() -> Result<()> {
----- stderr -----
error: Requirements contain conflicting URLs for package `werkzeug`:
- git+https://github.com/pallets/werkzeug@32e69512134c2f8183c6438b2b2e13fd24e9d19f
- git+https://github.com/pallets/werkzeug.git@v1.0.0
- git+https://github.com/pallets/werkzeug.git@3.0.1
"###
);

Ok(())
}

/// Request `hatchling_editable`, which depends on `https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl`.
/// Since this URL isn't declared upfront, we should reject it.
#[test]
#[cfg(feature = "git")]
fn disallowed_transitive_url_dependency() -> Result<()> {
fn allowed_transitive_git_dependency() -> Result<()> {
let context = TestContext::new("3.12");

let requirements_in = context.temp_dir.child("requirements.in");
requirements_in.write_str("hatchling_editable @ https://github.com/astral-sh/uv/files/14762645/hatchling_editable.zip")?;

uv_snapshot!(context.compile()
.arg("requirements.in"), @r###"
success: false
exit_code: 2
success: true
exit_code: 0
----- stdout -----
# This file was autogenerated by uv via the following command:
# uv pip compile --cache-dir [CACHE_DIR] --exclude-newer 2024-03-25T00:00:00Z requirements.in
hatchling-editable @ https://github.com/astral-sh/uv/files/14762645/hatchling_editable.zip
iniconfig @ git+https://github.com/pytest-dev/iniconfig@9cae43103df70bac6fde7b9f35ad11a9f1be0cb4
# via hatchling-editable
----- stderr -----
error: Package `iniconfig` attempted to resolve via URL: git+https://github.com/pytest-dev/iniconfig@9cae43103df70bac6fde7b9f35ad11a9f1be0cb4. URL dependencies must be expressed as direct requirements or constraints. Consider adding `iniconfig @ git+https://github.com/pytest-dev/iniconfig@9cae43103df70bac6fde7b9f35ad11a9f1be0cb4` to your dependencies or constraints file.
Resolved 2 packages in [TIME]
"###
);

Expand Down

0 comments on commit a48bcae

Please sign in to comment.