Skip to content

Commit

Permalink
More PEP 508 notes
Browse files Browse the repository at this point in the history
  • Loading branch information
mitsuhiko committed Nov 13, 2023
1 parent 14a8c7c commit c4531df
Showing 1 changed file with 151 additions and 0 deletions.
151 changes: 151 additions & 0 deletions notes/pep508.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
# PEP 508 Limitations

*This document collects my notes on PEP 508.*

A package manager for Python is naturally constrained by the metadata format that the
community embraces. While a package manager can in theory add a custom metadata format
like poetry did, it limits the ability to publish such packages and prevents inter-tool
compatibility.

The metadata format in the Python ecosystem is the `pyproject.toml` and the most significant
limitation is the dependency specification from [PEP 508](https://peps.python.org/pep-0508/).

## Dependency Array Limitations

For a package manager the most important piece of information are dependencies. In Python
there are two dependency sections, both are very limited in that they are lists of strings.

For instance `project.dependencies` is an array where each item is a PEP 508 string. This
means that within that array, only information can be carried that PEP 508 supports. If a
package manager requires additional information (temporary or permanently) it does not find
place in the array for it.

Example:

```toml
[project]
dependencies = [
'Flask>=2.0',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]
```

If additional information wants to be stored with either one of those dependencies, the only
reasonable correlation today would be to either duplicate it elsewhere, or to index by number.

```toml
[tool.rye.dependencies_meta.0]
extra_information = 42
```

No matter how, this is always a challenge. In `Cargo.toml` the information is both kept as
array of objects, but additionally "merging" sections are provided. This means that platform
and target specific dependencies are merged in at a higher level. Additionally if a dependency
is contained more than once, it can be renamed. For Python the `Cargo.toml` format does not make
too much sense, but some alternatives might work:

```toml
[project.dependencies]
Flask = { version = ">=2.0" }

[project.dependencies.more-itertools]
match = [
{ version=">=4.0.0,<6.0.0", python_version = "<=2.7" },
{ version=">=4.0.0", python_version = ">2.7" }
]
```

In that case at all times an object is placed in the format where extra keys could be added:

```toml
[project.dependencies]
Flask = { version = ">=2.0", extra_information = 42 }
```

That however would be a significant departure from where the format is today. A pragmatic
alternative could be to utilize the URL portion of the dependency to funnel additional
information into the package. Unfortunately PyPI will refuse to accept any PEP 508 marker
that contains a URL. Additionally URLs also have restrictions in that they cannot be
combined with version markers. The former issue might not be an issue as such a
dependency could be rewritten before publishing into an automated format.
`cargo` does something like this today already with `Cargo.toml` which is rewritten for
publishing.

Given that PEP 508 cannot really be amended without causing a massive rift in the
ecosystem the solution has to come down to encoding it into something that is already
part of the structure. My current thinking is that the format should permit objects
where the strings are today and to re-write on publish. So this format might become
possible:

```toml
[project]
dependencies = [
{ version = 'Flask@>=2.0', extra_information = 42 },
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]

# turns into

[project]
dependencies = [
'Flask@>=2.0',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]

[project.dependencies_meta.0]
exta_information = 42
```

Unfortunately that would still cause issues for tools that locally interpret `pyproject.toml`
files.

Alternatively a correlation marker could be added. This could be done by inventing a new
marker (requires double quoting):

```toml
[project]
dependencies = [
'Flask@>=2.0 ; "id" == "x"',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]

[project.dependencies_meta.x]
extra_information = 42
```

If the correlation marker is missing, the package name would be assumed (in this case `Flask`).

## Local Dependencies

In a very similar mannor local dependencies do not really work in that world yet. Personally

Check warning on line 124 in notes/pep508.md

View workflow job for this annotation

GitHub Actions / Spell check

"mannor" should be "manner".
I believe this is similar to the issue above as we have no reasonable way to encode both the
version and the path location into one place. Contrast this to `Cargo.toml`:

```toml
[dependencies]
foo = { version = "1.0", path = "../crates" }
```

Upon publishing, `cargo` removes the `path` marker. Before publishing, the `path` takes
precedence over the lookup from the index.

If one were to be able to encode a correlation marker maybe this format could work:


```toml
[project]
dependencies = [
'Flask@>=2.0 ; "id" == "flask"',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]

[project.dependencies_meta.flask]
path = "../packages/flask"
```

(Again, if there is no marker, the package name could be directly used).

0 comments on commit c4531df

Please sign in to comment.