Build backend tracking issue #8779

Open
22 of 30 tasks
konstin opened this issue Nov 3, 2024 · 4 comments
Assignees
Labels
build-backend enhancement New feature or improvement to existing functionality preview Experimental behavior tracking A "meta" issue that tracks completion of a bigger task via a list of smaller scoped issues.

Comments

@konstin
Member

konstin commented Nov 3, 2024

uv should provide its own build backend. This issue tracks this work; it will become more granular as more things are implemented.

Documentation

Minimal draft documentation for the build backend preview feature. The rough write-up here is for beta testing; the final documentation will have more background and explain the pyproject.toml fields.

Currently, the build backend is in preview. You need to set the UV_PREVIEW=1 environment variable and use the --preview flag with all CLI commands.

In Python, building packages is split into two parts: the build frontend and the build backend. This allows package managers, such as pip, uv and poetry, to build packages with any other build system, such as setuptools, hatchling, uv or poetry-core. uv provides both a build frontend (uv build) and a build backend (build-backend = "uv" in pyproject.toml). You can use either without the other, or both together.

Getting started: Add the following section to your pyproject.toml, removing the existing build system if any:

[build-system]
requires = ["uv>=0.5,<0.6"]
build-backend = "uv"

By default, uv expects your code to be in src/<package_name_with_underscores>.
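
As a rough illustration of that default, the sketch below derives the expected module directory from a project name. It assumes the name is normalized in the usual PEP 503 style (lowercased, runs of `-`, `_` and `.` collapsed) before dashes become underscores. `default_module_dir` is a hypothetical helper for illustration, not part of uv, and the exact normalization uv applies may differ.

```python
import re
from pathlib import PurePosixPath


def default_module_dir(project_name: str, module_root: str = "src") -> PurePosixPath:
    """Guess the default module directory for a project (hypothetical helper)."""
    # Assumption: normalize the name PEP 503 style, then swap dashes for underscores.
    normalized = re.sub(r"[-_.]+", "-", project_name).lower()
    return PurePosixPath(module_root) / normalized.replace("-", "_")


print(default_module_dir("built-by-uv"))  # src/built_by_uv
```

Passing an empty `module_root` gives the flat-layout location, i.e. the module directory directly in the project root.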

The uv build backend uses the fields from the [project] section in pyproject.toml. There are configuration options in tool.uv.build-backend to select the files for the distributions. To select which files to include in the source distribution, we first add the includes, then remove the excludes from that. You can check the file list with uv build --preview --list.

Include and exclude configuration

When building the source distribution, the following files and directories are included:

  • pyproject.toml
  • The module under tool.uv.build-backend.module-root, by default
    src/<project_name_with_underscores>/**.
  • project.license-files and project.readme.
  • All directories under tool.uv.build-backend.data.
  • All patterns from tool.uv.build-backend.source-include.

From these, we remove the tool.uv.build-backend.source-exclude matches.

When building the wheel, the following files and directories are included:

  • The module under tool.uv.build-backend.module-root, by default
    src/<project_name_with_underscores>/**.
  • project.license-files and project.readme, as part of the project metadata.
  • Each directory under tool.uv.build-backend.data, as data directories.

From these, we remove the tool.uv.build-backend.source-exclude and
tool.uv.build-backend.wheel-exclude matches. The source dist excludes are also applied
to the wheel so that a direct source tree -> wheel build does not include more files
than source tree -> source distribution -> wheel.
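
The reasoning can be checked with a small model. This is a sketch only: files are plain strings, the exclude rules are stand-in predicates rather than real glob matching, and `sdist_files` and `wheel_files` are hypothetical names, not uv functions.

```python
def sdist_files(tree, source_exclude):
    """Files that survive the source tree -> source distribution step."""
    return {f for f in tree if not source_exclude(f)}


def wheel_files(files, source_exclude, wheel_exclude):
    """Files that end up in the wheel; both exclude lists apply."""
    return {f for f in files if not source_exclude(f) and not wheel_exclude(f)}


tree = {
    "src/built_by_uv/__init__.py",
    "src/built_by_uv/not-packaged.txt",
    "src/built_by_uv/build-only.h",
}


def source_exclude(path):
    # Stand-in for source-exclude = ["/src/built_by_uv/not-packaged.txt"]
    return path == "src/built_by_uv/not-packaged.txt"


def wheel_exclude(path):
    # Stand-in for wheel-exclude = ["build-*.h"]
    return path.endswith(".h")


direct = wheel_files(tree, source_exclude, wheel_exclude)
via_sdist = wheel_files(sdist_files(tree, source_exclude), source_exclude, wheel_exclude)
assert direct == via_sdist  # both routes produce the same wheel contents
```

If `wheel_files` ignored `source_exclude`, the direct build would pick up `not-packaged.txt` while the build via the source distribution could not, which is the mismatch the rule avoids.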

There are no specific wheel includes. There must only be one top level module, and all data
files must either be under the module root or in a data directory. Most packages store small
data in the module root alongside the source code.

Include and exclude syntax

Includes are anchored, which means that pyproject.toml includes only
<project root>/pyproject.toml. Use for example assets/**/sample.csv to include all
sample.csv files in <project root>/assets or any child directory. To recursively include
all files under a directory, use a /** suffix, e.g. src/**. For performance and
reproducibility, avoid unanchored matches such as **/sample.csv.

Excludes are not anchored, which means that __pycache__ excludes all directories named
__pycache__ and their children anywhere. To anchor a directory, use a / prefix, e.g.,
/dist will exclude only <project root>/dist.

The glob syntax is the reduced portable glob from
PEP 639.
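
The difference between anchored includes and unanchored excludes can be sketched as follows. This is an approximation for illustration, not uv's matcher: it covers only the `*`, `?` and `**` parts of the glob syntax (no character ranges), and `glob_to_regex`, `is_included` and `is_excluded` are hypothetical names.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate a glob subset (`*`, `?`, `**`) into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including path separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # one character within a path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def is_included(path: str, pattern: str) -> bool:
    """Includes are anchored: the pattern must match from the project root."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None


def is_excluded(path: str, pattern: str) -> bool:
    """Excludes match at any depth unless prefixed with `/`; children are excluded too."""
    if pattern.startswith("/"):
        prefix, body = "", pattern[1:]
    else:
        prefix, body = "(?:[^/]+/)*", pattern
    regex = prefix + glob_to_regex(body) + "(?:/.*)?"
    return re.fullmatch(regex, path) is not None
```

With this sketch, `is_included("sub/pyproject.toml", "pyproject.toml")` is false because the include is anchored, while `is_excluded("src/pkg/__pycache__/a.pyc", "__pycache__")` is true because the exclude matches at any depth and covers the directory's children.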

Example

[project]
name = "built-by-uv"
version = "0.1.0"
description = "A package to be built with the uv build backend that uses all features exposed by the build backend"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["anyio>=4,<5"]
license-files = ["LICENSE*", "third-party-licenses/*"]

[tool.uv.build-backend]
# A file we need for the source dist -> wheel step, but not in the wheel itself (currently unused)
source-include = ["data/build-script.py"]
# A temporary or generated file we want to ignore
source-exclude = ["/src/built_by_uv/not-packaged.txt"]
# Headers are build-only
wheel-exclude = ["build-*.h"]

[tool.uv.build-backend.data]
scripts = "scripts"
data = "assets"
headers = "header"

[build-system]
requires = ["uv>=0.5,<0.6"]
build-backend = "uv"

Options

tool.uv.build-backend.module-root

The directory that contains the module directory, usually src, or an empty path when
using the flat layout over the src layout.

tool.uv.build-backend.source-include

Glob expressions for files and directories to additionally include in the source
distribution.

pyproject.toml and the contents of the module directory are always included.

The glob syntax is the reduced portable glob from
PEP 639.

tool.uv.build-backend.default-excludes

If set to false, the default excludes aren't applied.

Default excludes: __pycache__, *.pyc, and *.pyo.

tool.uv.build-backend.source-exclude

Glob expressions for files and directories to exclude from the source distribution.

tool.uv.build-backend.wheel-exclude

Glob expressions for files and directories to exclude from the wheel.

tool.uv.build-backend.data

Data includes for wheels.

The directories included here are also included in the source distribution. They are copied
to the right wheel subdirectory on build.

TODO

Background reading

Including and excluding files in packages

To go from the source code to an installed (or published) package, we always start with a source tree (e.g. a directory in a repo, mostly the repo root), which is identified by containing a pyproject.toml with a build-system and a project section. To install the project, we always have to go through a wheel. Getting to the wheel is the task of the build backend; installing the wheel is a different part of uv (a well-defined one). From the source tree, we can then either build a source distribution and from that a wheel, or directly a wheel. This means that the source distribution must contain all files needed for the wheel.

The source dist has an indirection where there’s a <name>-<version> directory at the root and everything is below it, but that’s an implementation detail. For our purposes, source dists and wheels have a root directory we can add to.

A source distribution usually contains a subset of the source tree in its root, excluding generated and cache directories (.venv, .pytest_cache, etc.) and development files (tests, test data, CI, etc.), while including the main python module, certain metadata files (pyproject.toml, readme and its images, licenses) and crucial data files and blobs (sample dataframes in pandas, manylinux json in maturin, lists of known endpoints, db schemas, headers for c projects, launcher scripts, etc.) that may either live next to the source code or in one of the dedicated data dirs below.

PEP 639 defines license file globs such as project.license-files = ["third-party/LICEN[CS]E*", "AUTHORS*"], which we must support as given. These files have to be copied to the root for the source dist, and to <name>-<version>.dist-info/licenses in the wheel. In the source dist, we have to include the readme if linked from project.readme, in the wheel it becomes part of METADATA.
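
A minimal sketch of where a matched license file lands in each archive, following the description above. `license_destinations` is a hypothetical helper, and it assumes `name` and `version` are already in the normalized form used in archive file names.

```python
from pathlib import PurePosixPath


def license_destinations(name: str, version: str, license_file: str):
    """Return (path in the source dist, path in the wheel) for one license file."""
    # Source dists wrap everything in a <name>-<version> directory.
    sdist_root = PurePosixPath(f"{name}-{version}")
    # Wheels keep license files under <name>-<version>.dist-info/licenses.
    dist_info = PurePosixPath(f"{name}-{version}.dist-info")
    return sdist_root / license_file, dist_info / "licenses" / license_file


sdist_path, wheel_path = license_destinations(
    "built_by_uv", "0.1.0", "third-party-licenses/MIT.txt"
)
print(sdist_path)
print(wheel_path)
```

The relative path of the license file is kept in both archives, so two license files with the same file name in different directories do not collide.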

Our main module usually exists at src/<name>, or alternatively at <name>. For the src/<name> layout, it needs to move to <name> in the wheel (recursive directory copy). This directory may contain python source files, files used by the source files (say some json with endpoints or a db schema sql) and files that we should skip such as .pyc and __pycache__.
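
The move from src/<name> to <name> can be sketched as a path mapping. `wheel_module_path` is a hypothetical helper for illustration; it only strips the module root and drops bytecode caches, and does not reflect how uv walks the directory.

```python
from pathlib import PurePosixPath
from typing import Optional


def wheel_module_path(source_path: str, module_root: str = "src") -> Optional[PurePosixPath]:
    """Map a file under the module root to its wheel path, or None if skipped."""
    path = PurePosixPath(source_path)
    # Skip compiled bytecode and cache directories.
    if "__pycache__" in path.parts or path.suffix in {".pyc", ".pyo"}:
        return None
    # Strip the module root, so src/<name>/... becomes <name>/...
    return path.relative_to(module_root)


print(wheel_module_path("src/built_by_uv/__init__.py"))  # built_by_uv/__init__.py
```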

Wheels (but not source dists) allow data directories in <name>-<version>.data/<type>, five different predefined ones. We have to allow the user to define which directory/files to include here, and then also copy those to the source dist.
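
As a sketch of that mapping, the function below takes a table shaped like tool.uv.build-backend.data (data type to source directory) and returns the wheel path for a file. The five type names are the ones defined for wheel .data directories; `data_destination` itself is a hypothetical helper, not uv's implementation.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# The five predefined wheel data directory types.
DATA_TYPES = {"purelib", "platlib", "headers", "scripts", "data"}


def data_destination(
    name: str, version: str, data_config: Dict[str, str], source_path: str
) -> Optional[PurePosixPath]:
    """Return the wheel path for a file in a configured data directory, else None."""
    path = PurePosixPath(source_path)
    for data_type, directory in data_config.items():
        if data_type not in DATA_TYPES:
            raise ValueError(f"unknown data directory type: {data_type}")
        try:
            relative = path.relative_to(directory)
        except ValueError:
            continue  # the file is not inside this data directory
        return PurePosixPath(f"{name}-{version}.data") / data_type / relative
    return None


config = {"scripts": "scripts", "data": "assets", "headers": "header"}
print(data_destination("built_by_uv", "0.1.0", config, "assets/data.csv"))
```

In the source distribution the same files would simply stay at their original relative paths, since only wheels have the .data layout.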

Native modules (.so/.pyd) are a special case, if we want to support them. These may exist in the source tree for development (esp. editables) but must not be copied to the source dist; instead, they must be generated from the source dist and added to the wheel.

We may want to allow the user to include different files in source tree → source dist than in source tree (repo or unpacked from a source dist) → wheel, especially when a build or code generation step becomes involved.

At the top level, the wheel must only contain a single module (we don’t support wheels with more than one top level module), so there are no custom include patterns for wheels: The wheel contains dist info (including license files), data files and (potentially with exclusions) the root module directory.

Even through all this, the majority of projects will want three features: A Readme (potentially with a transform for pypi), license file(s) and a src/<name> directory. These can be covered by the right defaults, so most users shouldn’t need to change the default includes/excludes.

Tool Review

poetry

By default, uses gitignore. Using include makes it ignore gitignore.

[tool.poetry]
include = [
    { path = "tests", format = "sdist" },
    { path = "for_wheel.txt", format = ["sdist", "wheel"] }
]

If no format is specified, include defaults to only sdist.

In contrast, exclude defaults to both sdist and wheel.

pdm

includes (wheel), source-includes and package-dir.

If a file is covered by both includes and excludes, the one with the more path parts and less wildcards in the pattern wins, otherwise excludes takes precedence if the length is the same.

For example, given the following configuration:

includes = ["src"]
excludes = ["**/*.json"]

src/foo/data.json will be excluded since the pattern in excludes has more path parts, however, if we change the configuration to:

includes = ["src", "src/foo/data.json"]
excludes = ["**/*.json"]

the same file will be included since it is covered by includes with a more specific path.
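
One possible reading of this precedence rule, sketched in Python. This is not pdm's implementation: the matching (directory prefix or `fnmatch`) and the lexicographic ordering of "more path parts, then fewer wildcards" are assumptions made for illustration, and all function names are hypothetical.

```python
import fnmatch


def matches(pattern: str, path: str) -> bool:
    """A pattern matches a file directly, as a parent directory, or as a glob."""
    directory = pattern.rstrip("/") + "/"
    return path == pattern or path.startswith(directory) or fnmatch.fnmatchcase(path, pattern)


def specificity(pattern: str):
    """More path parts is more specific; fewer wildcards breaks ties."""
    parts = [part for part in pattern.split("/") if part]
    wildcards = pattern.count("*") + pattern.count("?")
    return (len(parts), -wildcards)


def is_included(path: str, includes, excludes) -> bool:
    include_scores = [specificity(p) for p in includes if matches(p, path)]
    exclude_scores = [specificity(p) for p in excludes if matches(p, path)]
    if not include_scores:
        return False
    if not exclude_scores:
        return True
    # On a tie, excludes take precedence, hence the strict comparison.
    return max(include_scores) > max(exclude_scores)


path = "src/foo/data.json"
print(is_included(path, ["src"], ["**/*.json"]))
print(is_included(path, ["src", "src/foo/data.json"], ["**/*.json"]))
```

Under these assumptions the first call reports the file as excluded and the second as included, matching the two configurations above.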

Test files under tests, if found, are included by sdist and excluded by other formats.

*.pyc, __pycache__/ and build/ are always excluded.

hatch(ling)

Respects gitignore and hgignore by default, ignore-vcs to ignore.

Include, then exclude, every entry represents a [Git-style glob pattern](https://git-scm.com/docs/gitignore#_pattern_format), uses pathspec.GitIgnoreSpec.from_lines internally.

[tool.hatch.build.targets.sdist]
include = [
  "pkg/*.py",
  "/tests",
]
exclude = [
  "*.json",
  "pkg/_compat.py",
]

You can use the only-include option to prevent directory traversal starting at the project root and only select specific relative paths to directories or files. Using this option ignores any defined include patterns.

There is an artifacts option to include gitignored files.

There is a hardcoded set of excluded directories (.git, __pycache__, etc.) and files (.DS_Store).

There is a skip-excluded-dirs for performance and only-include for only traversing certain directories.

maturin

Include and exclude are inspired by poetry.

include = [
  { path = "path/**/*", format = "sdist" },
  { path = "all", format = ["sdist", "wheel"] },
  { path = "for/wheel/**/*", format = "wheel" }
]

scikit-build-core

Uses gitignore by default, you can specify sdist includes and excludes, and wheel excludes.

For packages, it supports renames (last component must match):

[tool.scikit-build.wheel.packages]
"mypackage/subpackage" = "python/src/subpackage"

cargo

Include and exclude with gitignore syntax. By default, all files are included, not just src, but when specifying include manually, it will ignore src by default.

If include is not specified, then the following files will be excluded:

  • If the package is not in a git repository, all “hidden” files starting with a dot will be skipped.
  • If the package is in a git repository, any files that are ignored by the [gitignore](https://git-scm.com/docs/gitignore) rules of the repository and global git configuration will be skipped.

Regardless of whether exclude or include is specified, the following files
are always excluded:

  • Any sub-packages will be skipped (any subdirectory that contains a Cargo.toml file).
  • A directory named target in the root of the package will be skipped.

The following files are always included:

npm

There is a files list for includes with gitignore syntax, by default .gitignore is used but .npmignore takes precedence. There are mandatory includes, there are default excludes, and there are mandatory excludes.

Ecosystem Review

A random assortment of projects and syntaxes as data points.

boto3

include CONTRIBUTING.rst
include README.rst
include LICENSE
include requirements.txt
recursive-include boto3/data *.json

httpx

[tool.hatch.build.targets.sdist]
include = [
    "/httpx",
    "/CHANGELOG.md",
    "/README.md",
    "/tests",
]

charset-normalizer

include LICENSE README.md CHANGELOG.md charset_normalizer/py.typed dev-requirements.txt
recursive-include data *.md
recursive-include data *.txt
recursive-include docs *
recursive-include tests *

idna

[tool.flit.sdist]
exclude = [".gitignore", ".github/"]
include = ["tests", "tools", "HISTORY.rst"]

typing-extensions

[tool.flit.sdist]
include = ["CHANGELOG.md", "README.md", "tox.ini", "*/*test*.py"]
exclude = []

django

include AUTHORS
include Gruntfile.js
include INSTALL
include LICENSE
include LICENSE.python
include MANIFEST.in
include package.json
include tox.ini
include *.rst
graft django
graft docs
graft extras
graft js_tests
graft scripts
graft tests
global-exclude *.py[co]

pydantic

[tool.hatch.build.targets.sdist]
# limit which files are included in the sdist (.tar.gz) asset,
# see https://github.com/pydantic/pydantic/pull/4542
include = [
    '/README.md',
    '/HISTORY.md',
    '/Makefile',
    '/pydantic',
    '/tests',
]

scikit-learn

Programmatically with meson, I think

spacy

recursive-include spacy *.pyi *.pyx *.pxd *.txt *.cfg *.jinja *.toml *.hh
include LICENSE
include README.md
include pyproject.toml
include spacy/py.typed
recursive-include spacy/cli *.yml
recursive-include licenses *
recursive-exclude spacy *.cpp

auditwheel

include README.rst
include LICENSE
include CHANGELOG.md
include src/auditwheel/policy/*.json
include src/auditwheel/_vendor/wheel/LICENSE.txt

graft tests

exclude .coveragerc
exclude .gitignore
exclude .git-blame-ignore-revs
exclude .pre-commit-config.yaml
exclude .travis.yml
exclude noxfile.py

prune .github
prune scripts
prune tests/**/__pycache__
prune tests/**/*.egg-info
prune tests/**/build

global-exclude *.so .DS_Store

ripgrep

exclude = [
  "HomebrewFormula",
  "/.github/",
  "/ci/",
  "/pkg/brew",
  "/benchsuite/",
  "/scripts/",
]

alphafold3

[tool.scikit-build]
wheel.exclude = [
    "**.pyx",
    "**/CMakeLists.txt",
    "**.cc",
    "**.h"
]
sdist.include = [
    "LICENSE",
    "OUTPUT_TERMS_OF_USE.md",
    "WEIGHTS_PROHIBITED_USE_POLICY.md",
    "WEIGHTS_TERMS_OF_USE.md",
]

watchfiles

@konstin konstin added enhancement New feature or improvement to existing functionality preview Experimental behavior labels Nov 3, 2024
@konstin konstin self-assigned this Nov 3, 2024
@samypr100 samypr100 added the tracking A "meta" issue that tracks completion of a bigger task via a list of smaller scoped issues. label Nov 4, 2024
@hauntsaninja
Contributor

Check MANIFEST.in could also be nice! https://github.com/mgedmin/check-manifest

@cthoyt
Contributor

cthoyt commented Dec 4, 2024

@konstin I was able to get my first builds working using this on the 0.5.5 release (also w/ tox and tox-uv)! There were a few tricks and rough edges I had to get around to make it work.

Are you interested in feedback at this point? I totally understand if not - you are probably aware of a lot of things already and I don't want to bog you down.

If so, what's the preferred mechanism? I am happy to open up issues with minimum reproducibility scripts where appropriate.

Greetings from Bonn

@konstin
Member Author

konstin commented Dec 4, 2024

With #9621, everything that's needed for alpha testing has now landed on main. I've updated the original post with some basic documentation.

tl;dr: Set UV_PREVIEW=1 and add:

[build-system]
requires = ["uv>=0.5,<0.6"]
build-backend = "uv"

The functionality for the basic build backend and its integration has landed (all behind preview). The biggest item still to change is the include/exclude syntax: here we want to sync up with red knot and optimize towards typical project usage without loss of generality (see also the ecosystem review in the original post).

Currently not part of the design is generating files or any kind of build step, such as Cython compilation or code generation from a schema, as well as compiling native modules in C, C++ or Rust. The includes allow including extra files in the source dist for a future compilation or plugin step.


@cthoyt It's great to have early adopters! If you have a bug or specific feature request, just open a new issue; minimal reproductions are of course always greatly appreciated :) For general discussion on what the build backend should (or shouldn't) do and other design discussions, I'd use #3957, while I'll post general updates in this issue. Please share your feedback, I love hearing from users!

@uwu-420

uwu-420 commented Dec 13, 2024

Really excited to see progress on uv's own build backend <3

I know this is currently out of scope, but I'd be interested if there are plans to eventually extend the build backend to handle building for multiple python versions in one go. Currently one would have to use tools like tox/nox, cibuildwheel or a custom build script, but I think it would be wonderful to only run uv build and call it a day. Of course with some previous configuration, e.g. specifying the desired target versions.

Or am I missing the point of what a build backend should handle, or is there already an idiomatic way to do this?

Edit: Okay I could call uv build multiple times with the --python=... argument which is good enough for the moment I guess :) But the question still remains.
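
A minimal sketch of that workaround as a script, assuming uv is on PATH and the listed interpreters are available or installable by uv. `build_commands` and `run_builds` are hypothetical helper names; the only uv interface used is uv build with its --python option.

```python
import subprocess
from typing import List, Sequence


def build_commands(python_versions: Sequence[str]) -> List[List[str]]:
    """One `uv build` invocation per requested Python version."""
    return [["uv", "build", "--python", version] for version in python_versions]


def run_builds(python_versions: Sequence[str]) -> None:
    """Run the builds one after another, stopping at the first failure."""
    for command in build_commands(python_versions):
        subprocess.run(command, check=True)
```

Calling `run_builds(["3.12", "3.13"])` from the project root would then invoke uv build once per version. For a pure Python project the resulting wheel is usually the same for every version, so this mainly matters once native modules are involved.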
