Plan for metadata proposal #230

Open
@henryiii

Need

In the core metadata specification originally set out in PEP 621 there is the possibility of marking fields as "dynamic", allowing their values to be determined at build time rather than statically included in pyproject.toml. There are several popular packages which make use of this system, most notably setuptools_scm, which dynamically calculates a version string based on various properties from a project's source control system, but also e.g. hatch-fancy-pypi-readme, which builds a readme out of user-defined fragments (like the latest version's CHANGELOG). Most backends, including setuptools, PDM-backend, hatchling, and flit-core, also have built-in support for providing dynamic metadata from sources like reading files.

With the recent profusion of build backends in the wake of PEPs 517 and 518, it is much more difficult for a user to keep using these kinds of tools across their different projects because of the lack of a common interface. Each tool has been written to work with a particular backend, and can only be used with other backends by adding some kind of adapter layer. For example, setuptools_scm has already been wrapped into a hatchling plugin (hatch-vcs), and into scikit-build-core. Poetry also has a custom VCS versioning plugin (poetry-dynamic-versioning), and PDM has a built-in tool for it. However, these adapter layers are inconvenient to maintain (often being dependent on internal functions, for example), confusing to use, and result in a lot of duplication of both code and documentation.

We are proposing a unified interface that would allow metadata-providing tools to implement a single function that build backends can call, and a standard format in which to return their metadata. Once a backend chooses to adopt this proposed mechanism, it will gain support for all plugins implementing it.

We are also proposing a modification to the project specification that has been requested by backend and plugin authors to loosen the requirements slightly on mixing dynamic and static metadata, enabling metadata plugins to be more easily adopted for some use cases.

Proposal

Implementing a metadata provider

Our suggestion is that metadata providers include a module (which could be the top level of the package, but need not be) which provides a function dynamic_metadata(fields, settings=None). The first argument is the list of fields requested of the plugin, and the second is the extra settings passed to the plugin configuration, possibly empty. This function will run in the same directory that build_wheel() runs in, the project root (to allow for finding other relevant files/folders like .git).

The function should return a dictionary matching the pyproject.toml structure, but only containing the metadata keys that have been requested. dynamic, of course, is not permitted in the result. Updating the pyproject_dict with this return value (and removing the corresponding keys from the original dynamic entry) should result in a valid pyproject_dict. The backend should only update the key corresponding to the one requested by the user. A backend is allowed (and recommended) to combine identical calls for multiple keys - for example, if a user sets "readme" and "license" with the same provider and arguments, the backend is only required to call the plugin once, and use the readme and license fields.
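As a rough illustration of the backend side described above, the sketch below groups requested fields that share the same provider and settings, calls each such combination once, and copies only the requested keys into the project table. The names `resolve_dynamic` and `call_provider` are hypothetical; `call_provider` stands in for however a backend actually imports and invokes the hook, and `project["dynamic"]` is assumed to be in the classic list form.

```python
from collections import defaultdict
from collections.abc import Callable, Mapping, Sequence
from typing import Any

# Hypothetical type: (provider name, requested fields, settings) -> metadata dict
ProviderCall = Callable[[str, Sequence[str], Mapping[str, Any]], dict[str, Any]]


def resolve_dynamic(
    project: dict[str, Any],
    requests: Mapping[str, tuple[str, Mapping[str, Any]]],
    call_provider: ProviderCall,
) -> dict[str, Any]:
    """Return a copy of ``project`` with the requested dynamic fields filled in.

    ``requests`` maps a field name to ``(provider, settings)``.
    """
    # Group fields whose provider and settings are identical so that each
    # distinct combination is called only once.
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    settings_by_key: dict[tuple[str, str], Mapping[str, Any]] = {}
    for field, (provider, settings) in requests.items():
        key = (provider, repr(sorted(settings.items())))
        groups[key].append(field)
        settings_by_key[key] = settings

    result = {k: v for k, v in project.items() if k != "dynamic"}
    remaining = [f for f in project.get("dynamic", []) if f not in requests]

    for key, fields in groups.items():
        returned = call_provider(key[0], fields, settings_by_key[key])
        if "dynamic" in returned:
            raise ValueError("'dynamic' is not permitted in a provider result")
        for field in fields:
            # Only copy keys that were actually requested from this provider.
            result[field] = returned[field]

    if remaining:
        result["dynamic"] = remaining
    return result
```

Fields that were not resolved stay listed under `dynamic`, so the returned dictionary keeps the same shape as a `[project]` table.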

An optional hook [1], get_requires_for_dynamic_metadata, allows providers to determine their requirements dynamically (depending on what is already available on the path, or unique to providing this plugin).

Here's an example implementation:

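import tomllib
from collections.abc import Mapping, Sequence
from pathlib import Path
from typing import Any


def dynamic_metadata(
    fields: Sequence[str],
    settings: Mapping[str, Any] | None = None,
) -> dict[str, dict[str, str | None]]:
    if settings:
        raise RuntimeError("Inline settings are not supported by this plugin")
    if list(fields) != ["readme"]:
        raise RuntimeError("This plugin only supports dynamic 'readme'")

    # Imported inside the hook: the package is only requested through
    # get_requires_for_dynamic_metadata below.
    from hatch_fancy_pypi_readme._builder import build_text
    from hatch_fancy_pypi_readme._config import load_and_validate_config

    # The hook runs in the project root, so pyproject.toml is in the current directory.
    with Path("pyproject.toml").open("rb") as f:
        pyproject_dict = tomllib.load(f)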

    config = load_and_validate_config(
        pyproject_dict["tool"]["hatch"]["metadata"]["hooks"]["fancy-pypi-readme"]
    )

    return {
        "readme": {
            "content-type": config.content_type,
            "text": build_text(config.fragments, config.substitutions),
        }
    }


def get_requires_for_dynamic_metadata(
    settings: Mapping[str, Any] | None = None,
) -> list[str]:
    return ["hatch-fancy-pypi-readme"]

Using a metadata provider

For maximum flexibility, we propose specifying a 1:1 mapping between the dynamic metadata fields and the providers (specifically the module implementing the interface) which will supply them.

The existing dynamic specification will be expanded to support a table as well:

[project.dynamic]
version = {provider = "plugin.submodule"}                   # Plugin
readme = {provider = "local_module", path = "scripts/meta"} # Local plugin
classifiers = {provider = "plugin.submodule", max="3.11"}   # Plugin with options
requires-python = {min = "3.8"}                             # Build-backend specific
dependencies = {}                                           # Identical to dynamic = ["dependencies"]
optional-dependencies = "some_plugin"                       # Shortcut for provider = "some_plugin"

If project.dynamic is a table, a new provider="..." key will pull from a matching plugin with the hook outlined above. If path="..." is present as well, then the module is a local plugin in the provided local path (just like PEP 517's local backend path). All other keys are passed through to the hook; it is suggested that a hook validate for unrecognized keys. If no keys are present, the backend should fall back on the same behavior a string entry would provide.

Many backends already have some dynamic metadata handling. If keys are present without provider=, then the behavior is backend defined. It is highly recommended that a backend produce an error if keys that it doesn't expect are present when provider= is not given. Setuptools could simplify its current tool.setuptools.dynamic support with this approach, taking advantage of the ability to pass custom options through the field:

# Current
[project]
dynamic = ["version", "dependencies", "optional-dependencies"]

[tool.setuptools.dynamic]
version = {attr="mymod.__version__"}
dependencies = {file="requirements.in"}
optional-dependencies.dev = {file="dev-requirements.in"}
optional-dependencies.test = {file="test-requirements.in"}


# After
[project.dynamic]
version = {attr="mymod.__version__"}
dependencies = {file="requirements.in"}
optional-dependencies.dev = {file="dev-requirements.in"}
optional-dependencies.test = {file="test-requirements.in"}
# provider = "setuptools.dynamic.version", etc. could be set but would be verbose

Another idea is a hypothetical regex based version discovery, which could look something like this if it was integrated into the backend:

[project.dynamic]
version = {location="src/package/version.txt", regex='Version\s*([\d.]+)'}

Or like this if it was a plugin:

[project.dynamic.version]
provider = "regex.searcher.version"
location = "src/package/version.txt"
regex = 'Version\s*([\d.]+)'

Using project.dynamic as a table keeps the specification succinct without adding extra fields, avoids duplication, and is handled by third-party libraries that inspect pyproject.toml in exactly the same way (at least if they are written in Python). The downside is that it changes the existing specification, most likely breaking validation. However, validation is most often done by the backend, and a backend must already opt into this proposal, so that is an acceptable change. pip and cibuildwheel, two non-backend tools that read pyproject.toml, are unaffected by this change.

To keep the most common use case simple [2], passing a string is equivalent to passing the provider; version = "..." is treated like version = { provider = "..." }. This makes the backend implementation a bit more complex, but provides a simpler user experience for the most common expected usage. This is similar to how keys like project.readme and project.license are treated today.
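The extra backend complexity is small. One possible normalization step, with the hypothetical name `normalize_dynamic`, could turn every accepted form of project.dynamic into a mapping from field name to settings table:

```python
from collections.abc import Mapping
from typing import Any


def normalize_dynamic(dynamic: Any) -> dict[str, dict[str, Any]]:
    """Convert any accepted form of project.dynamic to {field: settings table}."""
    if isinstance(dynamic, Mapping):
        normalized = {}
        for field, entry in dynamic.items():
            if isinstance(entry, str):
                # String shortcut: version = "x" means version = {provider = "x"}
                normalized[field] = {"provider": entry}
            elif isinstance(entry, Mapping):
                normalized[field] = dict(entry)
            else:
                raise TypeError(f"project.dynamic.{field} must be a string or table")
        return normalized
    # Classic list form: no provider and no settings for any field.
    return {field: {} for field in dynamic}
```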

Supporting metadata providers:

An implementation of this proposal already exists for the scikit-build-core backend and uses only standard library functions. Implementations could be left up to individual build backends to provide, but if the proposal were adopted they would probably coalesce into a single common implementation. pyproject-metadata could hold such a helper implementation.

Proposed changes in the semantics of project.dynamic

PEP 621 explicitly forbids a field to be "partially" specified in a static way (i.e. by associating a value to project.<field> in pyproject.toml) and later listed in dynamic.

This complicates the mechanism for dynamically defining fields with complex/compound data structures, such as keywords, classifiers and optional-dependencies, and requires backends to implement "workarounds". Examples of practices that were impacted by this restriction include:

In this PEP, we propose to lift this restriction and change the semantics associated with project.dynamic in the following manner:

  • When a metadata field is simultaneously assigned a value and included in project.dynamic, tools should assume that its value is partially defined. The given static value corresponds to a subset of the value expected after the build process is complete. Backends and dynamic providers are allowed to augment the metadata field during the build process.

The fields that are arrays or tables with arbitrary entries are urls, authors, maintainers, keywords, classifiers, dependencies, scripts, entry-points, gui-scripts, and optional-dependencies.
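The proposal does not prescribe how the static and dynamic parts are combined. One possible policy, sketched below with the hypothetical helper `augment`, keeps every static entry, appends new list items, merges tables recursively, and refuses to replace a static scalar:

```python
from typing import Any


def augment(static: Any, dynamic: Any) -> Any:
    """Combine a partial static value with what a provider returned."""
    if static is None:
        return dynamic
    if isinstance(static, list) and isinstance(dynamic, list):
        # Keep static entries first and append dynamic ones that are new.
        return static + [item for item in dynamic if item not in static]
    if isinstance(static, dict) and isinstance(dynamic, dict):
        merged = dict(static)
        for key, value in dynamic.items():
            merged[key] = augment(static.get(key), value)
        return merged
    if static != dynamic:
        raise ValueError("a static value may be augmented, not replaced")
    return static
```

Under this policy the static value is always a subset of the final value, which matches the semantics described in the bullet above.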

Examples & ideas:

Current PEP 621 backends & dynamic metadata

Backend Dynamic? Config? Plugins?
setuptools
hatchling
flit-core
pdm-backend
scikit-build-core [3]
meson-python
maturin
enscons
whey
trampolim

"Dynamic" indicates the tool supports at least one dynamic config option. "Config" indicates the tool has some tool-specific way to configure this option. "Plugins" refers to having a custom plugin ecosystem for these tools. Poetry has not yet adopted PEP 621, so is not listed above, but it does have dynamic metadata with custom configuration and plugins. This proposal will still help tools not using PEP 621, as they can still use the plugin API, just with custom configuration (but they are already using custom configuration for everything else, so that's fine).

Rejected ideas

Notes on extra file generation

Some metadata plugins generate extra files (like a static version file). No special requirements are made on such plugins or backends handling them in this proposal; this is in line with PEP 517's focus on metadata and its lack of any specification for file handling.

Config-settings

The config-settings dict could be passed to the plugin, but because there is no standard configuration design for config-settings, a plugin can't generally handle a specific config-settings item and be sure that no backend will also try to read it or reject it. There was also a design worry about adding this in setuptools, so it was removed (it is still present in the reference implementation, though).

Passing the pyproject.toml as a dict

This would add a little bit of complexity to the signature of the plugin, but would avoid reparsing the pyproject.toml for plugins that need to read it. It would also avoid an extra dependency on tomli for older Python versions. Custom inline settings alleviated the need for almost every plugin to read the pyproject.toml, so this was removed to keep backend implementations and signatures simpler.

New section

Instead of changing the dynamic metadata field to accept a table, there could be a new section:

dynamic = ["version"]

[dynamic-metadata]
version = {provider = "plugin_package.submodule"}

This is the current state of the reference implementation, using [tool.scikit-build.metadata] instead of [dynamic-metadata]. In this version, listing an item in dynamic-metadata should be treated as implicitly listing it in dynamic, though listing in both places can be done (primarily for backward compatibility).

dynamic vs. dynamic-metadata could be confusing, as they do the same thing, and it actually makes parsing this harder for third-party tools, as now both project.dynamic and dynamic-metadata have to be combined to see what fields could be dynamic. The fact that dict keys and lists are handled the same way in Python provides a nice method to avoid this complication.

Alternative proposal: new array section

A completely different approach to specification could be taken using a new section and an array syntax [4]:

dynamic = ["version"]

[[dynamic-metadata]]
provider = "plugin_package.submodule"
path = "src"
provides = ["version"]

This has the benefit of not repeating the plugin if you are pulling multiple metadata items from it, and indicates that this is only going to be called once. It also has the benefit of allowing empty dynamic plugins, which has an interesting non-metadata use case, but is probably out of scope for the proposal. The main downside is that it's harder for third-party projects to parse for the dynamic values, as they have to loop over dynamic-metadata and join all provides lists to see what is dynamic. It's also a lot more verbose, especially for the built-in plugin use case for tools like setuptools. (The current version of this suggestion listed above is much better than the original version we proposed, though!) This would also allow multiple plugins to provide the same metadata field, for better (maybe this could be used to allow combining lists or tables from multiple plugins) or worse (this has to be defined and properly handled).

This version could enable a couple of possible additions that were not possible in the current proposal. However, most users would not need these, and some of them are a bit out of scope - the current version is simpler for pyproject.toml authors and would address 95% of the plugin use cases.

Multiple plugins per field

The current proposal requires a metadata field be computed by one plugin; there's no way to use multiple plugins for a single field (like classifiers). This is expected to be rare in practice, and can easily be worked around in the current proposal by adding a local plugin that itself calls the plugins it wants to combine, following the standard API proposed. "Merging" the metadata would then be arbitrary, since it's implemented by this local plugin, rather than having to be pre-defined here.

Empty plugins (for side effects)

A closely related but separate problem could be solved by this paradigm as well with some modifications. Several build tools (like cmake, ninja, patchelf, and swig) are actually system CLI tools that have optional pre-compiled binaries in the PyPI ecosystem. When compiling on systems that do not support binary wheels (a very common reason to compile!), such as WebAssembly, Android, FreeBSD, or ClearLinux, it is invalid to add these as dependencies. However, if the system versions of these dependencies are of a sufficient version, there's no need to add them either. A PEP 517 backend has the ability to declare dynamic dependencies, so this can be (and currently is) handled by tools like scikit-build-core and meson-python in this way. However, it might also be useful to allow this logic to be delegated to a metadata provider; this would potentially allow greater sharing of core functionality in this area.

For example, if you specified "auto_cmake" as a provider, it could provide get_requires_for_dynamic_metadata_wheel to supply this functionality to any backend. This will likely best be covered by the "extensionlib" idea, rather than plugins, so this is not worth trying to address unless this array based syntax becomes the proposed syntax - then it would be worth evaluating to see if it's worth trying to include.

Footnotes

  1. Most plugins will likely not need to implement this hook, so it could be removed. But it is symmetric with PEP 517, fairly simple to implement, and "wrapper" plugins, like the first two example plugins, need it. It is expected that backends that want to provide similar wrapper plugins will find this useful to implement.

  2. This also could be removed from the proposal if needed.

  3. In development, based on a version of this proposal.

  4. Note that, unlike the proposed syntax, this probably should not repurpose project.dynamic, since this would be much more likely to break existing parsing of this field by static tooling. (Static tooling often may not parse this field anyway, since it's easier to check for a missing field - you only need to check dynamic today if you care about "missing" versus "specified elsewhere".)
