Skip to content

Commit

Permalink
CEP for specifying the location of site-packages in repodata (#90)
Browse files Browse the repository at this point in the history
Co-authored-by: Daniel Holth <dholth@anaconda.com>
Co-authored-by: Jannis Leidel <jannis@leidel.info>
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
  • Loading branch information
4 people authored Oct 22, 2024
1 parent 189d482 commit 982868f
Show file tree
Hide file tree
Showing 2 changed files with 164 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@ for conda's implementation, all major changes should be submitted as
| [13](cep-13.md) | A new recipe format - part 1 |
| [14](cep-14.md) | A new recipe format – part 2 - the allowed keys & values |
| [15](cep-15.md) | Hosting repodata.json and packages separately by adding a base_url property. |
| [16](cep-16.md) | Sharded Repodata |
| [17](cep-17.md) | Optional python site-packages path in repodata |

## References

Expand Down
162 changes: 162 additions & 0 deletions cep-17.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
<table>
<tr><td> Title </td><td> Optional python site-packages path in repodata </td>
<tr><td> Status </td><td> Proposed </td></tr>
<tr><td> Author(s) </td><td> Jonathan Helmus &lt;jjhelmus@gmail.com&gt;</td></tr>
<tr><td> Created </td><td> Sep 13, 2024</td></tr>
<tr><td> Updated </td><td> Sep 13, 2024</td></tr>
<tr><td> Discussion </td><td> https://github.com/conda/ceps/pull/90 </td></tr>
<tr><td> Implementation </td><td> NA </td></tr>
</table>

To avoid confusion this document will use a stylized `python` to indicate a conda package with that name.
Other uses of "Python" will be followed by a qualifier, like "programming language" or "interpreter".
Specific implementations of the Python programming language will be referred to by name, such as CPython or PyPy.

## Abstract

We propose adding a new optional field in `repodata.json` that `python` packages can use to specify the destination path when installing `noarch: python` packages.

## Background

In the conda ecosystem the package named `python` provides, either directly or via a dependency, an interpreter for the Python programming language.
This interpreter is often CPython but can also be alternative implementations such as PyPy or GraalPy.

Tools like conda, mamba and pixi support installing `noarch: python` packages into conda environments.
Whereas the content of other conda packages are linked into the target environment based upon their path within the package, files within the "site-packages" directory of `noarch: python` packages are linked into the site-packages directory of the target environment.

In conda 24.7.1 and many other versions the location of the site-packages directory within the environment is `Lib/site-packages` on Windows and `lib/pythonX.Y/site-packages` on other platforms where X and Y are the major and minor versions of the `python` package installed in the environment.
These paths are the defaults for CPython but are not the correct paths for alternative implementation of the Python programming language or for some configurations of CPython.
For example PyPy uses `lib/pypyX.Y/site-packages` on POSIX systems.

In these cases kludgy workarounds, [often a symlink](https://github.com/conda-forge/pypy3.6-feedstock/blob/ac4201e80432469971626d277718edf80b22d6ef/recipe/build.sh#L102), are used in the `python` package to support the installation of `noarch: python` packages.
In the next section a new field will be proposed that allows `python` package to explicitly declare the location of the site-packages directory to avoid the need for these work-arounds.

## Specification

An optional field `python_site_packages_path` may be included in a conda package's `info/index.json` file as a string and in the corresponding `repodata.json` entry for the package.
When defined, this field specifies the path of the Python interpreter's site-packages directory relative to the root of the environment.
If a `python` package that includes this field is installed into an environment, all `noarch: python` packages that are installed into the environment will have the files in their "site-packages" directory linked into the path specified by this field.

A value of "null" can be used to indicate that this field is not specified.
It is acceptable to not include the field in `repodata.json` in this case.

If this field is missing or specified as `null` a default site-packages path will be used.
This path is `Lib/site-packages` if the environment targets a Windows platform and `lib/pythonX.Y/site-packages` on other platforms where X and Y are the major and minor versions of the `python` package installed in the environment.
This matches the current behavior of conda, mamba and pixi.

This field should only be included in the metadata for `python` packages but tools must not fail if packages with other names include this field, rather the entry should be ignored.

To avoid unnecessarily increasing the size of `repodata.json`, it is highly recommended that this field only be included in `python` packages where it is required, that is where the site-packages directory is not the default value.
Packages with other names or `python` packages that use the default site-packages path should not include this field.

This path must not point to a location outside of the root of the environment (e.g. it cannot navigate up a directory or be an absolute path).
If a package specifies a path which violates this requirement the package must not be installed and an appropriate error should be shown to the user.

A possible way to check the validity of the `python_site_packages_path` field is with this function:

``` Python
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = os.path.realpath(target_prefix)
full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
test_prefix = os.path.commonpath((target_prefix, full_path))
return test_prefix == target_prefix
```

When a package which includes this optional field is indexed the `python_site_packages_path` field will be included in the repodata entry for the package.
For example the repodata entry for a `python-3.13.0rc1` package using the free-threading build configuration might look as follows:

```
"python-3.13.0rc1-haa6bb3f_0_cpython_cp313t.tar.bz2": {
"build": "haa6bb3f_0_cpython_cp313t",
"build_number": 0,
"depends": [
"bzip2 >=1.0.8,<2.0a0",
"expat >=2.6.2,<3.0a0",
"libffi >=3.4,<4.0a0",
"ncurses >=6.4,<7.0a0",
"openssl >=3.0.14,<4.0a0",
"readline >=8.1.2,<9.0a0",
"sqlite >=3.45.3,<4.0a0",
"tk >=8.6.14,<8.7.0a0",
"tzdata",
"xz >=5.4.6,<6.0a0",
"zlib >=1.2.13,<1.3.0a0"
],
"license": "PSF-2.0",
"license_family": "PSF",
"timestamp": 1722610680768,
"track_features": "free-threading",
"md5": "c09289eb86239e1221533457d861f1a3",
"name": "python",
"size": 16332720,
"subdir": osx-arm64,
"version": "3.13.0rc1",
"sha256": "fa0ae22c13450fe6c30c754ee5efbd7fe7e7533b878d7be96e74de56211d19df",
"python_site_packages_path": "lib/python3.13t/site-packages"
},
```

This field will also be included in repodata derivatives, like `current_repodata.json` or sharded repodata when appropriate.

Because this field is present in `repodata.json` it can be hot-fixed to correct mistakes or omissions.
This allows existing `python` packages to retroactively specify the locations of their site-packages directory.

## Motivation

This proposal was motivated by the free-threading configuration of CPython 3.13 which uses a different site-packages location on POSIX systems.
Specifically, the free-threading build uses `lib/python3.13t/site-packages`.

A symlink between these paths or logic in [sitecustomize.py](https://docs.python.org/3/library/site.html#module-sitecustomize) allows packages installed into the incorrect site-packages directory to operate but this is less than ideal.

For additional details and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053).

Although this is motivated by the free-threading configuration of CPython, the same underlying issue occurs in many non-CPython implementations.
This includes both PyPy and GraalPy which use different paths for site-packages than CPython.
The conda packages in the conda-forge channel for both of these use symlinks to work around tools which link files into the incorrect site-packages directory.

## Security

Because this proposal can change the location where files are installed it is worth considering what, if any, security implementation this change entails.

A primary security concern is that by allowing the `python` package to specify where files will be installed, a malicious package could write or overwrite files with malicious content.

If files can only be installed into the environment where the package is installed, the damage from an attack is limited since a malicious package can already install files anywhere in the environment.

For example a malicious `python` package could specify an invalid site-packages path, in which case `noarch: python` packages installed in the environment would not work.
This is inconvenient but not a security concern and should be addressed by removing or disabling the package in question.

The malicious package could also specify a site-packages path that would cause files from `noarch: python` packages to populate or replace key files in the environment.
But this type of attack is already possible, in fact the `python` package itself could include these files.

Of more concern is the possibility of a package writing files outside of the environment.
This could replace critical system files with malicious content.
Because of this risk, paths which are outside of the environment are invalid as `python_site_packages_path` entries.
Tools must check that the field is valid and error out when it is invalid.

## Backwards Compatibility

This change maintains backwards compatibility.
All existing `python` packages do not include this field.
The behavior when this field is not specified matches what is currently done by conda, mamba and pixi.
The only behavior change occurs when this field is included.

Because existing releases of conda, mamba and pixi do not support this field, `python` packages should continue to provide work around to allow files to be installed into an incorrect site-packages directory until such time as this proposal has been implemented and available for a sufficient amount of time.

## Other sections

A number of alternatives were discussed and considered to address this problem. These include:

* Using symlinks or a `sitecustomize.py` file to allow linking into the incorrect site-packages directory.
This is a work-around for the problem, not a fix.
* Querying the interpreter to determine the site-packages directory.
This was rejected as it requires running the interpreter as part of the install step and requires that the `python` package be installed and operational before `noarch: python` packages can be linked.
* Using the build string of the `python` or `python_abi` package to determine the site-packages location.
This is an indirect method of obtaining information that can be more effectively specified explicitly.
* Storing this information in another metadata file, such as `about.json` or `paths.json`.
The downside of this approach is that the data cannot be hotfixed and the path for the site-packages directory would not be known until the `python` package is downloaded and unpacked, potentially limiting the order in which packages can be linked.

For more alternatives and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053).

## Copyright

All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).

0 comments on commit 982868f

Please sign in to comment.