Interpreter constraints do not play well with lock files. #12200
@Eric-Arellano this struck me and it seemed good to write it down. You may or may not want to include this in your design doc, but it clearly seems a problem to contend with in some way. Whether that means outlawing constraints being used with lockfiles or ... I'm not sure. |
Hm, yea... this is an interesting case. Working in our favor is the possibility that users might view this as "not having the 'right'/'consistent' interpreter" for their lockfile, rather than the other way around? |
I'm not so sure. It was the user after all that wrote down the "CPython>=3.6,<4" interpreter constraints. Presumably that meant they considered CPython 3.6 through CPython 3.9 all to be the right interpreters. Assuming they meant what they said, I'd guess they have the opposite reaction: I told Pants this should work for "CPython>=3.6,<4" and yet, when I asked Pants to generate the lockfile, it did 1/4 of the job without warning or failure. So I checked in the lockfile assuming Pants did the whole job at the time and got burned 2 months later. (Here 1/4 is generous since 3.6, 3.7, 3.8, and 3.9 each subdivide further per patch version via environment markers.) Afaict this is not an easy problem unless maybe we can side-step it in some creative (???) or brute way (disallow ICs + lockfile combo). It really seems like the Python ecosystem has shot itself in the foot with environment markers being overly broad. There really is no way to generate a lockfile usable with anything other than 1 exact interpreter. I think in practice you mostly can, but you definitely cannot on paper unless you do the asterisk thing above and perform an over-resolve. |
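To make the breadth of the environment-marker problem concrete, here is a small illustration (hypothetical, not from Pants or Pex) using the third-party `packaging` library: a single marker-guarded requirement is pulled in for some interpreters in the range and dropped for others, so a resolve done on one interpreter cannot stand in for the whole range.

```python
# Small illustration (not from Pants/Pex), assuming the `packaging` library:
# the same requirement is included for 3.6/3.7 but omitted for 3.8/3.9
# because of its environment marker.
from packaging.markers import Marker

# Marker of the kind used by stdlib backports such as importlib-metadata.
marker = Marker('python_version < "3.8"')

for minor in ("3.6", "3.7", "3.8", "3.9"):
    included = marker.evaluate({"python_version": minor})
    print(f"Python {minor}: requirement {'included' if included else 'omitted'}")
```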
I talked more with @Eric-Arellano about this yesterday. My conclusion at least was that Pants should effectively treat a range as a declaration by the user that all of the versions that can be matched by that range are "effectively equivalent" for them. Yes, it's been very common for people to declare ranges that wide. Instead, if folks are trying to test/package with multiple versions that they know aren't equivalent, they'll need to create multiple binaries/tests (optionally via a macro, e.g. https://blog.pantsbuild.org/python-3-migrations#running-python-2-and-python-3-in-parallel). So: I think that this is fine. Perhaps we can do something to detect and warn about overly wide ranges, but it doesn't seem like we need to execute multiple resolves per range. |
That makes sense, I think, for all(?) uses of ICs except lock files. A lock file says we have a consistent resolve independent of the time we go to resolve things or the (valid) interpreter we're using at the moment. If that lock file is only written for one minor version in a range that covers >1 minor version, well, we've failed the whole lock file concept and not in fact locked things. It seems to me the only right thing to do with an IC range that covers more than one minor version is to fail to generate the lock file on any machine that doesn't cover the range. If you want to lock things for a project, you had better have a dev env set up to cover the project requirements. That seems totally reasonable to me and just means we'd need a few things:
|
To be clear: generating the lockfile is one thing; consuming it is another. If you generate the lockfile on one minor Python version with a reasonably tight range, there is a very high probability (implied by "reasonably tight range") that that lockfile will work fine when consumed by any other Python within the range. And that's sufficient to provide you the safety and stability guarantees of a lockfile, afaict? Critically, the file should not be re-generated if you change Pythons within the range (this is the reason for us to treat the range as a declaration that the Pythons in that range are equivalent). So it will only be regenerated if the input requirements/platforms/ranges change.
Given the above, I do not think that we need to "cover the range". We can lean on the declaration of equivalency. |
Using #12312 for illustration, I want to highlight what your declaration of equivalency actually means in practice for a - I assume we agree - "reasonable" IC range: Pants' own:
Afaict this is no bueno. Not from a locked resolve standpoint (I don't want dep drift to shoot me in the foot tomorrow) and certainly not from a security (supply chain) standpoint. |
To summarize this a bit more: if a Pants dev on a modern machine (one with Python 3.8 or 3.9 first on the PATH, or as the only python3) generates a lock file for Pants, that lockfile will be missing entries for importlib-metadata and zipp, and so both of those are subject to possible float or hijack until sometime later someone with a 3.7 generates the lockfile. This is why it seems to me we only have 2 options:
|
I feel like I've repeated this argument several times, so I'm not sure what I'm missing or how to resolve this. The summary is that Python dist resolution allows ~arbitrary interpreter-specific dep selection, and this necessarily means a resolve for one interpreter is never guaranteed to be correct for another interpreter (unless you can post-inspect any environment markers actually used). That fact, combined with the two roles of a lockfile, afaict leads to all the rest. Perhaps we disagree on the role of a lockfile. |
This is a really good example, and helps to prove your point... I am definitely surprised to see shifts in resolves within that range. So maybe it is the case that we need to cover the range in terms of major versions, and have a lockfile per major version. I'm definitely sketched out by this from a performance perspective. Regardless of whether you take the "declaration of equivalency" approach or the "cover the range" approach, it still seems like everything points to encouraging users to narrow their ranges significantly. Maybe the "cover the range" approach is safer, because rather than warning for the purposes of safety, we can warn for the purposes of performance ("[warn] Hey, did you know that ..."). |
An excellent example indeed. #12316 fails because I generated the lockfile with Python 3.9. I was intentionally taking a brief punt on this issue so that I could land some of the less controversial stuff and get more inspiration. I suspect we'll need to solve this sooner. |
You shouldn't be. Pex already resolves for every discovered interpreter on a machine that fits an IC range today (in parallel). We should be able to hit exactly that perf profile here too.
Maybe, but I think there is no way to avoid the fact that there simply will be use cases that use large ranges. We cannot tell those folks "don't do that".
I really think we should do post-resolve processing of dist-info/METADATA and use Requires-Python and Requires-Dist metadata to exactly determine the breadth of validity of a lock. I derailed things a bit with the "cover the range" terminology. It's actually about covering the environment range as selected by environment markers. It's just that the most common environment marker in use only picks out Python minor versions (`python_version`).
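For what that post-processing might look like, here is a minimal sketch (my own, not Pex's implementation) that inspects a resolved wheel's METADATA for marker-guarded Requires-Dist entries; Requires-Python could be read from the same headers in the same way.

```python
# Minimal sketch, not Pex's actual implementation, of the post-resolve idea:
# read Requires-Dist from a resolved wheel's METADATA and report which
# requirements are guarded by environment markers, since those are what
# limit how broadly a lock is valid.
import email
import zipfile

def conditional_requirements(wheel_path: str):
    """Yield (requirement, marker) pairs for deps guarded by environment markers."""
    with zipfile.ZipFile(wheel_path) as whl:
        metadata_name = next(
            name for name in whl.namelist() if name.endswith(".dist-info/METADATA")
        )
        metadata = email.message_from_bytes(whl.read(metadata_name))
    for req in metadata.get_all("Requires-Dist") or []:
        if ";" in req:
            spec, marker = (part.strip() for part in req.split(";", 1))
            yield spec, marker

# For a wheel that conditionally depends on a stdlib backport, this might yield
# something like ("importlib-metadata", 'python_version < "3.8"').
```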
Yes. A lot of design effort has been focused on UX - there are actual thorny, fundamental "does it even work" issues to sort out though before even getting to that. A swing in focus is needed to make sure we ship something that at base works. |
Not quite: see pants/src/python/pants/backend/python/util_rules/pex.py, lines 125 to 128 at b82a01e.
Agreed on the rest. |
@jsirois are you envisioning that the end result is a single lockfile? Consider
Option 1 has a complication: Pants then needs to know how to consume the distinct lockfiles, e.g. when to load the Py37 vs. Py38 lockfile for this loose range. For option 2, IIRC, you mentioned we could teach Pex and/or pip-compile to generate the lockfile against multiple interpreters in a single run? It could add any necessary env markers. |
My experiment over in Pex uses 1 JSON file with a nested object for the original requirement strings and then a nested object for each interpreter/platform containing that interpreter/platform's lock info. Pex can consume that mono-lockfile and pick out the relevant interpreter/platform section when reading the lockfile to do a resolve later. So ... I think that's just a different format for option 1. Instead of 3 distinct files, 1 file containing 3 distinct sections. |
This example I have lying around only has 1 lock, but I think you get the idea:
$ cat lock.json | jq .
{
"pex_version": "2.1.42",
"requirements": [
"requests"
],
"resolves": {
"manylinux_2_33_x86_64-cp-39-cp39": [
{
"name": "certifi",
"sha256": "50b1e4f8446b06f41be7dd6338db18e0990601dce795c2b1686458aa7e8fa7d8",
"source_artifact": null,
"version": "2021.5.30"
},
{
"name": "chardet",
"sha256": "f864054d66fd9118f2e67044ac8981a54775ec5b67aed0441892edb553d21da5",
"source_artifact": null,
"version": "4.0.0"
},
{
"name": "idna",
"sha256": "b97d804b1e9b523befed77c48dacec60e6dcb0b5391d57af6a65a312a90648c0",
"source_artifact": null,
"version": "2.10"
},
{
"name": "requests",
"sha256": "c210084e36a42ae6b9219e00e48287def368a26d03a048ddad7bfee44f75871e",
"source_artifact": null,
"version": "2.25.1"
},
{
"name": "urllib3",
"sha256": "753a0374df26658f99d826cfe40394a686d05985786d946fbe4165b5148f5a7c",
"source_artifact": null,
"version": "1.26.5"
}
]
}
} |
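As an illustration of the "pick out the relevant section" step described above, a consumer of this JSON layout might do something like the following. This is a sketch only; the tag derivation is assumed to happen elsewhere and this is not Pex's actual code.

```python
# Sketch of consuming the mono-lockfile above: load it and select the lock
# section for a given interpreter/platform tag.
import json

def pick_resolve(lockfile_path: str, platform_tag: str):
    """Return the locked requirements recorded for the given tag."""
    with open(lockfile_path) as fp:
        lock = json.load(fp)
    resolves = lock["resolves"]
    if platform_tag not in resolves:
        raise KeyError(
            f"No lock for {platform_tag!r}; locked tags: {sorted(resolves)}"
        )
    return resolves[platform_tag]

# pick_resolve("lock.json", "manylinux_2_33_x86_64-cp-39-cp39")
# -> [{"name": "certifi", "version": "2021.5.30", ...}, ...]
```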
@jsirois If we were to go with 3 distinct requirements.txt-style lockfiles, how feasible do you think it'd be to teach Pex when to use what? For example, if we were to use pip-tools' suggested naming format:
Could we have Pants feed all relevant constraints to Pex, and then have Pex choose which lockfile to use from the set, rather than Pants having to determine which to use? -- I'm trying to scope things to determine if it's feasible to stick with pip-compile or if we need to start using Pex's proposed proprietary format. |
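To scope that "teach Pex which lockfile to use" idea, here is a hypothetical sketch; the lock.py36.txt / lock.py37.txt naming scheme is an assumption for illustration, not pip-tools' or Pex's actual convention.

```python
# Hypothetical sketch: given several requirements.txt-style lockfiles named by
# Python minor version, pick the one matching the running interpreter. The
# naming scheme here is assumed, not an established convention.
import sys
from pathlib import Path

def pick_lockfile(lockfile_dir: str) -> Path:
    """Choose the lockfile matching the current interpreter's major.minor version."""
    tag = f"py{sys.version_info.major}{sys.version_info.minor}"
    candidates = sorted(Path(lockfile_dir).glob("lock.py*.txt"))
    for candidate in candidates:
        if candidate.name == f"lock.{tag}.txt":
            return candidate
    raise FileNotFoundError(
        f"No lockfile for {tag}; found {[c.name for c in candidates]}"
    )

# On CPython 3.7 this would select lock.py37.txt from a directory containing
# lock.py36.txt, lock.py37.txt, and lock.py38.txt.
```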
Have you figured out how to post-process a resolve's dist-info/METADATA outside Pex / using pip-compile yet? It seems that nugget comes before formats. |
…Lambdex, and Protobuf MyPy (#12357)

These tools are simple to support: they run entirely independently of user code, and they get their interpreter constraints from a subsystem rather than from user code. *Black is temporarily not used because it does not play nicely with interpreter constraints: #12200.

### Tools that still need lockfiles

Need to consider user interpreter constraints:
* Bandit
* Flake8
* setuptools
* pip-tools

Need to be mixed into user lockfiles (or have shading):
* MyPy
* Pylint
* Pytest
* IPython(?)
…ons` (#12371)

To robustly handle lockfiles and interpreter constraints (#12200), we should generate a lockfile for each major/minor Python version. Regardless of the lockfile format we choose from #12362, this will be useful to do.

This new helper function allows us to convert a range like `>=3.6,<3.9` into a set of ICs like `["==3.6.*", "==3.7.*", "==3.8.*"]`. We can then iterate over each element and generate a lockfile for those ICs.

This code is careful to handle patch versions correctly. Many users have reported setting things like `!=3.6.3`, e.g. to avoid system Python, and we need to preserve that.

[ci skip-rust]
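As a rough sketch of what such a helper could do (not Pants' actual helper: this uses the `packaging` library, drops the `CPython` prefix, and probes concrete patch versions; the real code also preserves patch-level deselections like `!=3.6.3` in its output):

```python
# Rough sketch, not Pants' actual helper: expand an interpreter-constraint
# range into one "==3.X.*" constraint per covered minor version by probing
# concrete patch versions against the specifier. The minor/patch bounds are
# arbitrary illustration values.
from packaging.specifiers import SpecifierSet

def enumerate_minor_ics(ics, minors=range(6, 11), max_patch=20):
    spec = SpecifierSet(ics)
    covered = []
    for minor in minors:
        if any(spec.contains(f"3.{minor}.{patch}") for patch in range(max_patch)):
            covered.append(f"==3.{minor}.*")
    return covered

# enumerate_minor_ics(">=3.6,<3.9") -> ["==3.6.*", "==3.7.*", "==3.8.*"]
```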
Poetry robustly handles environment markers, including for interpreter constraints and platforms: #12470 (comment) If we end up using Poetry for generation, I think this issue is moot. |
…-compile (#12549)

**Disclaimer**: This is not a formal commitment to Poetry, as we still need a more rigorous assessment that it can handle everything we need. Instead, this is an incremental improvement in that Poetry handles things much better than pip-compile. It gets us closer to the final result we want, and makes it much more ergonomic to use the experimental feature, like `generate_all_lockfiles.sh` now not needing any manual editing. But we may decide to switch from Poetry to something like PDM or Pex.

--

See #12470 for why we are not using pip-compile. One of the major motivations is that Poetry generates lockfiles compatible with all requested Python interpreter versions, along with Linux, macOS, and Windows. Meaning, you no longer need to generate the lockfile in each requested environment and manually merge like we used to. This solves #12200 and obviates the need for #12463.

--

This PR adds only basic initial support. If we do decide to stick with Poetry, some of the remaining TODOs:

- Handle PEP 440-style requirements.
- Hook up to `[python-setup]` and `[python-repos]` options.
- Hook up to caching.
- Support `--platform` via post-processing `poetry.lock`: #12557
- Possibly remove un-used deps/hashes to reduce supply chain attack risk: #12458

--

Poetry is more rigorous than pip-compile in ensuring that all interpreter constraints are valid, which prompted needing to tweak a few of our tools' constraints.
Say I have interpreter constraints "CPython>=3.6,<4" in play and I want to generate a lockfile for a resolve. There needs to be a lockfile per interpreter these constraints select, since requirements can have environment markers and these can cause an interpreter-specific resolve. Since environment markers are so broad in scope, patch versions of interpreters can have different resolves, and even the same patch version and same platform can have a different resolve due to fields like `platform_version` and `platform_release`: https://www.python.org/dev/peps/pep-0508/#environment-markers

Put more simply: clearly, if I have a concrete interpreter in hand I can run a resolve for it and then generate a lock file for it. If I don't, I can't *.
With interpreter constraints I may only have a subset of possible interpreters though and so I can only generate a subset of the needed lockfiles. In the leading example, say I try to create the lockfile on a machine with just CPython 3.6.5 on June 13th 2021. I will generate a lockfile for that interpreter and check it in. Now, say, 2 months later I go back to that commit to re-build things, but on a machine with only CPython 3.9.1. I have no lockfile, and so it will need to be regenerated. The problem here is that the lockfile may not be close at all to the one generated 2 months ago. Many new versions of distributions may have been published and this can result in the new CPython 3.9.1 lockfile behaving quite differently than the CPython 3.6.5 lock file. Towards the worst end of this, the new behavior could be buggy or broken.