Support for "soft" or suggested constraints #7051
I edited the title for you -- it's important to make clear you want pip-style soft/suggested constraints (aka the No comments on your proposal yet as I don't have time to read it, but I think you should make sure to think about the design holistically, if you haven't already. How does this interact with Poetry as a PEP 517 backend (e.g. packages installed from a sdist or using Do we need a custom format? Is this metadata that is part of the lock file, or part of the project file, or external (in which case a plugin probably does make more sense)? Should we capture a remote file reference in the project format? The reason the old proposal was closed was mostly because it didn't answer any of those questions, and introduced what was essentially a parallel lock file without addressing the downsides of a partial solution. To have something usable, it needs to clearly benefit a large number of users without harming existing users or maintainers, and it needs to not have surprising sharp-edged interactions with the wider Python ecosystem/Poetry's interaction with said ecosystem over stable, defined interfaces (like PEP 517).
Good point - thanks for clarifying the name; naming is key here, and indeed the more you use some names in one context, the less you realise they might be understood differently in another. Yes. I was actually thinking (eventually) about making it a more holistic approach, even in the form of a PEP proposal on how it could be made part of PyPI and the generic packaging format - so that you could publish such constraints as part of the publishing process in PyPI. However, I think before we approach a standardisation/PEP we should have some validation of whether the concerns/problems we have in Airflow are also problems for others, and whether we can get it validated with tools like Poetry as well; then we could base standardisation efforts on experience from those two tools - also confirmed by other projects that might use the feature, Re: PEP 517 - I think it is largely orthogonal to my proposal. I think of "soft constraints" more as an end-user feature, something that users of published packages should do, rather than developers. I do not necessarily see the need for such "soft" constraints for development purposes other than what is already there (at least not now). Whatever Poetry and other tools (like pip-tools) provide with lock files already solves some of the problems well; you can also do some custom stuff with I believe this is an entirely new PEP and would love to explore an option of leading it together with
I may be off base here, but I think the following would address the concerns and be a minimal change with significant value for Airflow and other Python libraries as well. Add a The difference between lock files & these This would allow end users to use their
Example Workflow / Details & Overview
My pyproject.toml
Commands
Results
Benefits
Yes. I think something like that would work @r-richmond. One of the ideas that I had here is that many users seem to be using poetry to not only "develop" their applications but also to describe and manage their installation environment. I think this is a very valid case:
I believe
Currently I think there is no such possibility (other than some manual manipulation), and that seems like a pretty valid and useful workflow for those people who maintain Airflow installations (and similar cases). I find it really appealing as a case that we could collectively - few people involved, discussing it, possibly also we could ask @neersighted (and other Poetry maintainers): what do you think about such a case? Are @r-richmond's and my explanations somewhat appealing to your way of thinking?
I ultimately think this does not belong in Poetry -- or rather, Poetry proper. I've done some thinking about design here, and I strongly object to the idea of adding a new artifact type (e.g. a URL associated with a package with a constraints.txt) or a command-line flag (too inflexible, doesn't allow merging), especially on the The lock file is (and for now, should remain; I don't think this is the issue to reconsider how we conceptualize/design it) a cached resolution; nothing more, nothing less. To begin to overload it as project metadata/not a semi-disposable/reproducible artifact is, I think, counterproductive. Off of the top of my head, I can think a more generic design that satisfies this ask while not conflicting with Poetry's existing design/overloading the meaning of the lock file:
Poetry has in general taken the stance that if you care about versions, they belong in the pyproject.toml/are now a top-level dependency. This would be congruent with that, and you could avoid exporting direct dependencies by putting them in a development group. I'm not married to this approach, but in general we want to keep project configuration in the pyproject.toml file, and keep a cached resolution in the lock file. Like I said above, I don't think overloading the meaning of the lock file is a productive direction, and such a change would likely happen during a major bump, if it happened at all, as we'd need to significantly retool parts of the Poetry CLI to make the lock file less prone to unexpected changes.
potiuk, yes, that is how I use it today. For reference: example of a CI/CD process utilizing Poetry / poetry.lock
Files tracked in git
Snippet of pyproject.toml
upgrade_packages.sh
```sh
poetry lock && poetry export -f requirements.txt --output requirements.txt --without-hashes && poetry export --with dev -f requirements.txt --output requirements-dev.txt --without-hashes
```
Snippet of Run inside Dockerfile
```sh
pip install -U pip setuptools && \
    pip install --no-cache-dir --no-deps -r /resources/requirements.txt
# note originally I used `poetry install` but `pip install` generated smaller image sizes :shrug:
```
Summary
Summary Continued
How this feature would help
I think there is a middle ground here that is being missed in that users & Python library maintainers care deeply about i.e. did you use known good versions? In fact, in Poetry's own issue template you ask for the Poetry version for triaging. The purpose of this (I think) is to help inform the maintainers whether or not the user is using a known good version. For Poetry the buck stops there, but for libraries that depend on other packages it gets more complicated (as we all know).
Thoughts on import suggestion
This is an interesting directional pattern. However it makes the
Closing thoughts
Full paragraph from the docs (relevant parts in bold):
Committing this file to VC is important because it will cause anyone who sets up the project to use the exact same versions of the dependencies that you are using. Your CI server, production machines, other developers in your team, everything and everyone runs on the same dependencies, which mitigates the potential for bugs affecting only some parts of the deployments. Even if you develop alone, in six months when reinstalling the project you can feel confident the dependencies installed are still working even if your dependencies released many new versions since then. (See note below about using the update command.)
Poetry's solver should come up with a working solution every time, assuming that the metadata it is fed (aka the metadata of packages in your dependency tree) is good and there are no regressions in the latest compatible version of a package. However, to promote reproducibility, the lock file exists and makes sure that runs of Poetry across platforms and versions use the same resolution every time. So far, so good. However, if we start leaking configuration into the lock file (e.g. "I want to use versions in this range" -- where did they come from? Why did they change?), the lock file goes from being "a resolution that we know works and can be re-used" to "a vital project configuration file that should not be perturbed." We in general try to avoid encouraging lock file ossification and encourage people to re-test updating their dependencies often. The lock file is intended to record a working solution, but not represent the only known-good way to install your project. Your proposal is a step in the wrong direction, in my opinion, for several reasons:
And besides overloading the model of the lock file, which is conceptually impure, my additional objection is that support for a relatively minor user ask sets us down an entirely different direction for Poetry development. If the lock file is vital project configuration that cannot be damaged, as opposed to a cached resolution format that we have great flexibility with, development of improvements to Poetry gets harder and the maintenance burden increases. If the lock file is not semi-disposable but instead needs to be carefully tended to/managed/curated, we have to grow an entire set of (additional) tooling to curate it over time, and our surface area for user-impacting changes grows quite a bit. Currently we consider the lock file an implementation detail, and that has benefited Poetry quite a bit.
I like the direction of I would love to understand some ins and outs, and maybe I will try to explain where I am coming from in my own words. Maybe I believe most of That's where the local pyproject.toml file (describing your package) is. I believe pyproject.toml is all about describing the dependencies of the package you develop. I am not trying to improve the life of someone who has their own Python code, develops it and builds their own packages, but of someone who consistently (now and a few months from now, in the CI environment etc.) wants to pull and install a number of dependencies that make up the "virtual environment", and wants an easy path to upgrade those in the future by purely specifying "I want to have Airflow 2.5.0 installed - plus those other packages as well; go figure out which are the best versions". This is the problem I am after - a purely user feature. I understand (and please correct me if I am wrong) that @r-richmond, you are one of the Airflow users who uses (or maybe misuses - I do not know Poetry that well; maybe this is not at all its intended use) Poetry to manage such a "user installation". For me such a user is purely a consumer of packages, not a producer of them (and IMHO pyproject.toml is all about producing packages). None of the relevant PEPs (PEP 517, PEP 518, PEP 621, or even PEP 660) mention the "user" case - they are all about building packages. I am talking about purely installing packages from PyPI without having to build those packages locally, using the wheel files (let's assume we have all the binary wheels needed). My "improvement" proposal is that such packages distributed in PyPI as wheel files could also have optional metadata (in PyPI) describing the "known good set of constraints for repeatable installation". Imagine a "main" application package.
And whenever a user wants to get a consistent installation with those "known good dependencies", they might want to run (for example) So I am not sure why we need to involve pyproject.toml in this case. Is it just to get the list of packages to install? Are we thinking of using it for something else? (For me, pyproject.toml is a development tool, not something that users would like to keep just to say "those are the dependencies I want to install".) Or maybe I am completely wrong about this? Maybe Poetry simply should not be used in the case I described? Maybe it does not have an ambition to be used for that case and we should simply use I would really love to know the answer to that question, because I feel we are talking about somewhat different use cases.
correct
But it does represent
These were good comments. You've convinced me that adding an option to Note: this differs from your suggestion of using grouped dependencies, since the packages in this section wouldn't be installed by default; they would only be installed if they were required by something in the dependencies, and if installed, the version would be constrained to the version(s) specified in the corresponding constraint. Lastly, I don't want to distract anymore from
@potiuk, if this was additional metadata in the core metadata baked into a distfile, I would have no objections. We could introduce a However, I was thinking more along the lines of this as an addition to Poetry without any changes to Python packaging/the core metadata specification itself -- in that case, we do have to figure out how to map this onto existing constructs/in a Poetry-specific way. In any case, I believe this belongs in pyproject.toml. My suggestion for an import-based workflow is because of the following:
To answer other questions about an import-based approach:
I guess my opinion is this: If we can come up with a way to standardize this for the ecosystem, we'd be happy to contribute to the PEP process and eventually implement it. But if we're doing this on a one-off bespoke basis for Poetry, the bar will be pretty high as this is a (relatively, I know OpenStack, Ansible, and Airflow are all major users) niche feature that has far-reaching implications. I also think that I/some of the other maintainers need to do a docs pass for design/philosophy, as much of our thinking/perspective/the intended design of Poetry lives in our heads; hopefully, however, the stance on the lock file is starting to make sense? That is to say, the lock file is an implementation detail for speed and reproducibility and not a primary mechanism for specifying versions, unlike pip-tools, which is a tool to build frozen requirements.
Very insightful, thanks @neersighted. This is a cool discussion; I've learned a ton from it. I think @r-richmond's case is a bit different from the one I had in mind. I think I started to see how pyproject.toml would fall into this picture. @r-richmond - do I understand correctly that you treat the
Yeah. I think that it is a small missing capability of PyPA that could make it into a PEP. And I do not particularly think that the "constraints" format from Knowing how opinionated (in different directions at that) the people who are part of the PyPA are, I think it might be a hell of a journey to propose something, get it approved and eventually implemented. But I would love to try in the coming months. I have an upcoming talk on Thursday at PyDataGlobal online https://global2022.pydata.org/cfp/talk/BPFCBT/ - "Managing Python Dependencies at Scale" - and I think I will try to end it with a question: "Is it something we should try to make into a PEP? Is it needed?". I hope I can get some feedback - and maybe I can involve some
Yes, your summary Reading about some of neersighted's concerns about things I'm not using (i.e.
Background on how I ended up here
Questions / Comments that should perhaps move to a discussion
P.S. Let me know if you think that question would be better placed in a discussion.
This thread is pure gold for my talk :). Thanks @r-richmond for such a detailed description and question :). I would love to hear if there is interest in supporting a case like this, because my talk (and, I hope, maybe a future PEP?) is very much about this case.
This is a very interesting discussion, but I'm not sure I got all of it! The use case that brought me here, I think, most closely fits the "deployment description" mentioned by @r-richmond. Curiously, it is also about Airflow :) In my case, I want to build DAGs that will be deployed into an Airflow instance that I don't fully control. I know exactly which package versions are available in that environment. I can't change any of the existing packages, but I can add new ones. So I want to use Ideally, the package list from the target environment should not live in my Does this scenario fit into this discussion?
Very much so.
Bump, this would be really nice.
Really nice feature to have indeed!
Bump!
+1, but is there a simple reproducible example workaround by using
@potiuk, as of the latest Poetry version, it seems to support constraints in the form of an optional dependency group. You just don't install that optional dependency group. I experimented with it, and it seems indeed that such an optional group puts appropriate constraints on the poetry lock, while not being installed unless explicitly asked to. I also confirmed that when you install such a project with pip, such an optional dependency group does not get installed. Please let us know if you were looking for any different sort of functionality; otherwise, this issue can be closed!
Also, a ping for all who needed this feature: @codecakes, @hstravis, @douglaszickuhr, @fjmacagno, @bjoernpollex-sc, @r-richmond. |
This is a different feature, @sabiroid. What an optional dependency group is corresponds to In constraints, you specify the limits on all possible packages that you might install. For example, the Airflow constraints file is https://github.com/apache/airflow/blob/constraints-2.6.2/constraints-3.10.txt and what the constraints say in this case is: if, for whatever reason, you are installing dependency ABC, it should be version X.Y.Z. And they say so for about 650 potential dependencies of Airflow (including transitive ones). The optional dependency group means something different. It means that if you choose (as a user) to install this optional group, you also install those dependencies, following those requirements. For example, "pip install airflow[celery]" will install these dependencies: https://github.com/apache/airflow/blob/main/setup.py#L268
This would correspond to Now, constraints work as an extra "soft" limit. Following from the example above:
This builds on top of the optional feature of installing the celery package, but it does one more thing: it says "and the celery package version installed should be EXACTLY 5.3.0, because that's what you are constrained to by the constraints". It is the best way to get reproducible installs. It's really very similar to the "lock file" of Poetry, but: a) it can be stored externally, not in the source code of the project
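Reading a constraints file like the Airflow one linked above only requires understanding exact pins. The sketch below uses hypothetical helpers (`parse_constraints`, `normalize`) and assumes every line is a plain `name==version` pin; environment markers are ignored, names are compared after PEP 503 normalization, and this is not how pip itself parses such files.

```python
import re
from typing import Dict

# name==version, optionally followed by an environment marker or a comment.
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name: str) -> str:
    """PEP 503 normalization, so 'Example_Pkg' and 'example-pkg' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_constraints(text: str) -> Dict[str, str]:
    """Map normalized package names to the exact versions pinned in a constraints file.

    Simplified: anything after ';' (environment markers) is ignored, and any
    line that is not an exact '==' pin is rejected.
    """
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = _PIN.match(line)
        if match is None:
            raise ValueError(f"not an exact pin: {raw!r}")
        pins[normalize(match.group(1))] = match.group(2)
    return pins


# Illustrative excerpt only, not the contents of Airflow's real file.
SAMPLE = """
# constraints are exact pins, one per line
celery==5.3.0
Example_Pkg==1.2.3
"""
print(parse_constraints(SAMPLE))
```

Note that a pin for a package nobody requires is simply carried along in the mapping; whether it matters is decided at resolution time, not when the file is read.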
After discussion in the #8251 feature request, it was deemed that my use case is close enough to this one to merge the two. While I do see a similar concern, I'm not totally certain the solution for one would solve the problem of the other (even more so looking at the latest comment from potiuk). But in any case, let me express my requirement and see what your thoughts are; hopefully we can move forward to a resolution for everyone. As a developer on a project, I need to align my dependencies with the company-wide whitelisted dependencies. As a team, we also want most of our projects to keep all dependency versions as close as possible, to avoid incompatibilities and feature gaps between the various projects we are working on. The needs for that are thus:
Now, my proposal was quite aligned with the BOM pattern from Java that is implemented in Maven and Gradle, though a bit simplified, and it would use the optional dependency pattern to achieve it. I see 2 ways of doing this (all the details are in #8251; I don't want to add too much here):
I believe both of them answer all the needs above and are really not far from the current Poetry features; they wouldn't require much (or any) change to the pyproject or CLI, since optional dependencies of our project are already used to resolve versions of transitive dependencies, and extras allow installing (and thus resolving) optional dependencies of a dependency. Also, most importantly, that use case is, in my opinion, clearly in scope for a "dependency management and packaging tool". So what are your thoughts: would this solve the issue requested here? Do you think it's a good approach to the exposed issue?
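One way to picture the BOM idea is as a shared name-to-version map that every project's resolution is checked against. The following is a minimal, hypothetical sketch of such an alignment check (it is neither Maven/Gradle's mechanism nor the #8251 proposal itself), assuming both the BOM and a project's resolved versions are available as plain dictionaries; all package names and versions below are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class BomReport(NamedTuple):
    mismatched: List[str]  # pinned in the BOM, but resolved to another version
    unlisted: List[str]    # resolved, but absent from the BOM (not whitelisted)


def check_against_bom(resolved: Dict[str, str], bom: Dict[str, str]) -> BomReport:
    """Compare one project's resolved versions with a company-wide BOM."""
    mismatched = sorted(
        f"{name} {version} (BOM: {bom[name]})"
        for name, version in resolved.items()
        if name in bom and bom[name] != version
    )
    unlisted = sorted(name for name in resolved if name not in bom)
    return BomReport(mismatched, unlisted)


# Hypothetical data: the BOM is shared by all projects; RESOLVED comes from one project.
BOM = {"requests": "2.31.0", "urllib3": "2.0.4"}
RESOLVED = {"requests": "2.31.0", "urllib3": "1.26.16", "internal-tool": "0.1.0"}
print(check_against_bom(RESOLVED, BOM))
```

A real tool would feed the BOM into resolution (as constraints) rather than only report afterwards; the check just makes the "keep every project aligned" requirement concrete.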
FWIW, I worked on #4005 years ago adding this feature but it was not considered to solve a real problem, and I gave up.
I would like to add my use case for In machine learning it is common to use pre-built Docker images with some Python packages already installed. And you really, really do not want to override these specific versions, because they might have been built for very specific hardware compatibility. For example We constrain all the
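A common workaround for the pre-built-image case, outside Poetry, is to snapshot what the image already has and feed it back to pip as constraints. A minimal sketch using the standard library's `importlib.metadata`, assuming it is run with the image's own interpreter; `snapshot_constraints` is a hypothetical name.

```python
from importlib import metadata
from typing import List


def snapshot_constraints() -> List[str]:
    """Return sorted 'name==version' lines for every distribution in this environment."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            lines.add(f"{name}=={dist.version}")
    # Case-insensitive order, with a deterministic tie-break.
    return sorted(lines, key=lambda line: (line.lower(), line))


if __name__ == "__main__":
    # e.g. run inside the base image and redirect the output to constraints.txt
    print("\n".join(snapshot_constraints()))
```

Installing extra packages afterwards with `pip install -c constraints.txt ...` keeps the pre-installed versions fixed during resolution; `pip freeze` produces a similar list.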
Any update on this? Poetry works flawlessly for everything except Airflow, where it fails. We want to settle on Poetry for dependency management and packaging, but it does not make sense to use both pip and Poetry only because Poetry lacks support for something like constraints.
Feature proposed: constraints
It would be great if Poetry supported the fantastic feature that `pip` has: constraints.
Constraints are extremely useful for more complex applications that have many extras, and thus many optional and transitive dependencies. They are a great tool for providing reproducible installs of Python applications without imposing strict pinning of dependencies, allowing the users of applications to manually upgrade and downgrade dependencies of the main "application" installed, even if those dependencies are released after the main application has been released.
Short summary of how constraints work
When installing an application with `pip`, the user can pass the `--constraint` flag with a specification of constraints to use (in the same form as requirements: local file, HTTP URL, etc.). The constraints specified this way should be "pinned" versions (i.e. `==VERSION` only), and they change package resolution so that only the version specified for the package is considered during dependency resolution.
Constraints are not "requirements": if the user does not install a specific requirement (for example because it is part of an optional extra), the package will not be installed even if its specific version is specified in the constraint file. Also, constraints are used exclusively to perform resolution when the "installation" process resolves packages, and they are forgotten as soon as that particular "install" command completes. This allows the user to manually upgrade any of the packages that were "pinned" by constraints, as long as the new version stays within the "requirements" specified by other packages.
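The resolution rule described above can be sketched as a toy resolver. This is a minimal illustration under stated assumptions, not pip's actual algorithm: it does no backtracking, treats versions as opaque strings listed oldest first, and the index, dependency graph and `resolve` helper are all hypothetical. It shows the two properties the text relies on: a constraint only narrows the candidates for a package that something actually requires, and a later install without constraints is free to move forward.

```python
from typing import Dict, Iterable, List, Optional, Set


def resolve(
    requested: Iterable[str],
    requires: Dict[str, List[str]],
    index: Dict[str, List[str]],
    constraints: Dict[str, str],
    allowed: Optional[Dict[str, Set[str]]] = None,
) -> Dict[str, str]:
    """Pick one version per needed package (toy model, no backtracking).

    requested   -- packages the user asked for (extras already expanded)
    requires    -- package -> packages it depends on
    index       -- package -> available versions, oldest first
    constraints -- package -> pinned version; only narrows candidates
    allowed     -- package -> versions permitted by requirements (default: any)
    """
    allowed = allowed or {}
    chosen: Dict[str, str] = {}
    queue = list(requested)
    while queue:
        name = queue.pop()
        if name in chosen:
            continue
        candidates = [v for v in index[name] if v in allowed.get(name, set(index[name]))]
        if name in constraints:
            # A constraint filters candidates but never adds a package by itself.
            candidates = [v for v in candidates if v == constraints[name]]
        if not candidates:
            raise LookupError(f"no version of {name} satisfies requirements and constraints")
        chosen[name] = candidates[-1]  # newest remaining candidate
        queue.extend(requires.get(name, []))
    return chosen


# Hypothetical index and dependency graph, for illustration only.
INDEX = {
    "app": ["1.0"],
    "celery": ["5.2.7", "5.3.0", "5.3.1"],
    "requests": ["2.30.0", "2.31.0"],
}
REQUIRES = {"app": ["requests"]}
CONSTRAINTS = {"celery": "5.3.0", "requests": "2.30.0"}

# Plain install: celery is constrained but not requested, so it is not installed.
print(resolve(["app"], REQUIRES, INDEX, CONSTRAINTS))
# Install with the "celery" extra: celery is installed at the constrained version.
print(resolve(["app", "celery"], REQUIRES, INDEX, CONSTRAINTS))
# A later install without constraints is free to pick newer versions.
print(resolve(["app"], REQUIRES, INDEX, {}))
```

If a requirement and a constraint disagree (say something demands `requests` 2.31.0 while the constraint pins 2.30.0), the toy resolver raises `LookupError`; pip likewise reports the conflict instead of silently ignoring the constraint.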
Why it is useful
It is useful to get reproducible installs of applications (not libraries) without limiting security upgrades (and non-security upgrades as well).
It allows for fully reproducible, yet secure, from-scratch Python application installs (web apps, CLI apps - generally apps that are supposed to provide user-facing features rather than libraries for other apps) without pinning specific versions of dependencies in a "hard" way. A fully reproducible install means that whether you install it today or a few years from now, the application should install correctly, even if direct or transitive dependencies have released new versions.
Applications typically pin their dependencies to specific versions, and this is how you typically approach "applications" (as opposed to libraries, which typically have "open" dependencies). If you want a truly "reproducible" install, you need to pin all your dependencies this way (including transitive ones), because otherwise transitive dependencies might break your from-scratch install - impacting the "first contact" with your application.
However, there is a drawback: when you pin dependencies, the user cannot independently upgrade any of the pinned dependencies, and if one of those dependencies releases even a small security fix, the main application must be upgraded to take it into account. This is a limitation of pinning. It means that a user who wants to apply a security fix must wait for the main application to release a new version. In a world where supply chain attacks are a thing, and where security becomes more and more important, giving users the option to independently upgrade dependencies after installing an application is crucial.
Constraints nicely allow to make "reproducible installs" while keeping the possibility of "security updates" for any dependencies.
Another consequence of using constraints is that it also allows the user of an application to perform non-security updates of the dependencies, which is important in cases like Airflow, where Airflow is not only an application to run but also a platform that provides a library for Python developers (in Airflow, DAG authors develop workflows as Python code, and they often want to be able to upgrade libraries installed by Airflow).
Lack of support for constraints is the reason why Airflow discourages the use of Poetry (even though we would love to be able to get its users to use Poetry).
Example
Apache Airflow depends heavily on constraints: its maintainers developed a mechanism to automatically upgrade the constraint files based on the results of automated tests, and the only "recommended" way of installing Airflow is via `pip` with constraints. Lack of constraint support is the only reason why `poetry` is discouraged: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html
The constraints of Airflow are maintained automatically and tagged together with each version of Airflow; the main version is here: https://github.com/apache/airflow/tree/constraints-main.
Yes, Airflow is a bit of a special case with almost 700 dependencies and ~100 extras, which is a bit extreme, but many other applications could benefit from that approach. The constraint mechanism has been used in Airflow for more than 3 years, and it has helped Airflow maintainers in numerous cases where 3rd-party dependencies would have broken the Airflow "clean install" for already released historical packages, while allowing users to upgrade many dependencies as needed. More about why it is needed and what problems it solved for Airflow is explained in this talk from PyWaw #98, "Managing Python dependencies at scale": https://www.youtube.com/watch?v=mlOkkTuucSk
Alternatives
As @dimbleby mentioned in #7047 (comment), the reason #3225 was closed was that it is the same as a lock file.
While lock files are "almost" constraints, there is one difference: lock files are a development feature for someone who develops the application, and once the package is uploaded to PyPI, the lock file remains only in the source code of the application. Constraints, on the other hand, are a user-facing feature. Users should be able to install an application from PyPI (or another compliant repository) and apply constraints (for example taken from a published .lock file or a file following the requirements.txt format) in a single installation command, similar to what the Airflow installation does with `pip`:
The above is just an example; other applications might use other conventions.
The convention should allow for different sets of constraints per Python MAJOR.MINOR version (for big applications, the set of dependencies necessarily differs slightly between Python 3.7, 3.8, etc.). It would also be great to add other options, such as architecture (ARM/x86), but this is not as important as Python versions. This could be done by a convention on the location of the file or, in the case of poetry.lock, it might be defined using poetry.lock features.
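For Airflow, the per-Python-version convention lives in the file path, as in the constraints link quoted earlier in the thread: a `constraints-<Airflow version>` ref holding one `constraints-<MAJOR.MINOR>.txt` per Python version. A small sketch of building that URL for the running interpreter, assuming the file is fetched from GitHub's raw-file host; `airflow_constraints_url` is a hypothetical helper.

```python
import sys


def airflow_constraints_url(airflow_version: str, python_version: str = "") -> str:
    """Build the raw URL of Airflow's constraints file for a Python MAJOR.MINOR.

    Assumes the layout seen in the thread: a git ref named
    constraints-<airflow_version> containing constraints-<MAJOR.MINOR>.txt.
    """
    if not python_version:
        # Default to the interpreter running this code, e.g. "3.10".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


print(airflow_constraints_url("2.6.2", "3.10"))
```

The resulting URL is the kind of value passed to `pip install "apache-airflow==2.6.2" --constraint <url>`; a Poetry equivalent would need the same per-interpreter selection.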
It should be possible to use lock files (and ideally also a format compatible with requirements.txt) as "constraints" when a user installs a package from PyPI, without having the sources or pyproject.toml, and without having to copy poetry.lock manually to the local folder. In this sense, poetry.lock is not far from constraints; what is lacking is support for single-line installation, where constraints are specified as a remote URL that is pulled automatically and used during installation by the end user from PyPI or another registry. In fact, using and publishing poetry.lock as the "constraint" file could be one of the main use cases for Poetry-managed applications.
Ideal properties of the constraints feature
`poetry install NNN --constraint http://.....` to apply constraints remotely
pip
If there is a consensus among Poetry maintainers that this is a worthy feature, I am happy to help with both its design and implementation. I have vast experience in managing dependencies in Airflow using constraints (I am the original author of the approach Airflow uses, and I have evolved and maintained it over the last 3 years or so).
Actually, there was a duplicate, #3225, but it was closed; following the discussion in #7047, I decided to open it again.