Install without resolution on packages with pre-locked requirements #11528

Closed
lecardozo opened this issue Oct 18, 2022 · 10 comments
Labels
resolution: no action (When the resolution is to not do anything)
type: feature request (Request for a new feature)

Comments

@lecardozo

What's the problem this feature will solve?

We are using a "lockfile-based" package manager — PDM, Poetry, etc. — to generate wheels containing pre-locked dependencies as Requires-Dist attributes in the package METADATA. These wheels are not libraries to be distributed, but applications that declare the specific (pinned) requirements they need to run successfully.

As the dependency resolution process already happened at development time, it would be nice to avoid having to resolve it again at installation time. Besides making installation faster, this would also naturally ignore conflicts that the developer has explicitly "solved"/overridden.

Describe the solution you'd like

Pip could have a flag for turning off the resolution process, allowing the user to install pre-resolved dependencies. Something like pip install --resolved/--no-resolve package-0.0.1-py3-none-any.whl

Alternative Solutions

I could not find a workaround for this using pip or other tools. I initially thought the --no-deps flag could solve it, but that skips the direct dependencies as well, which is not what we want.

Additional context

The idea of generating wheels with the whole resolved dependency tree has been previously discussed in different contexts:


@lecardozo lecardozo added the S: needs triage (Issues/PRs that need to be triaged) and type: feature request (Request for a new feature) labels Oct 18, 2022
@lecardozo lecardozo changed the title Install without resolution on packages pre-locked requirements Install without resolution on packages with pre-locked requirements Oct 18, 2022
@pradyunsg
Member

pradyunsg commented Oct 19, 2022

The supported workflow for that is to use a requirements.txt file containing the entire dependency graph and install that with --no-deps.

Please see https://caremad.io/posts/2013/07/setup-vs-requirement/ for why having requires-dist as the locked requirements isn't a workflow that's directly supported.
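To make that workflow concrete, here is a minimal sketch (the application wheel name is illustrative):

```shell
# At development time, after resolving and testing the dependency tree,
# export the entire graph as pinned requirements:
pip freeze --exclude-editable > requirements.txt

# At installation time, install the pinned graph and then the application
# itself, with dependency resolution disabled in both steps:
pip install --no-deps -r requirements.txt
pip install --no-deps myapp-0.0.1-py3-none-any.whl
```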

@pradyunsg pradyunsg added the resolution: no action (When the resolution is to not do anything) label and removed the S: needs triage (Issues/PRs that need to be triaged) label Oct 19, 2022
@pradyunsg
Member

pradyunsg commented Oct 19, 2022

We don't have a label for "by design", so I've used the no action one, since I don't think the feature requested here is appropriate for the dependency management model that we've been working toward both standardising and publicising for quite some time.

@lecardozo
Author

Thanks for the reference, @pradyunsg! 😃

I fully get the idea of "concrete" (requirements.txt) vs. "abstract" (Requires-Dist) requirements (using the terms from the reference, for consistency). That said, it seems to me that this distinction is, in some ways, a little blurry. For instance, although Requires-Dist only defines abstract package names and versions (PEP 345), pip already offers mechanisms such as --index-url/--extra-index-url that allow it to look for a specific concrete distribution — package==version on a specific index X.

I feel that being able to define a set of pinned dependencies in a single tarball/distribution, without requiring an additional requirements.txt file, improves the overall experience for end users.

I'm wondering whether we could somehow improve the experience based on the currently supported workflow for these cases (requirements.txt + --no-deps). As we already have ways to package static assets with sdists/bdists, would it be feasible to package a requirements.txt together with the dist and somehow tell pip to look into that for dependencies, instead of looking at PKG-INFO or METADATA? 🤔

@pfmoore
Member

pfmoore commented Oct 19, 2022

IMO, this is out of scope for pip. Installing applications has a significant level of additional complexity over installing a library, and installing the right runtime dependencies is only part of the problem. The existing requirements.txt + --no-deps approach [1] is sufficient for pip to be used as a component of a workflow for building and distributing applications, but pip isn't itself an application manager.

Footnotes

  [1] And any future standardised lockfile, when that gets agreed.

@lecardozo
Author

Thanks for the input, @pfmoore 😃

Installing applications has a significant level of additional complexity over installing a library, and installing the right runtime dependencies is only part of the problem

I feel that installing applications might have different levels of complexity.

  • An application could rely solely on a set of fixed Python dependencies and be installed as a simple package (thinking of things that can be executed via pipx, for example). Here, pip's existing machinery should be enough, as installing pinned runtime dependencies should be all that is needed for these kinds of applications.
  • An application could require extra system-level dependencies. Here pip can be seen as a component in a bigger workflow, for instance a step in building container images. This is clearly out of pip's scope.

@pfmoore
Member

pfmoore commented Oct 19, 2022

thinking of things that can be executed via pipx, for example

In that case, I consider pipx to be the package manager, and pip to be a low-level component (along with venv) that pipx uses to do its job.

@lecardozo
Author

I see now. Sorry if I misunderstood the role of pip in the ecosystem.

I think the biggest advantage of having such functionality baked into pip would be easier adoption and the reuse of existing abstractions for installing packages.

For those who need a non-generic workaround for this, here is a snippet I'm using to get around that. 👇

import os
import sys
import subprocess
from contextlib import contextmanager

from installer.sources import WheelFile


def get_requires_dist(wheel_file):
    """Yield each Requires-Dist entry from the wheel's METADATA."""
    with WheelFile.open(wheel_file) as file:
        metadata = file.read_dist_info("METADATA").split("\n")
    prefix = "Requires-Dist: "
    for line in metadata:
        if line.startswith(prefix):
            yield line[len(prefix):]


@contextmanager
def write_requirements(requirements):
    """Write the requirements to a temporary file, removing it afterwards."""
    requirements_file = ".reqs"
    with open(requirements_file, "w") as f:
        for req in requirements:
            f.write(f"{req}\n")
    try:
        yield requirements_file
    finally:
        # Clean up even if the install step fails.
        os.remove(requirements_file)


def install_locked_requirements(requirements, wheel_file):
    with write_requirements(requirements) as requirements_file:
        # Use the current interpreter's pip, and fail loudly on errors.
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            "-r", requirements_file, "--no-deps", wheel_file,
        ])


def install_locked(wheel_file):
    requirements = get_requires_dist(wheel_file)
    install_locked_requirements(requirements, wheel_file)


if __name__ == "__main__":
    install_locked(sys.argv[1])

This could be executed with python install_locked.py package-with-resolved-dependencies-0.0.1-py3-none-any.whl
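For reference, the same Requires-Dist extraction can be done with only the standard library (no installer dependency), since a wheel is a zip archive and METADATA is an RFC 822-style file. This is a hedged sketch, not part of the original snippet; the function name is illustrative:

```python
import zipfile
from email.parser import Parser


def requires_dist_from_wheel(wheel_path):
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as zf:
        # The .dist-info directory name varies per wheel, so locate it.
        metadata_name = next(
            name for name in zf.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        metadata = Parser().parsestr(zf.read(metadata_name).decode("utf-8"))
    # get_all() returns None when the header is absent.
    return metadata.get_all("Requires-Dist") or []
```

zipfile.ZipFile also accepts file-like objects, so this works on in-memory wheels as well as paths.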

@pradyunsg
Member

pradyunsg commented Oct 20, 2022

Closing this out for now, since we seem to have reached agreement that... well... it'd be nice to have in pip, but it's incompatible with the model that pip pursues as well as the position it has within the ecosystem today.

@potiuk
Contributor

potiuk commented Oct 21, 2022

@lecardozo - a comment from my experience (and Apache Airflow's).

I perfectly understand your need - we thought long ago about how to solve it in Apache Airflow without imposing an opinionated approach or trying to change the pip maintainers' position on it. What we came up with does not require pip to change, and actually acknowledges pip's role as a low-level component (we used pip's constraints feature as the basis for a complete solution).

While it is not "self-contained" in the wheel file, pip - via its constraints mechanism - allows you to specify a constraints file that can (optionally) be used during installation. As an application maintainer you can - if you really want - prepare such a constraints file for each version of your package and tell your users about it. This way you can have a "golden" (aka "blessed") set of dependencies, covering the whole dependency tree, stored in the form of a constraints file.

See https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html for instructions that we give our users. Example from that doc:

pip install "apache-airflow[celery]==2.4.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.1/constraints-3.7.txt"

The set of dependencies in the constraints file is not embedded in the .whl metadata - it is external, and we chose to host the constraint files in orphan branches of our repository (one branch for each minor version of Airflow). We fully automated the preparation and tagging of such constraint files in our CI, so whenever we release a new version it automatically gets the latest versions of "valid" dependencies (because we automatically upgrade and test all dependencies using pip's eager-upgrade feature whenever all tests pass).
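A hedged sketch of how such a constraints file can be produced in CI (the venv path and the ".[all]" extra are illustrative, not Airflow's actual tooling); the key pieces are pip's eager upgrade strategy and pip freeze:

```shell
# Build a clean environment and eagerly upgrade the whole dependency tree:
python -m venv /tmp/constraints-venv
/tmp/constraints-venv/bin/pip install --upgrade --upgrade-strategy eager ".[all]"

# Run the full test suite against this environment here; only publish the
# constraints file if everything passes.

# Freeze the fully resolved tree into a constraints file:
/tmp/constraints-venv/bin/pip freeze --exclude-editable > constraints-3.7.txt
```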

This approach has a number of advantages:

  1. It does not require the user to run any "extra" Python code - while your solution is simple, it requires extra Python code to install, to extract metadata, and to run pip underneath. Our solution taps into the existing 'constraints' feature and allows hosting the constraint files remotely - it does not require executing any "extra" code, just a proper pip command that can easily be generated in CI, for example.

  2. We can all but guarantee that older versions of the application can be installed, without fearing that a 3rd-party dependency release breaks them (which happened multiple times before we introduced it).

  3. The fact that it is not part of the .whl file is actually pretty useful. We can remotely change the "golden" set of versions in case of breaking changes in pip or setuptools - for example, one of our dependencies (flask-openid) stopped being installable with newer setuptools (because it used the deprecated 2to3 flag, which was dropped) - see details in
    Airflow installation fails with latest setuptools apache/airflow#18075 (comment). This way, as a maintainer, you have a way to help your users follow the recommended installation process - after a fix, they can continue installing the released application version even after such catastrophically breaking events.

  4. The user is not limited to those versions of deps. They are the "golden" set (i.e. guaranteed to have passed all tests), but if the user wants to change any dependency, and it is not constrained by "install_requires", they can upgrade or downgrade it individually after installation as they see fit.

You can watch my presentation explaining why and how we've done that: https://www.youtube.com/watch?v=_SjMdQLP30s&t=2549s - there are a lot more details in it and a lot more context.

Of course, it does not help with making such an approach "popular" and "reusable".

This is a rather complex solution, specific to Airflow (we have more than 650 dependencies in our dependency tree, and Airflow is both an application and a library, so opinionated solutions like Poetry are not good enough for it). We had to educate our users a lot before this became "common" knowledge, but it gives you, as a maintainer, the possibility of an easy answer when some dependency breaks your package: "please follow the only supported mechanism we have - use pip and constraints (see link to the docs)".

Maybe some day some of our experiences from Airflow can be useful when preparing a PEP that could solve the "application/library" conundrum better and allow maintainers to define a "golden set" of dependencies for each package (and later such a PEP could be turned into a pip feature). I would be happy to be part of such an effort if there are more people willing to collaborate on it. I am quite sure what we have in Airflow is not suitable as a "reusable" approach - it's far too complex - but I think some of our experiences could be reused, and if anything should be done about it, it should start with a proper PEP proposal, discussion, and approval.

It seems this is not something the PyPA and pip maintainers are currently too worried about. From past experience, I believe the PyPA and pip maintainers are rather strongly opinionated, and it will IMHO require quite a lot of effort and energy to bring any proposals forward before they see the need for them, so for now I am mostly watching and commenting rather than actively proposing changes. While I know what it means to be persistent, I feel this needs at least a positive acknowledgment from the PyPA group before anyone invests time in it.

However, I think it is a real problem that needs some love in the future. I hope to help when the time is ripe and I see a good moment for such proposals, so that I would not have to waste enormous energy and time on something that has no support nor chance of succeeding.

@lecardozo
Author

Thanks a lot for sharing your experiences with Airflow, @potiuk! 😃 The approach you're currently using seems to be the best way to keep compatibility with pip's features without relying on external code. I'll try to adapt my use case to something similar with a static constraints.txt.

You can watch my presentation explaining why and how we've done that: https://www.youtube.com/watch?v=_SjMdQLP30s&t=2549s - there are a lot more details in it and a lot more context.

Thanks for the reference. Great talk, BTW!

Maybe some day some of our experiences from Airflow can be useful when preparing a PEP that could solve the "application/library" conundrum better and allow maintainers to define a "golden set" of dependencies for each package (and later such a PEP could be turned into a pip feature)

That would be pretty nice! I think Airflow is one of the best examples I know of — being a user of it myself — that has to deal with this app/library duality. The lessons learned by you and your team will definitely be useful if we ever define better standards for such cases.


extra: one thing that came to my mind after reading about your experiences: how do you deal with conflicts? With 550+ dependencies on the constraints list and a messy ecosystem, I'd imagine you find yourself dealing with a bunch of those.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022