Allow build to specify modules that make it into the final *.whl/package. #7489

Closed
2 tasks done
pm5k opened this issue Feb 8, 2023 · 23 comments
Labels
kind/feature Feature requests/implementations

Comments

@pm5k

pm5k commented Feb 8, 2023

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the FAQ and general documentation and believe that my question is not already covered.

Feature Request

I would like to see if there are enough people who would benefit (as I think I would) from a new pair of flags for the build command that let users specify one or more modules to include or exclude during the build step, dynamically and separately from the pyproject.toml definitions.

Like so: poetry build --with <package_1> <package_2> --without <package_3>

Rationale

In monorepositories where one could have contextually related micro-applications (or microservices), it is currently not possible to specify dynamically, at build time, which modules go into a service's distribution package.

Let's imagine we have a project that is laid out like so:

repository_root/
    ...
    foo_bar/
        ...
        __init__.py
        __main__.py
    foo_baz/
        ...
        __init__.py
        __main__.py
    foo_shared/
        ...
        __init__.py
    ...
    pyproject.toml
    Dockerfile.bar
    Dockerfile.baz

Our pyproject.toml has this:

[tool.poetry]
name = "foo"
version = "0.1.0"
...
packages = [
    { include = "foo_bar" },
    { include = "foo_baz" },
    { include = "foo_shared" }
]

foo is the global context, our main project name. bar and baz are component modules in this context. They are meant to become (let's pretend) lambda functions or be dockerised and run by themselves. They also need a shared library which is specific only to them and no other project in the company would ever wish to use this shared library.

Now, it stands to reason that I do not want to create three different repos for this. It also feels unreasonable to make foo_bar, foo_baz and foo_shared into their own Poetry projects, as each top-level project is not a valid Python module and therefore I could no longer do from foo_shared import config from within foo_bar as an example.

Additionally, for local development, I might want to install ALL those projects without special config, so having them all listed statically as above is very useful, however - here's where the issue is..

When I dockerise foo_bar - I make use of poetry in the build step of the docker image. Something like:

# Dockerfile.bar
FROM base AS builder

ENV POETRY_VERSION=1.3.0
WORKDIR /builder

COPY pyproject.toml poetry.lock ./

RUN pip install --no-cache-dir "poetry==$POETRY_VERSION" \
    && python -m venv /venv \
    && poetry export --with my_dep_group -f requirements.txt | /venv/bin/pip install --no-cache-dir -r /dev/stdin

COPY . /builder

RUN poetry build && /venv/bin/pip install --no-cache-dir dist/*.whl

This ensures that in my final layer, I can simply transfer the /venv folder with the installed project and run it like - python -m my_project as an ENTRYPOINT.

And this works well, except that in the above example, Poetry builds a single wheel and the installed project would allow one to run:

  • python -m foo_bar
  • python -m foo_baz
  • python -m foo_shared (Although this would yell as shared is a lib not an "executable" module..)

Using this sort of setup I am able to have a single CI pipeline build all my images from a single repository and distribute them further down the chain as needed, without the need for splitting the repo up or using multiple Poetry projects.

However if build had the aforementioned flags, the command could then willingly ignore foo_baz during the build stage and the resulting wheel would only carry foo_bar and foo_shared which is all that is needed for foo_bar to work in the docker container.
It would only take this command poetry build --with foo_bar foo_shared and that's it. It would explicitly only build a wheel including those two folders.
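
A minimal sketch of the filtering this proposal describes, written in Python. The function name select_packages and its parameters are hypothetical and not part of Poetry; the sketch only illustrates how the proposed --with / --without flags could narrow the packages list declared in pyproject.toml while leaving the default build untouched.

```python
from typing import Iterable, List, Optional


def select_packages(
    declared: Iterable[str],
    with_: Optional[Iterable[str]] = None,
    without: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the declared packages that survive the hypothetical flags.

    Passing no flags means no filtering, so the default build is unchanged.
    """
    declared = list(declared)
    wanted = set(with_) if with_ else set(declared)
    excluded = set(without or ())

    # Reject names that pyproject.toml does not declare.
    unknown = (wanted | excluded) - set(declared)
    if unknown:
        raise ValueError(f"not declared in pyproject.toml: {sorted(unknown)}")

    # Preserve the declaration order from pyproject.toml.
    return [name for name in declared if name in wanted and name not in excluded]


declared = ["foo_bar", "foo_baz", "foo_shared"]
print(select_packages(declared, with_=["foo_bar", "foo_shared"]))
print(select_packages(declared, without=["foo_baz"]))
```

Both calls select foo_bar and foo_shared, which is the wheel content wanted for the foo_bar container.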

Paired together with the already existing optional dependency groups via --with and --without, this could add to the flexibility of Poetry and allow people to have more control over how projects are built in order to suit their preferred setup.

Caveats

I am unsure how deps would work in this context, I would assume that it would be enough to first install the project using a set of dependency groups to generate a correct lockfile which then in return would tell build that those two out of three project packages it is building into the wheel need only those deps?

@pm5k added the kind/feature and status/triage labels Feb 8, 2023
@dimbleby
Contributor

dimbleby commented Feb 8, 2023

This would be quite far from the norms of building wheels in the python ecosystem - and it sounds like a lot of work to implement too. I wouldn't get your hopes up!

@neersighted
Member

Requests to dynamically vary the contents of wheels are increasing, but I'm not sure they're useful/will be accepted at this time. The convention is that python -m build produces the same result as poetry build and that the sdist results in the 'same' wheel when built on the same system every time.

If you want to mutate your wheels after Poetry builds them, they're just zip files and you're free to do what you want to them. But I don't think it makes sense to support custom behavior that other PEP 517 frontends will never be able to replicate.
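
As a rough illustration of the "wheels are just zip files" point, here is a sketch that copies a wheel while leaving out one top-level package, using only the standard library. The helper name strip_package is hypothetical, and the approach rests on assumptions: the package sits at the root of the archive, and RECORD rows can be filtered with a plain prefix check (quoted CSV paths are not handled). The wheel's name and metadata stay the same, so the result cannot be told apart from the original without opening it.

```python
import zipfile


def strip_package(wheel_in: str, wheel_out: str, package: str) -> None:
    """Copy wheel_in to wheel_out, leaving out one top-level package."""
    prefix = package + "/"
    with zipfile.ZipFile(wheel_in) as src, zipfile.ZipFile(
        wheel_out, "w", compression=zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefix):
                continue  # drop every file under the unwanted package
            data = src.read(info.filename)
            if info.filename.endswith(".dist-info/RECORD"):
                # RECORD lists each installed file as "path,hash,size";
                # remove the rows for the files dropped above.
                kept = [
                    line
                    for line in data.decode("utf-8").splitlines()
                    if not line.startswith(prefix)
                ]
                data = ("\n".join(kept) + "\n").encode("utf-8")
            dst.writestr(info.filename, data)
```

A call might look like strip_package("dist/in.whl", "dist/out.whl", "foo_baz"), where both paths are placeholders.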

@pm5k
Author

pm5k commented Feb 8, 2023

@neersighted as far as I know, other build front ends don't also utilise optional dependency groups and yet these are quite well supported by Poetry. What harm do two extra optional flags add at this stage where the default behaviour does not change, but additional behaviour increases flexibility?

@neersighted
Member

Optional dependency groups do not make it into the release artifacts. In fact, while I cannot find it now, there has been a request for a poetry build --with that has not picked up any steam for the same reasons.

poetry build has, up to this point, been a standard build tool, and there has not been any general acceptance of the idea of shifting it away from being so. Keep in mind that Poetry tries to produce standard artifacts compatible with other tools (e.g. so our sdists build the same wheels as those built by poetry build).

I don't think introducing complications to the poetry build command will ultimately be accepted in upstream Poetry, but if you have a design proposal that you think can answer maintainer objections, it certainly hasn't been a hard rejection up to this point. Nobody has clearly articulated what problems they are solving that couldn't be solved through other means, and how they will avoid introducing complications/confusion for users who don't need to solve niche problems, however.

@pm5k
Author

pm5k commented Feb 8, 2023

I mean... it's a pretty clear problem I listed above. In a mono repo scenario where you aren't building one package but several, you don't want poetry to "build" every folder. Screwing around with a built wheel after the fact to strip unnecessary sibling packages is an insane "solution" when the package manager should really be allowing to exclude those from the build altogether. It's common sense and is a real world use-case. Sounds to me like the objections come from people not having enough exposure to a variety of different ways of using package managers in the real world and trying to stick to a given path rather than exploring something more flexible for the future, but if you can point me to a way to raise a proper proposal I would be interested in further discussions.

@neersighted
Member

Poetry reasons in terms of packages, not modules. It sounds like you really want #2270, but are approaching the problem from the wrong direction.

Asking Poetry to build variants of a package based on arbitrary build-time flags with no other way to differentiate the artifacts is asking for trouble, and the problem you are trying to solve is better solved by methods that could be fully compatible with other Python packaging tools and not require special knowledge of Poetry.

Solving this with subprojects would result in multiple package names, but in most people's evaluation, that is an important benefit. Having multiple builds of your package with different contents and no way to tell them apart without opening them and inspecting them is considered defective packaging by most.

@pm5k
Author

pm5k commented Feb 8, 2023

If subprojects has traction and is on the way to implementation - it could be a reasonable alternative to this, ASSUMING this results in discrete wheels being built and not just one megawheel (which is what I'm trying to avoid).

However, that's also contingent on not having a pyproject file for every "subproject" or some other crazy thing like that.

The reality of the matter is that not every Python project ends up in a typical distributed wheel as designed way-back-when, especially when working in a non-OSS/non-community space, and Poetry is a widely used package manager in companies running proprietary Python packages all over. Those situations mean unconventional use of the build / dist processes that may not align with the original "vision", but are not only entirely valid - they're also pretty commonplace.

I would strongly suggest an internal discussion around added flexibility, as like I said before -- opt-in features do not break existing functionality nor do they compel the community to switch to other ways of working. Growing a feature set can be done without cardinal sin.

@neersighted
Member

neersighted commented Feb 8, 2023

We have so far extended opt-in/Poetry-only features only where the poetry tool itself is in play. Violating the 'ecosystem-compatible' boundary has not yet been done (every artifact we produce is compatible with other tools), and this is an odd place to start breaking down that firewall, if we'd ever desire to do it.

Is there a reason that you cannot poetry install or poetry bundle or poetry export with the dependencies you want or do not want to include using the dev groups feature?

@pm5k
Author

pm5k commented Feb 8, 2023

Can't speak to the first paragraph, but as for the second:

poetry install

This installs a project fully into an env. Inside of a docker build process - this is not desired.

poetry bundle

A cursory search tells me this is a plug-in which bundles the existing venv - again, not desired.

poetry export

This just exports deps.. I don't have a problem with this at all, you can see in my original post that I already use this.


I think you aren't understanding what I am trying to achieve here. I would urge you to re-read my original issue text, but for clarity and on the assumption I did a really poor job at describing the problem (it happens):

I have a single monorepo, it has a shared library and three distinct "packages" which all utilise the shared library. These packages DO NOT make sense or weigh enough to be their own repos or Poetry projects. I ALREADY manage their dependencies via optional dependency groups - [tool.poetry.group.proj_a.dependencies] ... etc.

The problem arises when I am trying to Dockerise these. My docker process firstly takes the pyproject.toml and exports ONLY the main and project deps groups into requirements.txt and installs those into a separate /venv.

THEN, the second layer of the build process actually builds the Poetry project as a wheel package, which then gets installed via pip into the separated /venv folder. This results in my project becoming a read-only dependency installed into the virtual environment.

I will then move that /venv into the FINAL layer of my docker build step where I will run ONLY the project_a dependency via say.. python -m project_a and expect that:

1 - Only project_a and project_shared packages are installed into the /venv.
2 - There's no source code littering my container, it's all contained within site-packages as it should be.

The problem is - poetry build results in a wheel package which carries all of my packages instead of just the desired project_a and project_shared packages like it should. Meaning that I have only two options for cleaning this up currently:

1 - I have a helper script in the docker build pipeline which forcefully modifies the pyproject.toml and only writes out the needed package names into tool.poetry.packages list.
2 - I install the whole meat and potatoes, and then in a separate step run rm -rf on the two unnecessarily installed packages inside /venv/.../site-packages/
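
Option 1 above could look roughly like the following sketch. The helper name keep_only is hypothetical, and the rewrite is naive text manipulation rather than real TOML editing: it assumes the multi-line packages array layout shown in the original post, with the closing bracket at the start of its own line.

```python
import re
from typing import List

SAMPLE = '''\
[tool.poetry]
name = "foo"
version = "0.1.0"
packages = [
    { include = "foo_bar" },
    { include = "foo_baz" },
    { include = "foo_shared" }
]
'''


def keep_only(pyproject_text: str, keep: List[str]) -> str:
    """Rewrite the `packages = [...]` array so it lists only `keep`."""
    entries = ",\n".join(f'    {{ include = "{name}" }}' for name in keep)
    replacement = f"packages = [\n{entries}\n]"
    # Naive match: from "packages = [" to the first "]" that starts a line.
    new_text, count = re.subn(
        r"^packages\s*=\s*\[.*?^\]",
        lambda _match: replacement,
        pyproject_text,
        count=1,
        flags=re.MULTILINE | re.DOTALL,
    )
    if count != 1:
        raise ValueError("no multi-line packages array found")
    return new_text


print(keep_only(SAMPLE, ["foo_bar", "foo_shared"]))
```

In a Docker build this would run against the copied pyproject.toml just before poetry build, so the file in the repository itself is never modified.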

Both options seem kind of crazy considering that functionality which allows me to be in full control of what parts of my Project I build is entirely sane and I would argue - beneficial.

I really don't see why it is an ecosystem compatibility issue to allow a project owner to specify this:

my_repo/
  package_a/
  package_b/
  package_c/
  pyproject.toml

poetry build package_a package_c

and later

pip install my_a_and_c.whl

To me it seems pretty simple...


EDIT:

To add to this, all this faff is because I want a clean resulting container. I want JUST the deps I need for package_a and no more. And I ONLY want package_a and package_shared inside the resulting container, while another container may need only package_b and package_shared. And I want it all to live under one poetry project, in one repo, and I want to sleep peacefully at night knowing I am not patronised by my package manager and allowed to do this.

@neersighted
Member

neersighted commented Feb 8, 2023

You are inverting the dependency structure -- that's simply not how Python packaging currently works. Likewise, you're asking for Poetry to generate a composite 'a_and_b' project from a single project, based on your poetry build invocation.

What likely makes more sense here is to build a wheel of your shared_library and maintain minimal project files for each of the modules you build.

But overall, this is looking to be very much not something that is in-scope for Poetry. Poetry is not trying to solve bespoke/'non-Pythonic' packaging needs, it's aiming to be compatible with the ecosystem and introduce useful enhancements when possible.

Groups are very powerful and allow for a lot of flexibility, but we have been hesitant to extend them to export metadata for precisely the reasons that come up here -- it rapidly spirals into Poetry becoming incompatible with the rest of the ecosystem, and we have to be thoughtful.

To be frank: Poetry is an important piece of infrastructure now, and is not a place to experiment with new concepts in Python packaging that break from the conventions of the rest of the ecosystem. I don't think becoming the first tool to support/encourage such a novel workflow is something the project will accept. I am certainly open to feedback from the other maintainers here, but I would be surprised if any of them felt differently.

Moreover, what you want is possible if you simply have multiple pyproject.toml files and build wheels for each library, or poetry install in the relevant package_a subproject. Or you can simply poetry install --with my_groups with your 'fat' pyproject.toml and then rm -rf /path/to/venv/site-packages/{package_b,package_c}.

Poetry may not be the tool for you. We're trying to build packaging tools useful to 99% of people, and I still think you should adapt your project to align with ecosystem conventions. If you're not able to do that, you may need to find another packaging tool with different opinions/design principles/stability, but that's okay too.

@pm5k
Author

pm5k commented Feb 8, 2023

> Moreover, what you want is possible if you simply have multiple pyproject.toml files and build wheels for each library

You mean:

repo_root/
    project_a/
        pyproject.toml
        lib/
    project_b/
        pyproject.toml
        lib/
    shared/
        pyproject.toml
        lib/

And then call poetry build while inside each project_X or inside shared?

@neersighted
Member

Or python -m build project_a for example. You can use a path dependency to express the relationship with shared, though if you commit the lock file you will need to re-lock each time that you change the metadata of shared.
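
For the per-subproject layout discussed here, a CI step could discover each directory that carries its own pyproject.toml and build it separately. The sketch below only assembles the commands; the helper name build_commands is hypothetical, and it assumes the PyPA build package is installed in the interpreter that runs it.

```python
import sys
from pathlib import Path
from typing import List


def build_commands(repo_root: str) -> List[List[str]]:
    """Return one `python -m build <dir>` command per subproject."""
    commands = []
    # A subproject is any direct child directory with a pyproject.toml.
    for pyproject in sorted(Path(repo_root).glob("*/pyproject.toml")):
        commands.append([sys.executable, "-m", "build", str(pyproject.parent)])
    return commands
```

Each returned command can be passed to subprocess.run(command, check=True), which produces one sdist and one wheel per subproject instead of a single combined wheel.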

@pm5k
Author

pm5k commented Feb 8, 2023

> Or python -m build project_a for example. You can use a path dependency to express the relationship with shared, though if you commit the lock file you will need to re-lock each time that you change the metadata of shared.

Except in my example just above, by making each of those a project, they cannot import from shared which defeats the entire purpose.

Try this locally:

[image not preserved]

And then try to import some method from shared while inside prj_c or whatever letter.

@neersighted
Member

Right, but that's why you use a path dependency: https://python-poetry.org/docs/dependency-specification/#path-dependencies

@pm5k
Author

pm5k commented Feb 9, 2023

But the path dependency option is explicitly used for installing the specified poetry project as a dependency into the .venv of prj_c in that example. Meaning that in order for me to use it, I have to first run poetry install, then update the package each time I make a small code change in shared. Not to mention that this creates one venv for each of the projects AND shared.

This sounds like a horrible way to work on a monorepository.

@neersighted
Member

Ah, in that case you'd be running into #1168.

@pm5k
Author

pm5k commented Feb 10, 2023

What does this have to do with that issue? I am confused..

@neersighted
Member

The ability to use a path dependency in development, but then turn around and use an index (named) dependency in production.

@pm5k
Author

pm5k commented Feb 10, 2023

That's not at all what I want or what I meant with my message..

@neersighted
Member

Hmm. Looks like I might have misunderstood, yeah. You can install a path dependency as editable, so you'd merely have to re-lock when you change metadata (e.g. dependencies, version, etc).

Ultimately though, I think what you're trying to do is incompatible with the architecture of Poetry. It seems like you have a very specific workflow in mind, that Poetry is not designed to support, and that is rather idiosyncratic in the wider Python world. It certainly is possible to implement, but it might have to be bespoke as I do not know of any Python packaging tool that does what you want, today.

@pm5k
Author

pm5k commented Feb 10, 2023

Look, I can respect someone saying "no I won't do it", I can also respect saying "no we won't include this, because you're the only one asking for this". I can also totally respect someone telling me "no your idea is shit, go away". I could understand that.

What I find a little annoying, let's say, is the constant lean on "Poetry follows this rigid path that we cannot deviate from lest Guido himself strike us down". I do not believe that the fact that you do not know of a packman for Python that allows selective packaging into the resulting wheel should be a show-stopper. I also believe software is malleable and ever-evolving, or should be. Leaning too heavily into the "way things were/are" seems counterproductive.

I think what's compounding my negative feeling relating to our exchange is both that sort of single-minded rigidity and the fact that my request/ask exclusively touches on two optional additional flags (heck it could even be one), which could be used only by people who wish to replicate the behaviour (via Poetry) which I am currently having to induce manually.

This harms no-one. Not the project, not the direction, not the wider community of users or contributors. It's one/two extra CLI flags which by default do not DO anything, UNLESS you explicitly smack them on the end of build and tell it which packages to ignore from the final build wheel. So when you come back to me and say that the reason for this is some deviation from PEP517 or how the Python ecosystem handles packaging, I have to ask - so what? You aren't suddenly changing default behaviour. You're adding optional behaviour allowing for extra flexibility where people who do not want to / cannot follow conventional packaging guidelines or do not care about them (and have every right to choose) could now do this via their favourite tool.

To that end, we can end the conversation here. However before that's done - can I ask, is there or will there be (in the near future) a way for me to write my own plugin into poetry which allows me to do exactly this, which I could then weave into my own workflow, or are plugins not allowed to hook into the build workflow and modify it?

@neersighted
Member

neersighted commented Feb 10, 2023

There is the plugin mechanism, which has been mentioned.

To explain why you are making no inroads with a bespoke/counter-to-the-design-goals-of-Poetry ask: We have to maintain it. We, as the Poetry project maintainers, have to define semantics for these new flags, maintain them, and even propagate them into the ecosystem. Poetry is a bit of a bully pulpit through its popularity; if we implement & popularize something, the entire ecosystem has to live with the consequences.

Seemingly minor decisions made in/opinions encouraged by Poetry have had far ranging consequences that ultimately the entire ecosystem has had to deal with. "It would be so easy to add this" is hardly a consideration for inclusion in Poetry; instead we have to think through design & deal with the entire lifecycle of a feature.

Being a maintainer is hard. We have to balance the needs of all users, and the long term implications of the decisions we make. As a project becomes more mature and established, such as Poetry is, we have to become more conservative & thoughtful in the changes we make (in design, in code, in docs, etc). To be frank, what could be a boon to you could be the scourge of many users.

Nothing is free, nothing is 'harmless.' I understand that may seem unreasonable, but someone much more eloquent than me has explained it very well.

I've done my best not to slam the door in your face, and to explain why (from the project's perspective) -- hopefully the meta-why above is helpful. If you'd like a direct put-up-or-shut-up answer from me: No. Not now, though it's not ruled out for the future. But given where Poetry is, and the fact that these kinds of topics are an ongoing discussion among people working in this space, I don't think something that is entirely counter to the current design goals of today's Poetry will be accepted.

The Poetry project could change over time, or the Python packaging ecosystem could change over time. It might even happen next week. But nothing presented here compels me to believe supporting what you ask will benefit the majority of Poetry users (who include people just pip installing a project that happens to use Poetry).

I am of course not the final authority here. The maintainers of Poetry operate through consensus, and I am happy to be overruled by my peers, especially if there is a compelling technical argument made. However, the above is the best answer that I, personally, can give you.

@neersighted closed this as not planned Feb 10, 2023
@neersighted removed the status/triage label Feb 10, 2023

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024