New Python Backend wishlist #20897
Replies: 28 comments 66 replies
-
I'll start: a common request we get on Slack is for different config to apply in different parts of the codebase. So one "big idea" for a new backend is to allow (some) options to be set in BUILD file targets, to apply to the files in that target.
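To make the per-target config idea concrete, here is one way to picture it: options declared at a directory apply to every file beneath it, and the most specific directory wins. This is a minimal sketch, not a Pants API; `OptionOverrides`, `set_for_dir`, `resolve`, and the `line_length` option are all hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


class OptionOverrides:
    """Hypothetical store of option values declared at different directories."""

    def __init__(self, defaults: dict[str, object]) -> None:
        self.defaults = dict(defaults)
        # Maps a directory (e.g. "src/legacy") to the options declared for it.
        self.by_dir: dict[PurePosixPath, dict[str, object]] = {}

    def set_for_dir(self, directory: str, **options: object) -> None:
        self.by_dir.setdefault(PurePosixPath(directory), {}).update(options)

    def resolve(self, file_path: str) -> dict[str, object]:
        """Merge defaults with overrides from the repo root down to the file's directory."""
        merged = dict(self.defaults)
        parent = PurePosixPath(file_path).parent
        # `parents` is nearest-first; reverse it so deeper directories override shallower ones.
        chain = list(reversed(parent.parents)) + [parent]
        for directory in chain:
            merged.update(self.by_dir.get(directory, {}))
        return merged


if __name__ == "__main__":
    overrides = OptionOverrides({"line_length": 88})
    overrides.set_for_dir("src/legacy", line_length=120)
    print(overrides.resolve("src/legacy/old.py"))
    print(overrides.resolve("src/new/mod.py"))
```

Merging root-to-leaf keeps the rule easy to explain: a file sees the defaults, then each ancestor's overrides in order, with its own directory applied last.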
-
Another thing I'd like to implement: getting rid of dummy tailored BUILD files, and just doing the right thing in the 95% of cases where that is obvious. In other words, moving from a target-centric user view to a file/directory-centric one.
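A file/directory-centric view could mean deriving implicit "targets" from file names and locations instead of from tailored BUILD files. The sketch below is hypothetical: the naming patterns loosely follow common pytest conventions and are assumptions, as are `infer_kind` and `implicit_targets`; it only shows the shape of convention-based ownership.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed naming conventions (illustrative, not Pants' actual defaults).
TEST_PATTERNS = ("test_*.py", "*_test.py", "tests.py")
TEST_UTIL_PATTERNS = ("conftest.py",)


def infer_kind(path: str) -> str | None:
    """Classify a file purely by its name; None means 'not a Python file we own'."""
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, p) for p in TEST_UTIL_PATTERNS):
        return "test_util"
    if any(fnmatchcase(name, p) for p in TEST_PATTERNS):
        return "test"
    if name.endswith(".py"):
        return "source"
    return None


def implicit_targets(paths: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group files into one implicit target per (directory, kind), with no BUILD file needed."""
    targets: dict[tuple[str, str], list[str]] = {}
    for path in sorted(paths):
        kind = infer_kind(path)
        if kind is None:
            continue
        directory = str(PurePosixPath(path).parent)
        targets.setdefault((directory, kind), []).append(path)
    return targets
```

An explicit BUILD file would then only be needed for the remaining cases where the convention guesses wrong.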
-
De-coupling e.g. So, maybe having Ditto for
-
The number of third-party packages increases with larger repositories. As this number grows, lock generation takes longer and hurts the developer experience. Just as
-
Unsure if this is I haven't given this any thought past the above statement, I'm looking into a cython plugin/backend again soon, and it's at the top of my mind right now :)
-
I am running the GPU version of PyTorch with
-
It would be great if updating the lockfile didn't require us to rebuild all of our PEXes. It would also help to be able to update specific third-party dependencies in the lockfile without updating everything.
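Updating only specific requirements amounts to keeping every existing pin except the ones the user named. A minimal sketch, assuming a lockfile can be reduced to a `{name: version}` mapping with already-normalized names and that a fresh resolve is available; `selective_update` is hypothetical, and a real implementation would also have to re-check that the kept pins still satisfy the upgraded packages' constraints.

```python
from __future__ import annotations


def selective_update(
    locked: dict[str, str],
    fresh: dict[str, str],
    upgrade: set[str],
) -> dict[str, str]:
    """Take `fresh` versions only for packages in `upgrade`; keep every other existing pin.

    Packages present in `fresh` but absent from `locked` (e.g. new transitive
    dependencies of an upgraded package) are added.
    """
    result = dict(locked)  # never mutate the caller's lock
    for name in upgrade:
        if name in fresh:
            result[name] = fresh[name]
    for name, version in fresh.items():
        if name not in result:
            result[name] = version
    return result
```

Because untouched pins keep their exact versions, any per-requirement artifacts built from them could in principle stay cached, which is the other half of the request.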
-
It would be cool if 400 resolves didn't eat up all the memory 😄 #20568
-
Some observations (across several comments for better threading) I think Pants works especially well when it's as thin a wrapper around the underlying tools as reasonable, and, in particular, ensures that as much of the functionality of those tools is available as possible. To make this more concrete, this might mean a target like I think this includes both "more naive" target fields but also generic "pass these args too" like added in #20737 (the discussion there has some nuance too). For the pass-through args, maybe this would even include things like
-
Some observations (across several comments for better threading) The I think the
python_sources(name="src")
pex_binary(name="a", entry_point="a.py")
pex_binary(name="b", entry_point="b.py", dependencies=[":a"])
# (silently) doesn't include the `a.pex` and not sure there's any way to get it there
This isn't a problem with the Python backend alone, of course.
-
One thing that would be really nice is default partitioning of Python tools by config: #17739
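Partitioning by config essentially means grouping source files by the config file that governs them, then running the tool once per group. A minimal sketch, assuming "governs" means "closest ancestor directory containing a config file" (several tools behave roughly like this, but discovery rules differ per tool); `partition_by_config` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import PurePosixPath


def partition_by_config(
    sources: list[str], config_paths: list[str]
) -> dict[str | None, list[str]]:
    """Group source files by the config file in their nearest ancestor directory.

    Files with no config file above them are grouped under the key None.
    """
    # Map each directory to the config it contains (first in sorted order if several).
    config_in_dir: dict[PurePosixPath, str] = {}
    for path in sorted(config_paths):
        config_in_dir.setdefault(PurePosixPath(path).parent, path)

    partitions: dict[str | None, list[str]] = defaultdict(list)
    for source in sources:
        owner = None
        # `parents` yields the nearest directory first and ends at ".".
        for directory in PurePosixPath(source).parents:
            if directory in config_in_dir:
                owner = config_in_dir[directory]
                break
        partitions[owner].append(source)
    return dict(partitions)
```

Each resulting partition can then be handed to a single tool invocation pointed at that partition's config.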
-
From the monthly meeting: Is there a way for the new Python backend to define an API (or pseudo-API, or API + implementation) that we could backfill into the other backends to share as much code as possible, and reduce the per-backend maintenance burden (especially for those backends where we don't have dedicated users at the moment)? For example: Python, JVM, and Go backends don't have a lot of shared code and each solve similar problems in different ways (with good historical reasons for that). From my personal experience: the C/C++ backend is a surprisingly small amount of novel code for what it is and leverages a lot of the APIs that we currently use. Ditto for Swift, where it packages a whole "module's" worth of code together at once, but there really isn't much "swift-centric" Pants code required to get 90% of Pants functionality out of it.
-
Would it make sense to provide a way to run tests sequentially without We are currently migrating to Pants, and our codebase does not support running tests in parallel. However, I see no reason why we wouldn't be able to build sandboxes in parallel.
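The request separates two phases: preparing sandboxes, which is safe to parallelise, and running tests, which this codebase needs to do one at a time. A minimal sketch using the standard library's `concurrent.futures`; `run_tests_serially` is hypothetical, and `prepare_sandbox` / `run_test` stand in for whatever setup and execution steps a backend actually performs.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

S = TypeVar("S")  # a prepared sandbox
R = TypeVar("R")  # a test result


def run_tests_serially(
    test_files: list[str],
    prepare_sandbox: Callable[[str], S],
    run_test: Callable[[S], R],
    max_workers: int = 8,
) -> list[R]:
    """Prepare sandboxes concurrently, but execute the tests strictly one at a time."""
    results: list[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits every preparation up front and yields results in input order,
        # so test N can run while later sandboxes are still being prepared.
        for sandbox in pool.map(prepare_sandbox, test_files):
            # run_test is only ever called from this loop, so tests never overlap.
            results.append(run_test(sandbox))
    return results
```

Overlapping preparation with execution keeps the serial phase busy instead of waiting for every sandbox to be ready first.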
-
It would be nice if the Python backend had better integration with Docker. Right now you have to go through a lot of steps to create an optimized Docker image with two layers of PEXes: third-party dependencies and first-party dependencies. It's even harder to split third-party dependencies into more layers.
-
It would be nice to be able to add explicit dependencies in Python code. I mean something like:
Right now this can be done through
# pants: infer[//my/package]
package_path: my.package
-
We'll want to properly model transitive 3rdparty deps, so that, e.g.,
-
We should look into caching certain "facts" such as "this test passes" or "this file passes lint" independently of caching the process that asserted that fact. This allows us to disconnect caching/invalidation granularity from process granularity: We can run processes in batch for efficiency but still cache at a fine-grained level.
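Caching facts separately from processes could look like this: run one batched process over all files whose fact is unknown, then record a pass/fail fact per file keyed by a hash of its content. A minimal sketch; `FactCache`, `check_files`, and the in-memory dict are hypothetical, it assumes the batched tool can report per-file outcomes, and a real cache key would also need to include the tool version and config.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class FactCache:
    """Caches per-file facts (e.g. "passes lint") keyed by fact name and content hash."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str], bool] = {}

    @staticmethod
    def _key(fact: str, content: bytes) -> tuple[str, str]:
        return fact, hashlib.sha256(content).hexdigest()

    def check_files(
        self,
        fact: str,
        files: dict[str, bytes],
        run_batch: Callable[[dict[str, bytes]], dict[str, bool]],
    ) -> dict[str, bool]:
        """Return the fact for every file, running one batch only over cache misses."""
        results: dict[str, bool] = {}
        misses: dict[str, bytes] = {}
        for path, content in files.items():
            key = self._key(fact, content)
            if key in self._facts:
                results[path] = self._facts[key]
            else:
                misses[path] = content
        if misses:
            # One process invocation covers every uncached file...
            batch_results = run_batch(misses)
            # ...but each outcome is cached individually.
            for path, content in misses.items():
                self._facts[self._key(fact, content)] = batch_results[path]
                results[path] = batch_results[path]
        return results
```

Editing one file then invalidates only that file's fact; the next batch contains just that file, even though the previous run checked many files in a single process.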
-
First, a disclaimer: Pants is awesome, and the community is awesome. The work and thoughtfulness of the team are to be commended. I have some criticisms, but I have only been exposed to Python and its tooling in the last year or so, and my needs are simpler than those of many Pants users.

I'm also using this as a place to document my experience as a new user of Pants and a semi-new user of Python in general, so some of this feedback is Python-specific and some of it is Pants-specific. Before I acquire the "curse of knowledge" and know too much about Pants, I wanted to write down my experience in the hope that it helps with whatever new direction Pants/Python might take!

TL;DR: I'd love for the next Python backend to be faster and simpler for the common case :)

==============

This post comes at a great time for us. I am SO excited about the potential of Pants for Python. I am a believer in monorepos. I am in the middle of rearranging my "organically" grown repo at work, which started as a single ML/AI project and has since morphed into something I am not proud of. :)

"All" I'm trying to do is this: follow a good convention for organizing my Python-only repo containing multiple, potentially related Python projects, with a standard way of building, running CI in GitHub Actions, and deploying on Docker. I want my daily workflow to be fast, and I want it to be fast for my team. I love opinionated code formatters and opinionated, convention-based build systems. :)

In an effort to learn how Pants works, I cobbled together a monorepo example based on other examples I found:

Pants has some fantastic ideas and implementation, and I love how involved the community is! (People responded to my Slack post in no time, and there are some great testimonials from teams much larger than mine.) Here are the other things attracting me to Pants for a Python monorepo.
Here are the things that have concerned me as I learned about it. To be clear, these are not necessarily failings of Pants or the team; I'm just documenting my experience as a newbie.
1.5) It took me a long time to learn how to just update the version of a build tool I was using. The tools are built in, which is great, but what if I want to use a new version of ruff? If I hadn't already followed the lockfile-per-tool convention, I'm not sure I could have. I'm still not sure how I'd add another tool that isn't built in.
==========================

What I'm considering instead:

B) Rye: https://rye-up.com/guide/

I don't think Rye is long for this world now that it's been adopted by Astral (of ruff/uv fame), but its successor will presumably be developed and released by them, and I expect its abilities to grow. Also, it's based on standards like requirements.txt, pyproject.toml, penv, etc., so if it goes away, ripping it out will be quick and easy. Frankly, seeing how popular uv and ruff became in a short period of time, I expect their next build tool to be similarly popular. It will do ONLY Python, and will do 25% of what Pants does, but it will likely be the 25% I need for my Python monorepo.

So if Pants had a similar "Just use this backend, and we'll bootstrap Python for you, handle linting, dependency installation/subset detection, deployment to common targets, etc." and we're about as fast as using

Thanks for coming to my talk. ;)
-
Couple of ideas
-
My current experience migrating our monorepo to Pants has one major pain point: multi-resolves. We have a few AI projects, or projects that depend on an application dependency with strict constraints, so we have to keep them in a different resolve (so we have something like I thought I would then just set the resolve on the proper PEX and make sure all the required third-party dependencies are in the lockfile. It turns out these projects depend on other internal modules all over the place. Obviously, starting from scratch, things would have been better organized, but we are migrating to Pants for a reason, after all. My main issue so far is that there's no automatic resolve inference. So if a module depends on another module that is not in the same resolve, I get an error. Throw in transitive dependencies, and you have me trying to carve out the resolve across our thousands of modules for the last few weeks. Add too much and I get an error about missing third-party dependencies. Add too little and I get an error about a file not in the resolve. Now I'm full of I feel that Pants should be able to infer these and error when some module depends on a third-party dependency not available in the resolve. After all, isn't the whole point of a resolve the third-party dependencies? Why bother us with first-party ones?
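One reading of "infer the resolve" is: a first-party module is compatible with a resolve if every third-party requirement it reaches, directly or through other first-party modules, is in that resolve's lockfile. A minimal sketch over hypothetical inputs (a first-party import graph and per-module third-party requirements); `third_party_closure` and `compatible_resolves` are illustrative helpers, not how Pants currently assigns resolves.

```python
from __future__ import annotations

from collections import deque


def third_party_closure(
    module: str,
    imports: dict[str, set[str]],
    requirements: dict[str, set[str]],
) -> set[str]:
    """All third-party requirements reachable from `module` via first-party imports."""
    seen = {module}
    queue = deque([module])
    needed: set[str] = set()
    while queue:
        current = queue.popleft()
        needed |= requirements.get(current, set())
        for dep in imports.get(current, set()):
            if dep not in seen:  # the `seen` check also makes import cycles safe
                seen.add(dep)
                queue.append(dep)
    return needed


def compatible_resolves(
    module: str,
    imports: dict[str, set[str]],
    requirements: dict[str, set[str]],
    resolves: dict[str, set[str]],
) -> list[str]:
    """Resolves whose lockfile covers everything `module` transitively needs."""
    needed = third_party_closure(module, imports, requirements)
    return sorted(name for name, locked in resolves.items() if needed <= locked)
```

An empty result is exactly the case where an error is warranted, and the set difference `needed - locked` for each resolve would say which third-party packages are missing.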
-
Just ran into another one which is specific to Python/Pex. In some form, a lot of the other backends rely on Pex for tooling, which makes Pants itself harder to hack on, to try out experiments, or to test out random ideas. I ran into this with the The number of cyclic dependencies I've run into just trying to throw some Getters into random places in the code to see what happens if we change the order of calls or something, and then getting hit with a cyclic dependency that 8-10 levels down comes back to Example: Last night + this morning, I'm at 2 hours of fighting cyclic dependencies trying to test out a new docker feature by using a So, I guess the weirdness comes about by requiring the Python backend to be a dependency of other backends (by virtue of needing Pex/PythonToolBase), which is fine, but then the dependency tree explodes as a result. This might fall into the camp of a well-defined API, or using visibility rules, or something from the get-go; I'm not really sure.
-
I've been trying to address #15481, but it's proving difficult. I need to make some I looked at turning
-
Contra-viewpoint (for discussion purposes): Should we just evolve the existing Python backend instead of rewriting it, but do so in a way which minimizes impact on existing users of the Python backend? We need not do the evolution in its current place in the Pants repository. Rather, what if we adopt the idea of "channels" (e.g., stable, nightly) for the Python backend, like how Rust is distributed? That is, fork the Python backend into its own repository but keep the existing "stable" version in the main repository. Then start evolving the Python backend in its own repository as an "edge" or "nightly" channel. When the dev channel of the backend becomes stable enough to cut a release, we would then merge the changes into the main Pants repository. For the sake of making the point, I am glossing over things like not losing git history in moving from dev/edge to stable, what constitutes a good point to stabilize a set of changes in the new channel, etc. Basically, instead of rewriting the software, let's focus more on how we develop the software, including where development takes place and how we package and distribute the software.
-
For a relatively large monorepo, where we were disciplined enough to create libraries and declare our own dependencies on them, we find the fact that every file is a target to be a hindrance. We have a huge number of targets, which causes both slowness and memory usage issues. Even just passing a list of targets to Pants and waiting for it to start running the tests takes up to 3-4 minutes at times. I'm not sure if anything can be done without changing how Pants works fundamentally, but it'd be nice to have an option
-
The main complaint I have heard about Pants is that it is slow, in two particular areas: generate-lockfiles and building requirements PEXes (which still happens very often). It also doesn't play very nicely with ML code, because of the very heavy dependencies that ML packages bring and the resulting slowdown in Pants' view of the world. I would also be interested in improvements to remote cache hit rates for dev machines. But in general, for a tool that sits in the hot loop of development, better user-perceived performance is highly desirable.
-
Would love to see support for
-
It would be nice to have the ability to test functions in the same file as the source. In some cases it can be beneficial to keep the test close to the source, and you might see something like this:
# File: my_function.py
# PS: Filename does not contain the word "test"
def my_function(**kwargs): ...
def test_my_function(**kwargs): ...
Pytest allows for this by having this in your pytest.ini:
[pytest]
python_files = *.py
python_functions = test_* _test_*
Current issues I have seen with this in Pants:
-
The Python backend is the oldest part of Pants v2. It was initially designed back in 2019-2020, based on the experiences of its implementers, with some external feedback but also a lot of guesswork. So, while very useful, it has become quite bloated, and doesn't have great support for some real-world use cases.
Now that we have the benefit of several years of intensive usage, we're looking at creating a new, more streamlined Python backend. We want this proposed new backend to support all the common (and 90% of the uncommon) use cases of the existing backend, but to also support new use cases that the current backend doesn't handle well. All this while, ideally, being easier to set up and maintain.
It's important to note that the new backend would be opt-in, and we would not get rid of the old one until the new one achieves feature parity and then a long transition period.
So this discussion is a place for you to throw down your wish lists for Pants' python support: What are your use cases? What is not currently well-supported and you'd like to see improved? What ideas do you have? Feel free to post your thoughts, and also to comment or "thumbs up" other posts, to express your support for that idea.
(In your posts, please be kind and respectful of the many thousands of hours of work by dozens of people that went into the existing backend...)