Build and install python_distribution deps for test, run, and repl #12573
Conversation
Note that this reveals a slight awkwardness with We may want to rethink
I'm pretty sure the sys.path duplication is a fatal problem for things as they stand, unless depending on a python_distribution does not actually cause a dependency on the python_distribution's dependencies in turn.
@@ -46,6 +68,7 @@ async def create_python_repl_request(repl: PythonRepl, pex_env: PexEnvironment)
    extra_env = {
        **complete_pex_env.environment_dict(python_configured=requirements_pex.python is not None),
        "PEX_EXTRA_SYS_PATH": ":".join(chrooted_source_roots),
        "PEX_PATH": repl.in_chroot(local_dists.pex.name),
So I think you now depend on the local_dists twice. Once up in PEX_EXTRA_SYS_PATH above as loose sources (with no .dist-info/ metadata dir), and then here as an installed wheel with a .dist-info/ metadata dir. I'm not sure which is 1st on the final sys.path, and the order should not matter anyhow since there will be just one proj-rev.dist-info/ dir on the sys.path, but ... this duplication of everything else feels like we're walking into a trap. For example, although the .dist-info/ case works out, solving one user's issue, I'm pretty sure a native extension that lives in a package may not. For example, if there is pure Python code packaged as package/module.py and package/module.py does not live in a namespace package, and the loose source version of this file ends up 1st on the sys.path, then the package/native.cpython-39-x86_64-linux-gnu.so located in an installed wheel on the PEX_PATH will never be found.
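A minimal sketch of that shadowing concern (the layout and sys.path ordering below are assumptions for illustration, not taken from this PR):

# Assumed final sys.path, loose sources first:
#   1) loose sources (PEX_EXTRA_SYS_PATH): package/__init__.py, package/module.py   (no .dist-info/)
#   2) installed wheel (PEX_PATH):         package/module.py, package/native.cpython-39-x86_64-linux-gnu.so
#
# Because `package` is a regular (non-namespace) package, the interpreter binds it to the
# first copy it finds - the loose-source one - and only searches inside that copy for submodules:
import package.module   # resolved from the loose sources
import package.native   # ModuleNotFoundError: the wheel's copy of `package` is shadowed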
Yeah:
$ cat <<EOF > BUILD.test
python_library(
name="jake",
dependencies=[
"src/python/pants:pants-packaged",
],
)
EOF
jsirois@gill ~/dev/pantsbuild/pants (main) $ ./pants filedeps --transitive //:jake
3rdparty/python/BUILD
3rdparty/python/requirements.txt
BUILD.test
src/python/pants/BUILD
src/python/pants/VERSION
src/python/pants/__main__.py
src/python/pants/backend/awslambda/python/BUILD
src/python/pants/backend/awslambda/python/__init__.py
src/python/pants/backend/awslambda/python/lambdex.py
src/python/pants/backend/awslambda/python/register.py
...
Yes, that is correct, and in fact occurred during my testing. The loose sources win, and make the native code unimportable if not in a namespace package.
The duplication itself can be mitigated using the !! operator to exclude deps (sketched below). And we can mitigate it in the future by A) not attempting to infer deps that can be satisfied by a local dist, and B) not following explicit deps through a python_distribution target when computing the transitive closure.
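For example (a hedged BUILD sketch; the target addresses are made up), a consumer could use the transitive-exclude syntax to drop the loose-source copy of the code that the dist already provides:

python_tests(
    name="tests",
    dependencies=[
        "src/python/proj:dist",       # the python_distribution, built and installed as a wheel
        "!!src/python/proj/lib:lib",  # transitively exclude the loose-source copy of the same code
    ],
)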
But the namespace package problem may occur regardless, even if we eliminate duplication, because of code outside the dist sharing a package with code in the dist. So depending on a dist like this does require you to know what you're doing.
That said, as this stands, it solves a real user problem, albeit with a gotcha, which we can document. So, on balance, we might want to release this and potentially deal with the missing pieces as above in a followup. WDYT?
I have no well-justified thoughts. It seems to me subtracting the closure of a python_distribution dep sub-graph from the larger graph in play is not actually hard, in which case it would be great to just take a day or a week to do things right and ship in the next dev release. I may be underestimating the difficulty of doing this right from the get-go, though.
Yeah, I'm not sure how hard it would be. But it doesn't fully solve the namespace problem. In most/many cases you will still have to ensure the code is in a namespace package, at which point the duplication problem seems less critical.
"inner" or "code" targets that own files (python_library, python_tests, files, resources), and then "artifact" targets (python_distribution, pex_binary) that don't own files but instead represent an artifact to be built. We don't model this distinction explicitly, but we probably should. Maybe we should even use different terms for them.
This is precisely why I have been so excited about that cleanup a few months ago for python_awslambda and pex_binary to no longer have sources. This is the mental model I've been using since that breakthrough, including when envisioning the elusive "re-envisioning targets" proposal. I do think there's a big difference between those two types of targets, and it is valuable to codify it more.
Still handwavy, but imo: targets work really well for artifact-type targets like pex_binary. They also seem to work well for 3rd-party deps metadata. They do not work well for first-party code, with issues like it being too hard to update just 1 file's metadata. So, I could envision us codifying this distinction by replacing 1st-party targets w/ something else entirely, for example.
--
about now having to keep track of all the different unwritten semantics.
@benjyw it sounds like you are envisioning a particular target always behaving the same way when in dependencies, and not depending on the call site's custom semantics? (Outside of project introspection.) So, if you depend on a python_distribution, that always means to install it?
That consistency would be good imo and make me more comfortable with this.
If so, where would that be documented? On the python_distribution target itself? In the dependencies field help message for each consumer? (I'm asking because I think it's important we know how to explain this to users.)
By having special-cased fields, you could use the same python_distribution to both a) install as a local dist you can import, b) inspect the .whl file, and c) use in some way we can't anticipate that plugin authors write. Or, pick and choose from those 3.
To clarify something: both of my suggestions involved using the SpecialCasedDependencies field, but they proposed using them in different places. Would suggest taking another look there.
The current implementation allows for either "consumer decides" or "producer decides", depending on where SpecialCasedDependencies are used.
@benjyw it sounds like you are envisioning a particular target always behaving the same way when in dependencies, and not depending on the call site's custom semantics? (Outside of project introspection). So, if you depend on a python_distribution, that always means to install it?
This is also what I felt would likely be least redundant:
I think that given that we have the python_distribution target which already "marks" a particular intended behavior, option 2 would be redundant.
Yes - I agree that we always want the same behavior when you depend on a python_distribution, and I think that behavior fairly obviously should be "build the dist and install it in the downstream environment". Having a dependency mean "depend on the python_library targets that a python_distribution depends on" seems weird. And if someone really wants to depend on those python_library targets, they can just depend on them directly.
In basically 100% of cases today, depending on a python_distribution is a smell, and we have masked the smell by giving that dependency a not-particularly-useful meaning. And I doubt there are many such dependencies today, since it doesn't really make sense, especially with dep inference. What I'm proposing is to have that dependency mean something more intuitive.
That said, again, dep inference really muddies the water...
So I think you now depend on the local_dists twice. Once up in PEX_EXTRA_SYS_PATH above as loose sources (with no .dist-info/ metadata dir) and then here as an installed wheel with a .dist-info/ metadata dir.
This still seems like a blocking issue. It seems like you need to introduce SpecialCasedDependencies for the distribution target.
Yes, but that breaks other things.
So, how to make progress here? What pre- or post-work would make folks comfortable with merging this?
Hi reviewers! I'd like to move towards merging this. There appear to be two contentious issues:
This solves a real user's real problem, and likely doesn't affect other users who don't need this feature. So - can I get a review with an eye to merging? Or do you object to this plan? Thanks.
I'm now okay with this proposal because you are planning for depending on a python_distribution to always behave the same way.
I will feel even more comfortable with this if we document the behavior explicitly. Probably in the help for
--
Should
--
FYI I think the fix is to change the dependencies field for python_distribution to be a SpecialCasedDependencies field.
It does become a little awkward to consume the dependencies for the python_distribution:
await Get(UnparsedPythonAddresses, python_dist[PythonDistDependencies].to_unparsed_address_inputs())
I do think this should be prework, and it makes sense to have a dedicated entry in the changelog: the API change is that depending on a python_distribution now means building and installing it.
I know this all probably sounds Greek - feel free to delegate to me as prework for landing this.
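A rough sketch of what that prework might look like (hypothetical names; this is not what the PR implements):

from pants.engine.target import SpecialCasedDependencies


class PythonDistDependencies(SpecialCasedDependencies):
    """Hypothetical special-cased replacement for python_distribution's dependencies field."""
    alias = "dependencies"


# A consuming rule would then resolve these addresses explicitly rather than relying on the
# normal transitive-closure machinery (assumes `python_dist` is the dist's Target):
#   addresses = await Get(
#       Addresses,
#       UnparsedAddressInputs,
#       python_dist[PythonDistDependencies].to_unparsed_address_inputs(),
#   )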
Interesting. This kind of problem keeps popping up - @kaos's Docker work has a similar issue - depending on a
So I wonder if we want to bite the bullet and finally make a formal distinction between "edge targets" and "library targets". I will chew on this.
Getting to know more about special-cased deps just now, I'm wondering if it wouldn't be useful to be able to get deps transitively with exclude/include lists of dep fields, a la the source type fields arg to a sources request. That way you have a finer level of control to slice and dice the deps you get.
So, "make binary targets' Probably we want to re-think All that said, since this pre-work has become a yak shave, I would like to go ahead and merge this, since it solves a real user problem (with a gotcha that we can mitigate with the above-mentioned work). Letting it bit-rot while we slowly finesse the Thoughts? |
Makes sense to me, at least ;) I like @jsirois's suggestion with
What happens with this PR if a
Alternatively, allow people to filter the dependencies that they request by
I do think that "consumer decides" works in some cases, but there are plenty of others where that would just seem like a gotcha. The only reason to depend on a |
Yeah, agreed. @benjyw has convinced me we want depending on a target type to always mean the same thing. I agree. It will be important to document that behavior somehow, so people know "depending on a python_distribution always results in building it and inserting it into the chroot - unless it's project introspection."
So, @Eric-Arellano attempted to implement his proposed suggestion as pre-work, and discovered, as I did when I also attempted something similar in #12709, that it's more complex than it seems, and requires some thinking. Meanwhile, as I keep saying, this change solves a real user problem, and is slowly bitrotting. So please review with an eye to merging, with the dependency semantics issue (which is wider than just this change) to be addressed later, or argue against merging this. Thanks!
What was the conclusion? I don't see the blocking issue described on #12709.
Just remembered/realized that even the proposed pre-work of changing the python_distribution target's dependencies field to be a SpecialCasedDependencies won't solve the "sources appearing twice on the sys.path" problem, because dep inference will bring in at least some of those dependencies regardless: presumably if some code is consuming a dist, it's importing symbols from that dist, and dep inference will obliviously do its thing.
The fix to that might be for dep inference to infer deps on the dist instead of on the source, but that is complicated - we now have to treat sources inside the dist differently from sources outside the dist that consume it.
So I'm back to thinking that the pre-work is not necessary for the purpose of this change, as it doesn't actually solve the problem. We do need to re-think how we handle dependencies in these various cases, but I'm back to thinking that it doesn't block this change.
Probably the right solution is something @jsirois suggested - simply subtracting the sources in the dist from the loose sources.
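A minimal sketch of that subtraction (names are made up; the PR's actual implementation may differ):

from typing import Set

def remaining_loose_sources(all_source_files: Set[str], dist_provided_files: Set[str]) -> Set[str]:
    """Keep only the loose sources not already packaged into the built local dist, so the same
    module doesn't appear twice on sys.path (once via PEX_EXTRA_SYS_PATH and once inside the
    installed wheel on PEX_PATH)."""
    return all_source_files - dist_provided_files

# e.g. remaining_loose_sources({"proj/a.py", "proj/b.py"}, {"proj/a.py"}) == {"proj/b.py"}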
Okay I think that makes sense. What about third-party dependencies? Edit: I think you wouldn't have sys.path issues, but would install deps twice? When installing the dist and also as normal 3rd-party requirements.
That's a pip thing - you're trying to pip-install two dists with the same project name, so pip will pick one. But in any case you shouldn't be trying to do that in the first place.
OK, latest commit implements that subtraction. It can be reviewed independently if you're familiar with the rest of this change.
Cool! How about updating the PR title so this better captures the breaking API change?
Depending on python_distribution now builds and installs it for test, run, and repl
[
    pex.digest,
    local_dists.pex.digest,
    local_dists.remaining_sources.source_files.snapshot.digest,
Why are the remaining sources added here? If I depend on a python_distribution to get its wheel as a dependency, the python_distribution target itself may contain many more files than those that actually make it into the wheel. I don't want those files, just the wheel; that's why I picked depending on the python_distribution and not a python_sources target it depends on - say.
The code has changed a bunch since this PR, but IIRC the issue is something like:
python_library_A explicitly depends on python_distribution_B and also imports module C from python_library_C which happens to be in the transitive dep closure of B, but is not published in B's wheel.
Rigorously, I suppose we could assume that C is published in some other distribution (or else how could the wheel of B consume it?) But I'm not sure that rigor holds up in reality. Maybe B depends on C via a spurious explicit dep, or the specific file from C imported by A isn't published in C's wheel.
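A sketch of that shape in BUILD terms (addresses and names are made up for illustration):

# src/python/a/BUILD
python_library(
    name="a",
    dependencies=["src/python/b:dist"],  # explicit dep on python_distribution B
)
# a's sources also `import c`, so dep inference pulls in python_library C as well.

# src/python/b/BUILD
python_distribution(
    name="dist",
    dependencies=["src/python/c:c"],     # C is in B's transitive closure...
    provides=setup_py(name="b", version="0.0.1"),
)

# src/python/c/BUILD
python_library(name="c")                 # ...but C's files may not end up in B's published wheel.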
@kaos I think that just addresses 3rdparty transitive deps, not 1st party. The case @benjyw sketched with A, B, C is 1st party, I believe. Unless I'm missing something, we have the information on disk to deal with Benjy's case. It might be a good deal of work to implement and it might be expensive in runtime, but it's also opt-in and would give correct results. Right now, depending on local dists has had a history of expediency that has solved immediate needs but has consistently not been comprehensive; so we've continued to suffer paper-cuts from our perspective as devs running support, but likely more serious wounds to the users encountering them.
Thanks for the case Benjy, that actually sounds likely now that you spell it out. I'll file an issue tracking the problem the current inclusion of extra files from LocalDist causes, and then the decision whether to continue with expedient fixes (if there are any) vs. a comprehensive fix-up can be made there.
I've filed #18254 to track the bug that including the extra files can lead to.
If you (transitively) depend on a python_distribution, Pants will now build and install it into the venv for test, run and repl.
We don't do anything special when packaging, under the assumption that the python_distribution will itself be packaged and depended on as a published artifact.
This change allows you to seamlessly use existing/custom setup.py - notably to build Python native extensions - in Pants. If a setup.py can build it, Pants can consume it.
Note that adding this support was very little code, thanks to the engine. Most of this change is tests.
This change also tweaks the testprojects native extensions example, to clarify what is going on.
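For context, a minimal sketch of how this might be used (target names, paths, and field values here are illustrative, not taken from this change):

# src/python/proj/BUILD
python_library(name="native-lib")        # includes sources for a native extension built by setup.py

python_distribution(
    name="mydist",
    dependencies=[":native-lib"],
    provides=setup_py(name="mydist", version="0.0.1"),
)

python_tests(
    name="tests",
    dependencies=[":mydist"],            # the dist is built and installed into the test/run/repl venv
)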
Addresses #11942
[ci skip-rust]
[ci skip-build-wheels]