-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support running pex_binary targets directly on the in-repo sources. #15689
Conversation
Fixes #12129 for |
2839dbf
to
4e92d60
Compare
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
What's the chances this makes it in a 2.12 release? I want to upgrade us and this would be HUGE for migrations |
@@ -548,6 +548,25 @@ class PexIncludeToolsField(BoolField): | |||
) | |||
|
|||
|
|||
class RunInlineField(BoolField): | |||
alias = "run_inline" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatives:
- Call it
run_in_repo
(or similar) to capture intent. (Instead of introducing a new word: "inline") - Flip it around. I expect most users assume
run
runs the code in-repo, so make this field align with expectations.run_in_sandbox
defaulting toTrue
. Then the flag becomes--no-run-in-sandbox
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like run_in_sandbox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is no flag, this is a field on a target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. Good point. Might be worth pontificating a flag akin to skip
being a flag and a field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this concept of running "inline" is Python-specific, not generic to the run goal.
This is specific to any run-as-is (non-compiled, uncompiled?) language. So languages like Ruby, JS, or Shell would all be valid.
[ci skip-rust] [ci skip-build-wheels]
Could contemplate backporting, although I prefer not to for features. But this one seems relatively harmless since you have to turn it on in the target. |
I suppose an alternative we maybe haven't considered (and likely won't in the very short term) is that we're somewhat overusing A user might rightfully expect In that regard the target here might be |
Scratch that. I opened the PR and realized that:
In order to not bully features here (or have to wait for migrations), and somewhat tangentially related to the above I think I'm going to experiment with a |
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
4e92d60
to
c4c9730
Compare
I'm not sure about that. If we consider targets to be descriptive then in both cases this is "python code with an entry point". Think about your arguments for the prosecution in the case of Files vs. Resources, that targets should describe a thing, not how the thing is used. |
Very Fair. Point taken. In that regard perhaps To drive that home, too, fields like |
Yeah, I think it's |
Actually... We're running first-class sources, right? In this case .... What would it look like to have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
is an example of such a process. It may also have lower latency, since no files need | ||
to be copied into a chroot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last sentence isn't true anymore, is it? We always copy files.
@@ -568,6 +568,25 @@ class PexIncludeToolsField(BoolField): | |||
) | |||
|
|||
|
|||
class RunInSandboxField(BoolField): | |||
alias = "run_in_sandbox" | |||
default = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this default to False
? It increases the likelihood Pants Just Works, vs making you discover a niche field.
This works like a charm with, maybe a one-line(-ish) change. I don't like the implication 😈 |
# complexity of figuring out here which sources were codegenned, we copy everything. | ||
# The inline source roots precede the chrooted ones in PEX_EXTRA_SYS_PATH, so the inline | ||
# sources will take precedence and their copies in the chroot will be ignored. | ||
local_dists.remaining_sources.source_files.snapshot.digest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this concern earlier, but couldn't substantiate it. Now I can.
If we're generating sources as submodules of a parent module, Python will use the parent's path for submodule lookup. So the in-repo path is the only path considered and it fails to import the codegened module 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "fix" is to opt-out of the in-repo running, so everything is in the sandbox. Not great but not fatal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://docs.python.org/3/reference/import.html
The meta path may be traversed multiple times for a single import request. For example, assuming none of the modules involved has already been cached, importing foo.bar.baz will first perform a top level import, calling mpf.find_spec("foo", None, None) on each meta path finder (mpf). After foo has been imported, foo.bar will be imported by traversing the meta path a second time, calling mpf.find_spec("foo.bar", foo.__path__, None). Once foo.bar has been imported, the final traversal will call mpf.find_spec("foo.bar.baz", foo.bar.__path__, None).
So in my case the submodule is at foo.bar.baz
, but the mpf.find_spec("foo", None, None)
gives a valid spec for an in-repo module, so it assumes the in-repo tree is where we'll find foo.bar.baz
. I can maybe try to play around with namespace packages, but that seems hacky and brittle.
Unfortunately I don't have a solution right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can maybe try to play around with namespace packages
Blasting every __init__.py
up the chain with https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages worked.
Tricky Tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the only sane way to solve this is to have the relevant packages be namespace packages. This can be a caveat we document, but since codegen use is not super-common, and needing to run inline is not super-common, I think that's OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the only sane way to solve this is to have the relevant packages be namespace packages. This can be a caveat we document, but since codegen use is not super-common, and needing to run inline is not super-common, I think that's OK.
Not quite... the reason this works in the sandbox is because we merge colliding PYTHONPATH entries into a single entry. It would be possible to do something similar in this mode, by using exclusively a sandbox entry if not doing so would result in packages being split across PYTHONPATH entries.
Whether that qualifies as "sane" is an open question, but.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does not qualify as sane IMO. The whole point of this change is to allow you to force something to run in the repo if it needs to. Things do not generally need to. I don't think we need to switch the default here, for example. But if you do need to we shouldn't silently not do what you asked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the proposal is to change the default behavior then that would have to be strongly motivated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very of the mind that this PR only fills the gap of not being able to ./pants run random_file.py
in-repo. To me, running the `pex_binary`` actually runs the PEX binary (and there is no "sandbox" vs in-repo toggle).
That being said, this particular issue will travel to the ./pants run random_file.py
implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some processes, such as Django's
makemigrations
use the__file__
and/or__path__
of modules to generate new paths to write to, relative to those. If run against sources in the tmpdir, those files will be written into the tmpdir and then cleaned up.Running inline is necessary in some cases, and whether it is necessary or not is a property of the binary (does it need to know the true locations of the source files it runs on), and so the inline-ness is a property of a
pex_binary
target rather than a command-line option.The consequences of running inline are that more code will be accessible on the sys.path - basically all the code in the relevant source roots. Theoretically this could lead to different behavior than running sandboxed. But this is unlikely to be a problem in practice, especially for the types of binary we're talking about. They already implicitly expect to run with that context, so if there is a difference in behavior then arguably the sandboxed behavior is the wrong one.
Note that this concept of running "inline" is Python-specific, not generic to the
run
goal. There is no meaning to running inline for a compiled language.[ci skip-rust]
[ci skip-build-wheels]