Support running pex_binary targets directly on the in-repo sources. #15689

benjyw · 2022-05-27T18:26:27Z

Some processes, such as Django's makemigrations, use the __file__ and/or __path__ of modules to generate new paths to write to, relative to those. If run against copies of the sources in the tmpdir, those files will be written into the tmpdir and then lost when it is cleaned up.
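A minimal sketch of the problem described above, using only the standard library. The module name `writer_demo` and its `write_output` function are hypothetical stand-ins for something like makemigrations; the two temporary directories stand in for the repo and the sandbox.

```python
import importlib.util
import shutil
import tempfile
from pathlib import Path

# Hypothetical module that writes a file next to its own source, as a
# migrations generator might.
MODULE_SOURCE = '''\
from pathlib import Path

def write_output():
    out = Path(__file__).parent / "generated.txt"
    out.write_text("new migration")
    return out
'''


def load_module(path):
    """Import the module from an explicit file location."""
    spec = importlib.util.spec_from_file_location("writer_demo", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as repo:
    repo_copy = Path(repo) / "writer_demo.py"
    repo_copy.write_text(MODULE_SOURCE)
    with tempfile.TemporaryDirectory() as sandbox:
        # Simulate a sandboxed run: the code executes from a copy of the source.
        sandbox_copy = Path(shutil.copy(repo_copy, sandbox))
        written = load_module(sandbox_copy).write_output()
        landed_in_sandbox = written.parent == Path(sandbox)
    # The sandbox has now been cleaned up.
    repo_has_output = (Path(repo) / "generated.txt").exists()
    survived_cleanup = written.exists()

print(landed_in_sandbox, repo_has_output, survived_cleanup)
```

The output file follows the location of the copy, not the original, so nothing reaches the repo and the result disappears with the sandbox.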

Running inline is necessary in some cases, and whether it is necessary or not is a property of the binary (does it need to know the true locations of the source files it runs on?). So the inline-ness is a property of a pex_binary target rather than a command-line option.

The consequences of running inline are that more code will be accessible on the sys.path - basically all the code in the relevant source roots. Theoretically this could lead to different behavior than running sandboxed. But this is unlikely to be a problem in practice, especially for the types of binary we're talking about. They already implicitly expect to run with that context, so if there is a difference in behavior then arguably the sandboxed behavior is the wrong one.

Note that this concept of running "inline" is Python-specific, not generic to the run goal. There is no meaning to running inline for a compiled language.

[ci skip-rust]

[ci skip-build-wheels]

benjyw · 2022-05-27T18:26:32Z

Fixes #12129 for run. A separate change can fix for repl.

src/python/pants/backend/python/target_types.py

src/python/pants/util/strutil.py

[ci skip-rust] [ci skip-build-wheels]

thejcannon · 2022-05-31T12:18:16Z

What's the chances this makes it in a 2.12 release? I want to upgrade us and this would be HUGE for migrations

thejcannon · 2022-05-31T13:32:55Z

src/python/pants/backend/python/target_types.py

@@ -548,6 +548,25 @@ class PexIncludeToolsField(BoolField):
    )


+class RunInlineField(BoolField):
+    alias = "run_inline"


Alternatives:

Call it run_in_repo (or similar) to capture intent. (Instead of introducing a new word: "inline")

Flip it around. I expect most users assume run runs the code in-repo, so make this field align with expectations. run_in_sandbox defaulting to True. Then the flag becomes --no-run-in-sandbox.

Yeah, I like run_in_sandbox

Note that there is no flag, this is a field on a target

Ah yes. Good point. Might be worth pontificating a flag akin to skip being a flag and a field.

Note that this concept of running "inline" is Python-specific, not generic to the run goal.

This is specific to any run-as-is (non-compiled, uncompiled?) language. So languages like Ruby, JS, or Shell would all be valid.

[ci skip-rust] [ci skip-build-wheels]

benjyw · 2022-05-31T20:22:43Z

What's the chances this makes it in a 2.12 release? I want to upgrade us and this would be HUGE for migrations

Could contemplate backporting, although I prefer not to for features. But this one seems relatively harmless since you have to turn it on in the target.

thejcannon · 2022-05-31T20:26:25Z

I suppose an alternative we maybe haven't considered (and likely won't in the very short term) is that we're somewhat overusing pex_binary here for run and for package, when in reality they are two very different use-cases and exhibit very different behavior.

A user might rightfully expect run of a packageable to be essentially a package followed by executing the package.

In that regard the target here might be python_script, which is only runnable and not packageable.

thejcannon · 2022-05-31T20:30:23Z

Could contemplate backporting, although I prefer not to for features. But this one seems relatively harmless since you have to turn it on in the target.

Scratch that. I opened the PR and realized that:

The relevant change to make this work is in the run goal support for pex_binary
The goal code is minimal

In order to not bully features here (or have to wait for migrations), and somewhat tangentially related to the above I think I'm going to experiment with a python_script target as an in-repo plugin in our codebase.


benjyw · 2022-05-31T20:58:00Z

I suppose an alternative we maybe haven't considered (and likely won't in the very short term) is that we're somewhat overusing pex_binary here for run and for package, when in reality they are two very different use-cases and exhibit very different behavior.

A user might rightfully expect run of a packageable to be essentially a package followed by executing the package.

In that regard the target here might be python_script, which is only runnable and not packageable.

I'm not sure about that. If we consider targets to be descriptive then in both cases this is "python code with an entry point". Think about your arguments for the prosecution in the case of Files vs. Resources, that targets should describe a thing, not how the thing is used.

thejcannon · 2022-05-31T21:07:41Z

I suppose an alternative we maybe haven't considered (and likely won't in the very short term) is that we're somewhat overusing pex_binary here for run and for package, when in reality they are two very different use-cases and exhibit very different behavior.
A user might rightfully expect run of a packageable to be essentially a package followed by executing the package.
In that regard the target here might be python_script, which is only runnable and not packageable.

I'm not sure about that. If we consider targets to be descriptive then in both cases this is "python code with an entry point". Think about your arguments for the prosecution in the case of Files vs. Resources, that targets should describe a thing, not how the thing is used.

Very Fair. Point taken. In that regard perhaps pex_binary is itself a misnomer and an alias for python_script. In this instance pex is really only an implementation detail of the requirements (not 100% true, but mostly), and is less-but-still-technically an implementation detail for package.

To drive that home, too, fields like include_requirements, include_tools, and layout are meaningless to this target if only ever run. Really you just need dependencies and entry_point. Then (and I'm going somewhat off the rails here, but keep your imagination open) maybe a package_opts field set to an instance of a PexOptions (a la HTTPSource). It allows for not bloating the target type on details of PEX it will never use, and also allows for other package options to share the python_script target type (PyInstallerOptions or PyOxyOptions).

benjyw · 2022-05-31T21:13:42Z

Yeah, I think it's pex_binary because package turns it into a pex, and if there were some other format, it would have a separate target, which would itself also be runnable

thejcannon · 2022-05-31T21:28:24Z

Actually... We're running first-class sources, right? In this case $ ./pants run path/to/manage.py makemigrations

.... What would it look like to have ./pants run path/to/manage.py run via a FieldSet containing PythonSourceField? I'm gonna try it 😂

Eric-Arellano

Looks good

Eric-Arellano · 2022-05-31T23:56:18Z

src/python/pants/backend/python/target_types.py

+        is an example of such a process.  It may also have lower latency, since no files need
+        to be copied into a chroot.


This last sentence isn't true anymore, is it? We always copy files.

Eric-Arellano · 2022-05-31T23:56:50Z

src/python/pants/backend/python/target_types.py

@@ -568,6 +568,25 @@ class PexIncludeToolsField(BoolField):
    )


+class RunInSandboxField(BoolField):
+    alias = "run_in_sandbox"
+    default = True


Should this default to False? It increases the likelihood Pants Just Works, vs making you discover a niche field.

thejcannon · 2022-06-01T01:58:10Z

Actually... We're running first-class sources, right? In this case $ ./pants run path/to/manage.py makemigrations

.... What would it look like to have ./pants run path/to/manage.py run via a FieldSet containing PythonSourceField? I'm gonna try it 😂

This works like a charm with, maybe a one-line(-ish) change. I don't like the implication 😈

thejcannon · 2022-06-02T17:51:03Z

src/python/pants/backend/python/goals/run_pex_binary.py

+        # complexity of figuring out here which sources were codegenned, we copy everything.
+        # The inline source roots precede the chrooted ones in PEX_EXTRA_SYS_PATH, so the inline
+        # sources will take precedence and their copies in the chroot will be ignored.
+        local_dists.remaining_sources.source_files.snapshot.digest,
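The precedence rule the comment in this diff relies on can be illustrated with plain sys.path, which is used here only as a stand-in for PEX_EXTRA_SYS_PATH. The module name `precedence_demo` and the two temporary directories are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as in_repo, tempfile.TemporaryDirectory() as chroot:
    # The same module exists in both locations, as it would when everything
    # is copied into the chroot.
    (Path(in_repo) / "precedence_demo.py").write_text("ORIGIN = 'in-repo'\n")
    (Path(chroot) / "precedence_demo.py").write_text("ORIGIN = 'chroot'\n")

    saved = list(sys.path)
    # The in-repo root precedes the chrooted one.
    sys.path[:0] = [in_repo, chroot]
    try:
        importlib.invalidate_caches()
        origin = importlib.import_module("precedence_demo").ORIGIN
    finally:
        sys.path[:] = saved
        sys.modules.pop("precedence_demo", None)

print(origin)  # in-repo
```

The first path entry that can satisfy the import wins, so the chrooted copy is never loaded.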


I had this concern earlier, but couldn't substantiate it. Now I can.

If we're generating sources as submodules of a parent module, Python will use the parent's path for submodule lookup. So the in-repo path is the only path considered and it fails to import the codegened module 😢

The "fix" is to opt-out of the in-repo running, so everything is in the sandbox. Not great but not fatal.
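A small reproduction of this failure, as a sketch under stated assumptions: the package name `splitdemo` and the two temporary directories (standing in for the in-repo source root and the chroot) are hypothetical, and plain sys.path is used in place of the real run setup.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as in_repo, tempfile.TemporaryDirectory() as chroot:
    # In-repo tree: a regular package with no generated code.
    (Path(in_repo) / "splitdemo").mkdir()
    (Path(in_repo) / "splitdemo" / "__init__.py").write_text("")
    # Chroot tree: a copy of the package plus a code-generated submodule.
    (Path(chroot) / "splitdemo").mkdir()
    (Path(chroot) / "splitdemo" / "__init__.py").write_text("")
    (Path(chroot) / "splitdemo" / "generated.py").write_text("VALUE = 1\n")

    saved = list(sys.path)
    sys.path[:0] = [in_repo, chroot]  # in-repo root comes first
    try:
        importlib.invalidate_caches()
        try:
            importlib.import_module("splitdemo.generated")
            import_failed = False
        except ModuleNotFoundError:
            import_failed = True
    finally:
        sys.path[:] = saved
        for name in [n for n in sys.modules if n.split(".")[0] == "splitdemo"]:
            del sys.modules[name]

print(import_failed)  # True
```

Once `splitdemo` resolves to the in-repo directory, only that directory is searched for its submodules, so the generated module in the chroot is never found.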

From https://docs.python.org/3/reference/import.html

The meta path may be traversed multiple times for a single import request. For example, assuming none of the modules involved has already been cached, importing foo.bar.baz will first perform a top level import, calling mpf.find_spec("foo", None, None) on each meta path finder (mpf). After foo has been imported, foo.bar will be imported by traversing the meta path a second time, calling mpf.find_spec("foo.bar", foo.__path__, None). Once foo.bar has been imported, the final traversal will call mpf.find_spec("foo.bar.baz", foo.bar.__path__, None).

So in my case the submodule is at foo.bar.baz, but the mpf.find_spec("foo", None, None) gives a valid spec for an in-repo module, so it assumes the in-repo tree is where we'll find foo.bar.baz. I can maybe try to play around with namespace packages, but that seems hacky and brittle.
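The traversal described in the quoted documentation can be observed with a recording meta path finder. This is only an illustration: the `RecordingFinder` class and the `metademo` package are hypothetical, and the finder finds nothing itself, it just logs each request and defers to the regular finders.

```python
import importlib
import importlib.abc
import sys
import tempfile
from pathlib import Path


class RecordingFinder(importlib.abc.MetaPathFinder):
    """Records find_spec calls for the demo package, then defers to other finders."""

    def __init__(self):
        self.calls = []

    def find_spec(self, fullname, path, target=None):
        if fullname.split(".")[0] == "metademo":
            # Record the requested name and whether a parent __path__ was passed.
            self.calls.append((fullname, path is None))
        return None  # Not handled here; the next finder gets a turn.


with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "metademo" / "bar"
    pkg.mkdir(parents=True)
    (Path(root) / "metademo" / "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    (pkg / "baz.py").write_text("")

    finder = RecordingFinder()
    sys.meta_path.insert(0, finder)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        importlib.import_module("metademo.bar.baz")
    finally:
        sys.path.remove(root)
        sys.meta_path.remove(finder)
        for name in [n for n in sys.modules if n.split(".")[0] == "metademo"]:
            del sys.modules[name]

calls = finder.calls
for fullname, top_level in calls:
    print(fullname, "path=None" if top_level else "path=parent.__path__")
```

Only the top-level lookup is made with no path; every later lookup is restricted to the parent package's `__path__`.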

Unfortunately I don't have a solution right now.

I can maybe try to play around with namespace packages

Blasting every __init__.py up the chain with https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages worked.
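A sketch of why the pkgutil-style declaration helps, assuming the same kind of split layout as in the failure above. The package name `nsdemo` and the temporary directories are hypothetical; the one-line `__init__.py` body is the form given in the linked packaging guide.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# pkgutil-style namespace declaration, placed in each __init__.py.
NAMESPACE_INIT = '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'

with tempfile.TemporaryDirectory() as in_repo, tempfile.TemporaryDirectory() as chroot:
    # In-repo tree: the package without generated code.
    (Path(in_repo) / "nsdemo").mkdir()
    (Path(in_repo) / "nsdemo" / "__init__.py").write_text(NAMESPACE_INIT)
    # Chroot tree: a copy of the package plus a code-generated submodule.
    (Path(chroot) / "nsdemo").mkdir()
    (Path(chroot) / "nsdemo" / "__init__.py").write_text(NAMESPACE_INIT)
    (Path(chroot) / "nsdemo" / "generated.py").write_text("VALUE = 1\n")

    saved = list(sys.path)
    sys.path[:0] = [in_repo, chroot]  # in-repo root still comes first
    try:
        importlib.invalidate_caches()
        package = importlib.import_module("nsdemo")
        search_locations = len(package.__path__)
        value = importlib.import_module("nsdemo.generated").VALUE
    finally:
        sys.path[:] = saved
        for name in [n for n in sys.modules if n.split(".")[0] == "nsdemo"]:
            del sys.modules[name]

print(search_locations, value)
```

`extend_path` appends every matching package directory found on sys.path to `__path__`, so the submodule lookup also searches the chroot copy.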

Tricky Tricky.

Yes, the only sane way to solve this is to have the relevant packages be namespace packages. This can be a caveat we document, but since codegen use is not super-common, and needing to run inline is not super-common, I think that's OK.

Yes, the only sane way to solve this is to have the relevant packages be namespace packages. This can be a caveat we document, but since codegen use is not super-common, and needing to run inline is not super-common, I think that's OK.

Not quite... the reason this works in the sandbox is because we merge colliding PYTHONPATH entries into a single entry. It would be possible to do something similar in this mode, by using exclusively a sandbox entry if not doing so would result in packages being split across PYTHONPATH entries.
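As a rough analogy only (this uses directory copies and is not how Pants merges its entries), combining the colliding trees into one root means the package is no longer split across path entries, so a regular package suffices. The `mergedemo` package and the temporary directories are hypothetical.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as sources, \
        tempfile.TemporaryDirectory() as codegen, \
        tempfile.TemporaryDirectory() as merged:
    # Hand-written sources: a regular package.
    (Path(sources) / "mergedemo").mkdir()
    (Path(sources) / "mergedemo" / "__init__.py").write_text("")
    # Generated sources: a submodule of the same package, in a separate tree.
    (Path(codegen) / "mergedemo").mkdir()
    (Path(codegen) / "mergedemo" / "generated.py").write_text("VALUE = 2\n")

    # Merge both trees into a single root (requires Python 3.8+).
    for tree in (sources, codegen):
        shutil.copytree(tree, merged, dirs_exist_ok=True)

    sys.path.insert(0, merged)
    try:
        importlib.invalidate_caches()
        value = importlib.import_module("mergedemo.generated").VALUE
    finally:
        sys.path.remove(merged)
        for name in [n for n in sys.modules if n.split(".")[0] == "mergedemo"]:
            del sys.modules[name]

print(value)  # 2
```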

Whether that qualifies as "sane" is an open question, but.

That does not qualify as sane IMO. The whole point of this change is to allow you to force something to run in the repo if it needs to. Things do not generally need to. I don't think we need to switch the default here, for example. But if you do need to we shouldn't silently not do what you asked.

If the proposal is to change the default behavior then that would have to be strongly motivated.

I'm very much of the mind that this PR only fills the gap of not being able to ./pants run random_file.py in-repo. To me, running the pex_binary actually runs the PEX binary (and there is no "sandbox" vs in-repo toggle).

That being said, this particular issue will travel to the ./pants run random_file.py implementation.

benjyw added the category:new feature label May 27, 2022

benjyw mentioned this pull request May 27, 2022

Use sources directly from the build root in ./pants run and ./pants repl #12129

Open

benjyw requested review from Eric-Arellano, stuhood and jsirois May 27, 2022 18:27

jsirois approved these changes May 27, 2022

stuhood approved these changes May 27, 2022

src/python/pants/backend/python/target_types.py

benjyw commented May 28, 2022

src/python/pants/util/strutil.py

benjyw force-pushed the run_on_original_sources branch from 2839dbf to 4e92d60 Compare May 28, 2022 06:33

benjyw requested a review from stuhood May 28, 2022 23:26

benjyw added 2 commits May 28, 2022 17:25

Support running pex_binary targets directly on the in-repo sources.

478c575

[ci skip-rust] [ci skip-build-wheels]

Support the codegen case.

85aa22d

[ci skip-rust] [ci skip-build-wheels]

thejcannon reviewed May 31, 2022

Merge branch 'main' into run_on_original_sources

9d4e852

[ci skip-rust] [ci skip-build-wheels]

Switch the sense of the field

c4c9730

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

benjyw force-pushed the run_on_original_sources branch from 4e92d60 to c4c9730 Compare May 31, 2022 20:35

benjyw merged commit 981ebcc into pantsbuild:main May 31, 2022

benjyw deleted the run_on_original_sources branch May 31, 2022 22:07

Eric-Arellano reviewed May 31, 2022

thejcannon reviewed Jun 2, 2022
