Enable running multiple processes in the same sandbox. #17184
Comments
(And my approach shouldn't be upstreamed (IMO) because it doesn't have very good UX in that it also has to merge the descriptions. OTOH, perfect is the enemy of good enough, or something like that.) |
FWIW symlink support is coming down the line (no ETA). Additionally, for this use case we'd usually advocate just using a shell script or similar as your process. |
I'm using a shell script that's auto-generated from a bunch of processes - noticed the link had died, updated. I think the downside is that it also requires merging the descriptions etc. And for this very specific application, it'd be wasteful to have separate sandboxes, since it'd just duplicate the docker image as separate files. |
Is this just a performance issue, then? On the horizon we're playing with symlinks for immutable (large, by some definition) inputs. Otherwise, your request is tricky because Pants is written with support for remote execution, and AFAICT the remote execution API doesn't support this feature. |
See #17282 |
OK, had a think; I think there are three issues here from my perspective.
My workaround solves points 1 and 2, but the UX isn't optimal for 3 since I have to merge the descriptions, making it harder to understand what is being done and how long each step takes. The same thing happens in my build step, FWIW; I want to:
Each one is an invocation of umoci, and neither step produces a "complete" item. |
cc @chrisjrn as this may relate to some of the work on experimental_shell_command |
I'm not completely sold on there being a need for this beyond writing a shell script and using |
I don't think |
Had another think and wanted to elaborate a bit further, because I glossed over the question regarding my use case. I think there are two distinct cases where this can be used: one where it is an anti-pattern, and one where it isn't. So, I think I'm incorrectly using this to build-and-configure an image in one go. Like here:

```python
compile_result = await Get(
    FallibleProcessResult,
    FusedProcess(
        (
            Process(
                (
                    umoci.exe,
                    "raw",
                    "add-layer",
                    ...,
                    "layers/image_bundle.tar",
                ),
                input_digest=command_digest,
                description=f"Package OCI Image Bundle: {request.target.address}",
            ),
            Process(
                (
                    umoci.exe,
                    "config",
                    ...,
                ),
                input_digest=command_digest,
                description=f"Configure OCI Image: {request.target.address}",
                output_directories=("build",),
            ),
        ),
    ),
)
```

Here I'm adding a layer to an image (process 1) and then configuring that "atomically" (process 2). This means that if I change the environment I'll have to rebuild the contents of that layer. Bad! I can split that into two distinct steps and get much faster configure times, at the cost of extra complexity and potentially increased cache use. Important consideration, but I think my current choice is suboptimal. So, case two:

```python
return await Get(
    Process,
    FusedProcess(
        (
            packed_image_process,
            mkdir_process,
            Process(
                command,
                description=f"Running {request.target}",
                input_digest=tool.digest,
            ),
        )
    ),
)
```

Here I would say I'm using it properly - this is the case where the symlinks break if I cache it, but fundamentally the "unpacked" image is a dead end build-wise. Caching the unpacked image is going to waste space for almost no benefit, and the only thing that can be done here is to run it. Maybe we could run it with different args and keep it unpacked, but I don't see much benefit to that. (Let's ignore my usage of a mkdir_process, which should clearly be a CreateDigest, I've learned. ;-)) So maybe what I'm thinking about is that there are 1..N steps that sometimes have to happen "to get ready" and "finish up", just like I make coffee before I start working in the morning and put away my cup at the end of the day. The coffee is just an aside to the coding goal - it's only important to me. |
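Editor's aside on the mkdir_process remark in the comment above: a minimal sketch, under assumed names, of what replacing it with CreateDigest could look like, so the empty directory ships as part of the input digest instead of costing a whole Process. The directory name and helper are illustrative, not from the thread.

```python
# Hedged sketch: materialize an empty "unpacked" directory via CreateDigest/Directory
# and merge it with the tool's digest. All names here are illustrative; this helper
# would be awaited from inside a @rule.
from pants.engine.fs import CreateDigest, Digest, Directory, MergeDigests
from pants.engine.rules import Get


async def input_with_empty_dir(tool_digest: Digest) -> Digest:
    empty_dir = await Get(Digest, CreateDigest([Directory("unpacked")]))
    return await Get(Digest, MergeDigests([tool_digest, empty_dir]))
```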
One workaround is to create a bash script that invokes both processes, then invoke your bash script. For example, we use a bash script to invoke Go processes: pants/src/python/pants/backend/go/util_rules/sdk.py, lines 72 to 135 at 2449ff4. |
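Below is a rough sketch of that wrapper-script pattern in plugin code. The script contents, file names, and umoci flags are illustrative assumptions, not taken from the Go backend or from this thread.

```python
# Hedged sketch: write a small shell script into the sandbox via CreateDigest,
# then run it as the single Process. In a real rule you would resolve the bash
# path (e.g. via Pants' BashBinary) rather than assuming it is on PATH.
from pants.engine.fs import CreateDigest, Digest, FileContent, MergeDigests
from pants.engine.process import Process
from pants.engine.rules import Get

SCRIPT = b"""#!/usr/bin/env bash
set -euo pipefail
./umoci raw add-layer --image image:tag layers/image_bundle.tar
./umoci config --image image:tag --config.entrypoint /app/entrypoint
"""


async def wrapped_process(input_digest: Digest, description: str) -> Process:
    script_digest = await Get(
        Digest, CreateDigest([FileContent("run.sh", SCRIPT, is_executable=True)])
    )
    full_input = await Get(Digest, MergeDigests([input_digest, script_digest]))
    return Process(
        argv=("bash", "run.sh"),
        input_digest=full_input,
        description=description,
        output_directories=("build",),
    )
```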
Is your feature request related to a problem? Please describe.
To run an OCI image locally with runc, one needs to unpack the rootfs onto the local file system and then execute runc inside that filesystem. Since this filesystem contains symlinks and the like, Pants barfs when trying to output it from a process. It's also wasteful: the filesystem is quite large and only used during the run, so it makes more sense to fuse unpack + run into a single sandbox with no outputs.
However, there's no built-in abstraction for this in Pants right now.
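To make the shape of the problem concrete, here is a rough sketch of the two steps that would want to share a sandbox; the image names, flags, and digests are illustrative assumptions rather than code from the issue.

```python
# Illustrative only: these two steps want to share one sandbox. The unpack step's
# output tree contains symlinks (among other things) that Pants cannot capture as
# an output digest, and capturing it would be wasteful anyway since the unpacked
# rootfs is only needed for the run itself.
from pants.engine.fs import EMPTY_DIGEST
from pants.engine.process import Process

image_digest = EMPTY_DIGEST  # stand-in; a real rule would pass the OCI image layout here

unpack = Process(
    argv=("umoci", "unpack", "--rootless", "--image", "build:latest", "bundle"),
    input_digest=image_digest,
    description="Unpack OCI image to a runc bundle",
    # output_directories=("bundle",),  # <- this is where the symlinks break today
)
run = Process(
    argv=("runc", "--root", "state", "run", "--bundle", "bundle", "pants-run"),
    input_digest=EMPTY_DIGEST,  # would need the unpacked bundle left behind by `unpack`
    description="Run the unpacked OCI container",
)
```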
Describe the solution you'd like
I'd like a built-in mechanism for running multiple processes with a fused input/output. I implemented a prototype to solve my immediate problem here:
https://github.com/tgolsson/pants-backends/blob/main/pants-plugins/oci/pants_backend_oci/tools/process.py
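For readers who don't click through, here is a minimal sketch of the general shape such an abstraction could take. It is not the linked prototype's code; the bash chaining, quoting, and description/output merging are assumptions for illustration only.

```python
# Hedged sketch of a "fused process" rule: merge the input digests and chain the
# argvs in one shell invocation so every step runs in the same sandbox.
import shlex
from dataclasses import dataclass

from pants.engine.fs import Digest, MergeDigests
from pants.engine.process import Process
from pants.engine.rules import Get, collect_rules, rule


@dataclass(frozen=True)
class FusedProcess:
    processes: tuple[Process, ...]


@rule
async def fuse_processes(request: FusedProcess) -> Process:
    input_digest = await Get(
        Digest, MergeDigests([p.input_digest for p in request.processes])
    )
    # Chain all argvs in a single shell invocation so they share one sandbox.
    script = " && ".join(
        " ".join(shlex.quote(str(arg)) for arg in p.argv) for p in request.processes
    )
    return Process(
        argv=("/bin/bash", "-c", script),
        input_digest=input_digest,
        description=", ".join(p.description for p in request.processes),
        output_directories=tuple(
            d for p in request.processes for d in p.output_directories
        ),
    )


def rules():
    return collect_rules()
```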
Describe alternatives you've considered
I can keep my variant; it just seems like a useful building block for Pants to have "natively". As discussed in the Slack thread below, one could also run this as multiple processes when/if absolute symlinks are supported, but it'd waste cache space.
Additional context
Discussion where I first hit this issue: https://pantsbuild.slack.com/archives/C01CQHVDMMW/p1665338251230869