Add support for cross-building Python applications to `test`, `package` inside-of/for a Docker image #13682

ryanking · 2021-11-19T18:29:26Z

Is your feature request related to a problem? Please describe.

My problem is that we have a good chunk of python code which depends on a lot of system libraries and command line tools being on the PATH. This is bioinformatics code for which this pattern is quite common.

This code will run in a container, whose image is built by pants, in CI and production. When running tests in development environments I would like to be able to use the same environment.

Describe the solution you'd like

I would like to be able to tell pants– "when running this target, run it in a container from this image". Ideally that image can be built by pants automatically when needed.

Describe alternatives you've considered

One alternative we are considering is to move all our development work to linux machines and/or docker containers where we have everything installed. (We may do this regardless of this change).

Additional context

slack discussion - https://pantsbuild.slack.com/archives/C046T6T9U/p1637272879490300

Notes from #14145:

Cross-building Python dists (wheels) to target a different platform could be accomplished internally to Pants using Docker, to allow both cross-platform PEXes and Docker images to be built without pre-built dists.

Cross-building inside of a docker image is very similar to remote execution to a different plaform (as described in #11148): at a fundamental level, it's a bit like executing all logic remotely inside of the container. Similarly, it involves isolating the subgraph in which the cross-build is occuring using a different Platform/Environment.

The text was updated successfully, but these errors were encountered:

kaos · 2021-11-22T10:21:47Z

I'm sketching an idea, see if that could be in the right direction..

Example BUILD file:

python_tests(name="tests")
docker_image(name="base-tools")

docker_test_image(name="test-image", dependencies=[":tests"], instructions=["FROM :base-tools"])

This new docker_test_image would come with the semantics, that it would add COPY .. instructions for each test dependency, and know how to run the resulting image, just as if it was run in a normal sandbox.

Seen here is also a new unimplemented feature, to reference another image by it's target address in the FROM instruction.

The purpose of the base-tools image is to create the image used for hosting the application code, and to run tests, where all the required libraries and cli tools are available.

Does this make sense?

ryanking · 2021-11-22T19:37:53Z

My original though was to avoid adding a new goal type and invert the logic a bit. Something like–

docker_image(name="base-tools")
python_tests(
  name="tests",
  run_in_container = ":base-tools",
)

Eric-Arellano · 2021-12-01T23:52:20Z

I do like this living in python_test and friends. It keeps config near where it's relevant.

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

meganvw · 2021-12-08T06:30:23Z

+1 I have the same use case... my current alternative is to build the target docker image and then run
docker run ... docker_image:latest './pants test <target>'
with the local directory volume mounted but I lose test caching inside the temporary container

having a supported target for this would be great and I think very useful!

kaos · 2021-12-08T06:41:38Z

@meganvw Thanks for letting us know. It helps knowing what features are valuable to implement.

charlietsai · 2022-01-17T09:22:03Z

my current alternative is to build the target docker image and then run docker run ... docker_image:latest './pants test <target>' with the local directory volume mounted but I lose test caching inside the temporary container

I've been successfully running ./pants in a container and mounting the cache. For a single container, you just have to mount the ~/.cache/pants directory.

For multiple concurrently running containers I also had to add the --pid=host setting. I believe this is because the underlying LMDB relies on unique PIDs across containers? Without it, you will encounter Exception: Failed to begin read transaction: Resource temporarily unavailable, which seems to come from LMDB.

ptrhck · 2022-05-29T16:10:40Z

I think this would also address this issue #15557. Is this feature on the roadmap and to be expected shortly?

kaos · 2022-05-30T14:15:33Z

Hi @ptrhck,

It is not to be expected shortly. I've done some preliminary proof of concept work to try stuff out, but it is not actively being worked on by me, and I am unaware of anyone else working on this either.

Eric-Arellano · 2022-06-01T15:45:19Z

@ptrhck that being said, it's an open source project and we LOVE contributions from non-maintainers. We would be very happy to help you or anyone else if you're interested in implementing this!

stuhood · 2022-08-25T17:22:46Z

A few folks from Toolchain will be starting on this in the next few days.

Here is a preliminary design doc: Native Container Support, but there will be independent design for some of the components.

We'd love to collaborate with anyone else who is interested in seeing this feature come to fruition.

chris-stetter · 2022-08-26T13:10:17Z

Very much looking forward to this! Maybe https://github.com/dagger/dagger can give some inspiration for design choices. Overall, it adresses similiar issues. As you have stated in the motivation of the design doc, users might have applications they want to test with pants that rely on a lot of system-dependencies or third-party libraries.

This feature will greatly improve portability. That is the idea of dagger as well: It runs on my machine, I can be sure it runs on CI. It only needs access to Docker, or BuildKit to be more specific.

) In order to support differentiating `@rule` subgraphs for different "environments" as described in the design doc for #13682, `@rules` need to be able inject more than one parameter into any particular subgraph. As a concrete example: the `test` `@rules` need to be able to request "a `TestResult` for this `FieldSet` in this `Environment`". This change adds support for multiple parameters to a `Get`, such that that use case would be spelled out as: ```python test_result = await Get(TestResult, {field_set: FieldSet, environment: Environment}) ``` Fixes #7490. [ci skip-build-wheels]

Part of #13682. This target serves two purposes: 1. Define what image to use, which we will pass to the new Docker `CommandRunner` when setting up a `Process`. 2. Configure the env vars and search paths to use. A follow up will allow individual targets to specify which `environment` to use. If set to a `docker_environment` target, then we will propagate the image name to `Process`. [ci skip-rust] [ci skip-build-wheels]

…lementers (#16717) As described in #12934, each `@union` `Get` callsite currently implicitly represents an API where exactly the parameters provided to the `Get` are available, and no others. In order for other types to be made available to plugins (most immediately, a new `EnvironmentName` type to differentiate subgraphs for different environments in #13682), we need a new syntax to indicate that `@union`s provide additional types as part of their API. To that end, this change allows a `@union` declaration to specify additional types that will be in scope for callees (and consequently must be available to callers): ```python # Declares that callsites using a `Vehicle` must have `Fuel` in scope, and that `Fuel` will be # in scope for callees/implementers of the `@union`. @union(in_scope_types=[Fuel]) class Vehicle(ABC): ... ``` Fixes #12934. [ci skip-build-wheels]

In order to differentiate subgraphs of the `@rule` graph for different platforms and environments for #7735 (to support #13682), the environment that is in use needs to become a very widespread parameter in the `@rule` graph. To do that, this change: 1. Adjusts all production `QueryRule`s to declare that an `EnvironmentName` must be provided. 2. Adjusts most `@union`s to use #16717 to declare that an `EnvironmentName` will be available. 3. Computes the `Platform` from the `EnvironmentName` (currently as a noop, since we don't actually have multiple instances of the `EnvironmentName`). A caveat for point 1 is that to avoid churn, we only update _production_ `QueryRule`s in this change. For continuity in tests, `RuleRunner` will install a singleton `EnvironmentName` by default, and uses a (probably temporary) facility to filter the `EnvironmentName` back out of all `QueryRule`s. Tests that actually want to test with multiple environments should disable that facility.

stuhood · 2022-09-08T18:19:50Z

A status update on this!

As suggested by the linked issues/pulls above:

the rule graph changes necessary to support using multiple environments in a single build have landed (Add support for passing multiple input params to a @rule Get #16668, Explicitly specified protocols for plugins #12934, Allow @unions to declare additional types which are provided to implementers #16717)
an initial version of environment configuration is in place (Add docker_environment target #16752, The environment to use becomes a context-specific parameter #16721)
the first portion of a docker CommandRunner has landed ("naive" docker command runner #16670, docker_image comes from Process struct, rather than the CommandRunner #16758, "switched" command runner #16792, etc)

Our next steps will be to:

Re-evaluate how options are marked environment-specific to minimize churn and new concepts in the migration
Add caching to the docker CommandRunner to re-use containers rather than starting them from scratch
Begin introducing environment fields, to allow relevant targets to set their environment

stuhood · 2022-09-13T19:10:30Z

Note: I've removed run from the title here to capture the fact that a first version will not support "interactive" processes (run, test --debug, repl, etc), but will support standard cached/isolated processes (test, package, etc).

Part of #7735 and #13682. This allows individual test targets to choose which environment is used during their run, including by using a `docker_environment`. The entire subgraph will be updated appropriately, e.g. we will rediscover `BashBinary` and rebuild the PEXes needed, if relevant. [ci skip-rust]

This solves several problems we had with remote execution: 1. You can now specify different env vars than localhost, #7735 2. You can now mix remote execution with Docker execution from #13682 3. You can have multiple configurations for the remote execution server, which is particularly useful so you can specify different Docker images to run in remote execution. If you are not yet using the environments mechanism, then remote execution continues to be toggled via the `--remote-execution` option. Otherwise, remote execution is only used when a remote execution environment target is used. Follow up work will re-evaluate this by leveraging "environment matchers". See #16907 for more info. A follow up will add the `extra_platform_properties` field (#16908).

stuhood · 2022-09-16T22:57:20Z

Another update here.

This week:

docker containers are now cached and reused
A preliminary implementation of making options environment-specific has landed.
python_tests targets gained an _environment field which allows them to refer to an explicit local_environment, docker_environment, or remote_environment target.
- With appropriate configuration, basic tests can now run inside of docker!

Next week:

How an environment= field on a target is "matched" will be made fuzzier, to allow for preferring certain environments where possible, and falling back to other environments where not.
A portion of options will be marked environment-sensitive, and whack-a-moled to allow for more complicated tests to be opted in.

Stretch for next week is beginning to tackle additional use cases from the design doc, including adding environment awareness to the package goal, and to more language backends.

Thanks to @Eric-Arellano, @tdyas, and @chrisjrn for their work on this!

stuhood · 2022-10-27T23:13:57Z

With #17129 and #17096 landing ~tomorrow, this issue will be completed, and ready for (preview) usage in the 2.15.x release.

See #17096 for the documentation for the Environments feature, which is what enables running some or all tests in a docker_environment (or a remote_environment!).

Adds initial documentation for the Environments feature designed and implemented in #7735 and #13682. This initial version is meant to capture what will be possible in the `2.15.x` release, but it will/should contain some references to future work (some of which should block stabilizing environments). Fixes #13682, and fixes #16393. [ci skip-rust] [ci skip-build-wheels]

aviau · 2024-05-09T21:56:14Z

We just got

IntrinsicError: Couldn't find file contents for "BUILD": Failed to begin read transaction: Resource temporarily unavailable

And it looks like --pid=host as suggested in #13682 (comment) fixes it for us. Just commenting here so that this has better SEO (my message mentions the IntrinsicError)

ryanking added the enhancement label Nov 19, 2021

stuhood changed the title ~~ability to run tests in a docker container~~ Add support for cross-building Python applications to test, run, package inside-of/for a Docker image Jan 18, 2022

stuhood mentioned this issue Jan 18, 2022

Add support for cross-building Python applications to build a PEX/Docker-image for a different platform #14145

Closed

stuhood added the 2022-idea label Jan 19, 2022

stuhood mentioned this issue Mar 16, 2022

Platform/Environment-specific subsystems #7735

Closed

kaos mentioned this issue Jun 7, 2022

Container not re-used on code change pantsbuild/example-docker#6

Open

ptrhck mentioned this issue Jun 24, 2022

Code reloading for ./pants run src/docker/hw/Dockerfile through mounts #15925

Open

Eric-Arellano mentioned this issue Aug 3, 2022

Better explain in Docker docs gotcha with platforms and pex_binary #16393

Closed

stuhood removed the 2022-idea label Aug 19, 2022

thejcannon added backend: Python Python backend-related issues backend: Docker Docker backend-related issues labels Aug 21, 2022

stuhood mentioned this issue Aug 24, 2022

allow specifying multiple Params to a Get() #7490

Closed

stuhood assigned stuhood, chrisjrn, tdyas and Eric-Arellano Aug 25, 2022

stuhood mentioned this issue Aug 26, 2022

Add support for passing multiple input params to a @rule Get #16668

Merged

Eric-Arellano mentioned this issue Sep 1, 2022

Add docker_environment target #16752

Merged

Eric-Arellano mentioned this issue Sep 1, 2022

Env var expansion must use the Docker container #16754

Closed

stuhood mentioned this issue Sep 2, 2022

Allow @unions to declare additional types which are provided to implementers #16717

Merged

Eric-Arellano mentioned this issue Sep 2, 2022

Pants disregards coverage-py 'omit' option when 'global_report' is enabled in pants.toml #16760

Open

stuhood mentioned this issue Sep 2, 2022

The environment to use becomes a context-specific parameter #16721

Merged

Eric-Arellano mentioned this issue Sep 8, 2022

Add _environment field to python_test #16795

Merged

stuhood removed their assignment Sep 8, 2022

This was referenced Sep 8, 2022

Add an intrinsic for capturing Snapshots from absolute paths #10842

Open

Triage and improve remote cache hit rate for desktop usage #12203

Open

stuhood changed the title ~~Add support for cross-building Python applications to test, run, package inside-of/for a Docker image~~ Add support for cross-building Python applications to test, package inside-of/for a Docker image Sep 13, 2022

This was referenced Sep 14, 2022

New mechanism for marking process outputs as platform agnostic #16873

Open

Add _remote_environment target for remote execution #16906

Merged

stuhood self-assigned this Sep 29, 2022

This was referenced Oct 4, 2022

Enable crossplatform speculation of requests and their inputs. #8353

Open

Initial documentation for Environments #17096

Merged

Require goals to explicitly locate/choose an environment, and pin graph calculations to __local__ #17129

Closed

stuhood added this to the 2.15.x milestone Oct 7, 2022

stuhood mentioned this issue Oct 10, 2022

Add support for running InteractiveProcesses in Docker #17182

Open

stuhood closed this as completed in #17096 Oct 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for cross-building Python applications to `test`, `package` inside-of/for a Docker image #13682

Add support for cross-building Python applications to `test`, `package` inside-of/for a Docker image #13682

ryanking commented Nov 19, 2021 •

edited by stuhood

Loading

kaos commented Nov 22, 2021 •

edited

Loading

ryanking commented Nov 22, 2021

Eric-Arellano commented Dec 1, 2021

meganvw commented Dec 8, 2021

kaos commented Dec 8, 2021

charlietsai commented Jan 17, 2022

ptrhck commented May 29, 2022

kaos commented May 30, 2022

Eric-Arellano commented Jun 1, 2022

stuhood commented Aug 25, 2022 •

edited

Loading

chris-stetter commented Aug 26, 2022

stuhood commented Sep 8, 2022 •

edited

Loading

stuhood commented Sep 13, 2022 •

edited

Loading

stuhood commented Sep 16, 2022

stuhood commented Oct 27, 2022

aviau commented May 9, 2024

Add support for cross-building Python applications to test, package inside-of/for a Docker image #13682

Add support for cross-building Python applications to test, package inside-of/for a Docker image #13682

Comments

ryanking commented Nov 19, 2021 • edited by stuhood Loading

kaos commented Nov 22, 2021 • edited Loading

ryanking commented Nov 22, 2021

Eric-Arellano commented Dec 1, 2021

meganvw commented Dec 8, 2021

kaos commented Dec 8, 2021

charlietsai commented Jan 17, 2022

ptrhck commented May 29, 2022

kaos commented May 30, 2022

Eric-Arellano commented Jun 1, 2022

stuhood commented Aug 25, 2022 • edited Loading

chris-stetter commented Aug 26, 2022

stuhood commented Sep 8, 2022 • edited Loading

stuhood commented Sep 13, 2022 • edited Loading

stuhood commented Sep 16, 2022

stuhood commented Oct 27, 2022

aviau commented May 9, 2024

Add support for cross-building Python applications to `test`, `package` inside-of/for a Docker image #13682

Add support for cross-building Python applications to `test`, `package` inside-of/for a Docker image #13682

ryanking commented Nov 19, 2021 •

edited by stuhood

Loading

kaos commented Nov 22, 2021 •

edited

Loading

stuhood commented Aug 25, 2022 •

edited

Loading

stuhood commented Sep 8, 2022 •

edited

Loading

stuhood commented Sep 13, 2022 •

edited

Loading