Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for cross-building Python applications to test, package inside-of/for a Docker image #13682

Closed
ryanking opened this issue Nov 19, 2021 · 16 comments · Fixed by #17096
Assignees
Labels
backend: Docker Docker backend-related issues backend: Python Python backend-related issues enhancement
Milestone

Comments

@ryanking
Copy link
Contributor

ryanking commented Nov 19, 2021

Is your feature request related to a problem? Please describe.

My problem is that we have a good chunk of python code which depends on a lot of system libraries and command line tools being on the PATH. This is bioinformatics code for which this pattern is quite common.

This code will run in a container, whose image is built by pants, in CI and production. When running tests in development environments I would like to be able to use the same environment.

Describe the solution you'd like

I would like to be able to tell pants– "when running this target, run it in a container from this image". Ideally that image can be built by pants automatically when needed.

Describe alternatives you've considered

One alternative we are considering is to move all our development work to linux machines and/or docker containers where we have everything installed. (We may do this regardless of this change).

Additional context

slack discussion - https://pantsbuild.slack.com/archives/C046T6T9U/p1637272879490300


Notes from #14145:

Cross-building Python dists (wheels) to target a different platform could be accomplished internally to Pants using Docker, to allow both cross-platform PEXes and Docker images to be built without pre-built dists.

Cross-building inside of a docker image is very similar to remote execution to a different plaform (as described in #11148): at a fundamental level, it's a bit like executing all logic remotely inside of the container. Similarly, it involves isolating the subgraph in which the cross-build is occuring using a different Platform/Environment.

@kaos
Copy link
Member

kaos commented Nov 22, 2021

I'm sketching an idea, see if that could be in the right direction..

Example BUILD file:

python_tests(name="tests")
docker_image(name="base-tools")

docker_test_image(name="test-image", dependencies=[":tests"], instructions=["FROM :base-tools"])

This new docker_test_image would come with the semantics, that it would add COPY .. instructions for each test dependency, and know how to run the resulting image, just as if it was run in a normal sandbox.

Seen here is also a new unimplemented feature, to reference another image by it's target address in the FROM instruction.

The purpose of the base-tools image is to create the image used for hosting the application code, and to run tests, where all the required libraries and cli tools are available.

Does this make sense?

@ryanking
Copy link
Contributor Author

My original though was to avoid adding a new goal type and invert the logic a bit. Something like–

docker_image(name="base-tools")
python_tests(
  name="tests",
  run_in_container = ":base-tools",
)

@Eric-Arellano
Copy link
Contributor

I do like this living in python_test and friends. It keeps config near where it's relevant.

kaos added a commit to kaos/pants that referenced this issue Dec 7, 2021
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@meganvw
Copy link

meganvw commented Dec 8, 2021

+1 I have the same use case... my current alternative is to build the target docker image and then run
docker run ... docker_image:latest './pants test <target>'
with the local directory volume mounted but I lose test caching inside the temporary container

having a supported target for this would be great and I think very useful!

@kaos
Copy link
Member

kaos commented Dec 8, 2021

@meganvw Thanks for letting us know. It helps knowing what features are valuable to implement.

@charlietsai
Copy link

my current alternative is to build the target docker image and then run docker run ... docker_image:latest './pants test <target>' with the local directory volume mounted but I lose test caching inside the temporary container

I've been successfully running ./pants in a container and mounting the cache. For a single container, you just have to mount the ~/.cache/pants directory.

For multiple concurrently running containers I also had to add the --pid=host setting. I believe this is because the underlying LMDB relies on unique PIDs across containers? Without it, you will encounter Exception: Failed to begin read transaction: Resource temporarily unavailable, which seems to come from LMDB.

@stuhood stuhood changed the title ability to run tests in a docker container Add support for cross-building Python applications to test, run, package inside-of/for a Docker image Jan 18, 2022
@ptrhck
Copy link

ptrhck commented May 29, 2022

I think this would also address this issue #15557. Is this feature on the roadmap and to be expected shortly?

@kaos
Copy link
Member

kaos commented May 30, 2022

Hi @ptrhck,

It is not to be expected shortly. I've done some preliminary proof of concept work to try stuff out, but it is not actively being worked on by me, and I am unaware of anyone else working on this either.

@Eric-Arellano
Copy link
Contributor

@ptrhck that being said, it's an open source project and we LOVE contributions from non-maintainers. We would be very happy to help you or anyone else if you're interested in implementing this!

@stuhood
Copy link
Member

stuhood commented Aug 25, 2022

A few folks from Toolchain will be starting on this in the next few days.

Here is a preliminary design doc: Native Container Support, but there will be independent design for some of the components.

We'd love to collaborate with anyone else who is interested in seeing this feature come to fruition.

@chris-stetter
Copy link

Very much looking forward to this! Maybe https://github.com/dagger/dagger can give some inspiration for design choices. Overall, it adresses similiar issues. As you have stated in the motivation of the design doc, users might have applications they want to test with pants that rely on a lot of system-dependencies or third-party libraries.

This feature will greatly improve portability. That is the idea of dagger as well: It runs on my machine, I can be sure it runs on CI. It only needs access to Docker, or BuildKit to be more specific.

stuhood added a commit that referenced this issue Aug 26, 2022
)

In order to support differentiating `@rule` subgraphs for different "environments" as described in the design doc for #13682, `@rules` need to be able inject more than one parameter into any particular subgraph.

As a concrete example: the `test` `@rules` need to be able to request "a `TestResult` for this `FieldSet` in this `Environment`". This change adds support for multiple parameters to a `Get`, such that that use case would be spelled out as:
```python
test_result = await Get(TestResult, {field_set: FieldSet, environment: Environment})
``` 

Fixes #7490.

[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Sep 1, 2022
Part of #13682. 

This target serves two purposes:
1. Define what image to use, which we will pass to the new Docker `CommandRunner` when setting up a `Process`.
2. Configure the env vars and search paths to use.

A follow up will allow individual targets to specify which `environment` to use. If set to a `docker_environment` target, then we will propagate the image name to `Process`.

[ci skip-rust]
[ci skip-build-wheels]
stuhood added a commit that referenced this issue Sep 2, 2022
…lementers (#16717)

As described in #12934, each `@union` `Get` callsite currently implicitly represents an API where exactly the parameters provided to the `Get` are available, and no others.

In order for other types to be made available to plugins (most immediately, a new `EnvironmentName` type to differentiate subgraphs for different environments in #13682), we need a new syntax to indicate that `@union`s provide additional types as part of their API.

To that end, this change allows a `@union` declaration to specify additional types that will be in scope for callees (and consequently must be available to callers):
```python
# Declares that callsites using a `Vehicle` must have `Fuel` in scope, and that `Fuel` will be
# in scope for callees/implementers of the `@union`.
@union(in_scope_types=[Fuel])
class Vehicle(ABC):
    ...
```

Fixes #12934.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Sep 3, 2022
In order to differentiate subgraphs of the `@rule` graph for different platforms and environments for #7735 (to support #13682), the environment that is in use needs to become a very widespread parameter in the `@rule` graph.

To do that, this change:
1. Adjusts all production `QueryRule`s to declare that an `EnvironmentName` must be provided.
2. Adjusts most `@union`s to use  #16717 to declare that an `EnvironmentName` will be available.
3. Computes the `Platform` from the `EnvironmentName` (currently as a noop, since we don't actually have multiple instances of the `EnvironmentName`).

A caveat for point 1 is that to avoid churn, we only update _production_ `QueryRule`s in this change. For continuity in tests, `RuleRunner` will install a singleton `EnvironmentName` by default, and uses a (probably temporary) facility to filter the `EnvironmentName` back out of all `QueryRule`s. Tests that actually want to test with multiple environments should disable that facility.
@stuhood
Copy link
Member

stuhood commented Sep 8, 2022

A status update on this!

As suggested by the linked issues/pulls above:

  1. the rule graph changes necessary to support using multiple environments in a single build have landed (Add support for passing multiple input params to a @rule Get #16668, Explicitly specified protocols for plugins #12934, Allow @unions to declare additional types which are provided to implementers #16717)
  2. an initial version of environment configuration is in place (Add docker_environment target #16752, The environment to use becomes a context-specific parameter #16721)
  3. the first portion of a docker CommandRunner has landed ("naive" docker command runner #16670, docker_image comes from Process struct, rather than the CommandRunner #16758, "switched" command runner #16792, etc)

Our next steps will be to:

  • Re-evaluate how options are marked environment-specific to minimize churn and new concepts in the migration
  • Add caching to the docker CommandRunner to re-use containers rather than starting them from scratch
  • Begin introducing environment fields, to allow relevant targets to set their environment

@stuhood stuhood removed their assignment Sep 8, 2022
@stuhood stuhood changed the title Add support for cross-building Python applications to test, run, package inside-of/for a Docker image Add support for cross-building Python applications to test, package inside-of/for a Docker image Sep 13, 2022
@stuhood
Copy link
Member

stuhood commented Sep 13, 2022

Note: I've removed run from the title here to capture the fact that a first version will not support "interactive" processes (run, test --debug, repl, etc), but will support standard cached/isolated processes (test, package, etc).

Eric-Arellano added a commit that referenced this issue Sep 13, 2022
Part of #7735 and #13682.

This allows individual test targets to choose which environment is used during their run, including by using a `docker_environment`. The entire subgraph will be updated appropriately, e.g. we will rediscover `BashBinary` and rebuild the PEXes needed, if relevant.

[ci skip-rust]
Eric-Arellano added a commit that referenced this issue Sep 16, 2022
This solves several problems we had with remote execution:

1. You can now specify different env vars than localhost, #7735
2. You can now mix remote execution with Docker execution from #13682
3. You can have multiple configurations for the remote execution server, which is particularly useful so you can specify different Docker images to run in remote execution.

If you are not yet using the environments mechanism, then remote execution continues to be toggled via the `--remote-execution` option. Otherwise, remote execution is only used when a remote execution environment target is used. Follow up work will re-evaluate this by leveraging "environment matchers".  See #16907 for more info.

A follow up will add the `extra_platform_properties` field (#16908).
@stuhood
Copy link
Member

stuhood commented Sep 16, 2022

Another update here.

This week:

  • docker containers are now cached and reused
  • A preliminary implementation of making options environment-specific has landed.
  • python_tests targets gained an _environment field which allows them to refer to an explicit local_environment, docker_environment, or remote_environment target.
    • With appropriate configuration, basic tests can now run inside of docker!

Next week:

  • How an environment= field on a target is "matched" will be made fuzzier, to allow for preferring certain environments where possible, and falling back to other environments where not.
  • A portion of options will be marked environment-sensitive, and whack-a-moled to allow for more complicated tests to be opted in.

Stretch for next week is beginning to tackle additional use cases from the design doc, including adding environment awareness to the package goal, and to more language backends.

Thanks to @Eric-Arellano, @tdyas, and @chrisjrn for their work on this!

@stuhood
Copy link
Member

stuhood commented Oct 27, 2022

With #17129 and #17096 landing ~tomorrow, this issue will be completed, and ready for (preview) usage in the 2.15.x release.

See #17096 for the documentation for the Environments feature, which is what enables running some or all tests in a docker_environment (or a remote_environment!).

stuhood added a commit that referenced this issue Oct 28, 2022
Adds initial documentation for the Environments feature designed and implemented in #7735 and #13682.

This initial version is meant to capture what will be possible in the `2.15.x` release, but it will/should contain some references to future work (some of which should block stabilizing environments).

Fixes #13682, and fixes #16393.

[ci skip-rust]
[ci skip-build-wheels]
@aviau
Copy link
Contributor

aviau commented May 9, 2024

We just got

IntrinsicError: Couldn't find file contents for "BUILD": Failed to begin read transaction: Resource temporarily unavailable

And it looks like --pid=host as suggested in #13682 (comment) fixes it for us. Just commenting here so that this has better SEO (my message mentions the IntrinsicError)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Docker Docker backend-related issues backend: Python Python backend-related issues enhancement
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.