Initial draft of environments.md.
[ci skip-rust]

[ci skip-build-wheels]
stuhood committed Oct 3, 2022
1 parent 7e94aef commit d25991b
Showing 7 changed files with 200 additions and 26 deletions.
179 changes: 179 additions & 0 deletions docs/markdown/Using Pants/environments.md
@@ -0,0 +1,179 @@
---
title: "Environments: Cross-Platform or Remote Builds"
slug: "environments"
hidden: false
createdAt: "2022-10-03T21:39:51.235Z"
updatedAt: "2022-10-03T21:39:51.235Z"
---
Environments
============

By default, Pants will execute all sandboxed build work directly on localhost. But defining and using additional "environments" for particular targets allows Pants to transparently execute some or all of your build either:
1. locally in Docker (_TODO: link?_) containers
2. remotely via [remote execution](doc:remote-execution)
3. locally, but with a non-default set of environment variables and settings (such as for cross-building)

## Defining environments

Environments are defined using environment targets:

* [`local_environment`](doc:reference-local_environment) - Runs without containerization on localhost (which is also the default if no environment targets are defined).
* [`docker_environment`](doc:reference-docker_environment) - Runs in a cached container using the specified Docker image.
* [`remote_environment`](doc:reference-remote_environment) - Runs in a remote worker via [remote execution](doc:remote-execution) (possibly with containerization, depending on the server implementation).

Environment targets are given short, descriptive names using the [`[environments.names]` option](doc:reference-environments#names) (usually defined in `pants.toml`), which consuming targets use to refer to them in `BUILD` files. For example, a `pants.toml` section and a `BUILD` file (at the root of the repository, in this case) might contain:

```toml
[environments.names]
linux = "//:local_linux"
linux_docker = "//:local_busybox"
```

```python
local_environment(
    name="local_linux",
    compatible_platforms=["linux_x86_64"],
    fallback_environment="local_busybox",
    ..
)

docker_environment(
    name="local_busybox",
    platform="linux_x86_64",
    image="busybox:latest@sha256:abcd123...",
    ..
)
```

### Environment-aware options

Environment targets have fields (target arguments) that correspond to options marked "environment-aware" (_TODO: align naming_). When an option is environment-aware, the value it takes within a particular environment can be overridden by setting the corresponding field on that environment's target. If the environment target does not set the field, the option's globally configured value is used.

For example, the [`[python-bootstrap].search_path` option](doc:reference-python-bootstrap#search_path) is environment-aware, which is indicated in its help (_TODO: ensure these are marked in the generated help_). It can be overridden for a particular environment by a corresponding environment target field, such as [the one on `local_environment`](doc:reference-local_environment#codepython_bootstrap_search_pathcode).
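
The override rule can be modelled in a few lines of Python. Everything here is hypothetical: `resolve_option` and `field_name_for` are illustrative names, and the option-to-field naming convention is only inferred from the `[python-bootstrap].search_path` / `python_bootstrap_search_path` pairing above, not taken from Pants' source.

```python
from typing import Any, Mapping, Tuple


def field_name_for(scope: str, option: str) -> str:
    """Assumed convention: `[python-bootstrap].search_path` -> `python_bootstrap_search_path`."""
    return f"{scope.replace('-', '_')}_{option}"


def resolve_option(
    scope: str,
    option: str,
    global_options: Mapping[Tuple[str, str], Any],
    environment_fields: Mapping[str, Any],
) -> Any:
    """Return the value an environment-aware option takes inside one environment.

    `environment_fields` holds only the fields explicitly set on the
    environment target; unset fields are simply absent.
    """
    field = field_name_for(scope, option)
    if field in environment_fields:
        # The environment target overrides the global value.
        return environment_fields[field]
    # Otherwise fall back to the globally configured option value.
    return global_options[(scope, option)]


global_options = {("python-bootstrap", "search_path"): ["<PATH>"]}
docker_fields = {"python_bootstrap_search_path": ["/usr/local/bin"]}

print(resolve_option("python-bootstrap", "search_path", global_options, docker_fields))  # ['/usr/local/bin']
print(resolve_option("python-bootstrap", "search_path", global_options, {}))  # ['<PATH>']
```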

> 👍 See an option which should be environment-aware, but isn't?
>
> Environments are a new concept: if you see an option which should be marked environment-aware but isn't, please [file an issue](https://github.com/pantsbuild/pants/issues/new/choose)!

## Consuming environments

To declare which environment they should build with, many (_TODO: mark more_) target types (particularly "root" targets like tests or binaries) have an `environment=` field: for example, [`python_tests(environment=..)`](doc:reference-python_tests#codeenvironmentcode).

The `environment=` field may either:
1. refer to an environment by name
2. use the special `__local__` environment, which resolves to any matching `local_environment` (see "Environment matching" below)

Test targets additionally have a `runtime_environment=` field (_TODO: see workflow below, and implement_) which defaults to the value of the target's `environment=` field, but which can be set explicitly to indicate that a test should execute in a different environment than it was built in. This can be used to enable cross-building (where a test is built on one platform, but executed on another), or to explicitly provide tools or running services at test runtime which would not otherwise be available.
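
The defaulting rule can be sketched as a tiny Python model. This is hypothetical: `SketchTestTarget` is an invented stand-in (not a Pants type), and `runtime_environment=` is itself still marked TODO above. The environment names reuse `linux` and `linux_docker` from the earlier `pants.toml` example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchTestTarget:
    """Hypothetical stand-in for the environment fields of a test target."""

    environment: str
    runtime_environment: Optional[str] = None

    def build_environment(self) -> str:
        # Building the test always uses `environment=`.
        return self.environment

    def execution_environment(self) -> str:
        # Running the test uses `runtime_environment=` if set, else `environment=`.
        if self.runtime_environment is not None:
            return self.runtime_environment
        return self.environment


plain = SketchTestTarget(environment="linux")
cross = SketchTestTarget(environment="linux", runtime_environment="linux_docker")

print(plain.build_environment(), plain.execution_environment())  # linux linux
print(cross.build_environment(), cross.execution_environment())  # linux linux_docker
```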

> 🚧 Environment compatibility
>
> Currently, there is no static validation that a target's environment is compatible with its dependencies' environments -- only the implicit validation of the goals that you run successfully against those targets (`check`, `lint`, `test`, `package`, etc).
>
> As we gain more experience with how environments are used in the wild, it's possible that more static validation can be added: your feedback would be very welcome!

### Setting the environment on many targets at once

To use an environment everywhere in your repository (or only within a particular subdirectory, or for a particular target type), you can use the [`__defaults__` builtin](doc:targets#field-default-values). For example, to use an environment named `my_default_environment` by default everywhere, you would add the following to a `BUILD` file at the root of the repository:

```python
__defaults__(all=dict(environment="my_default_environment"))
```
... and individual targets could override the default as needed.

### Environment matching

A single environment name may end up referring to different environment targets on different physical machines, or with different global settings applied: this is known as environment "matching".

* `local_environment` targets will match if their `compatible_platforms=` field matches localhost's platform.
* `docker_environment` targets will match [if Docker is enabled](doc:reference-global#docker_execution), and if their `platform=` field is compatible with localhost's platform.
* `remote_environment` targets will match [if remote execution is enabled](doc:reference-global#remote_execution).

If a particular environment target _doesn't_ match, it can configure a `fallback_environment=` to try next. This allows forming preference chains, which are referred to by whichever environment name points at the head of the chain.

For example, a chain like "prefer remote execution if it is enabled, otherwise fall back to local execution if the platform matches, otherwise use Docker" might be configured via the targets:
```python
remote_environment(
    name="remote",
    fallback_environment="local",
    ..
)

local_environment(
    name="local",
    compatible_platforms=["linux_x86_64"],
    fallback_environment="docker",
)

docker_environment(
    name="docker",
    ..
)
```
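
The matching rules and the fallback walk can be modelled in Python to make the control flow explicit. This is a simplified, hypothetical sketch (`EnvTarget`, `Host`, `matches` and `resolve` are invented names): it reduces Docker "platform compatibility" to comparing CPU architectures, assumes a `platform` for the Docker target that the example above elides, and returns `None` where Pants would presumably report an error because nothing in the chain matched.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class EnvTarget:
    kind: str  # "local", "docker" or "remote"
    compatible_platforms: Tuple[str, ...] = ()  # local_environment only
    platform: Optional[str] = None  # docker_environment only
    fallback_environment: Optional[str] = None


@dataclass(frozen=True)
class Host:
    platform: str  # e.g. "linux_x86_64" or "macos_arm64"
    docker_enabled: bool
    remote_execution_enabled: bool


def _arch(platform: str) -> str:
    # "linux_x86_64" -> "x86_64"; a simplifying assumption for Docker compatibility.
    return platform.split("_", 1)[1]


def matches(env: EnvTarget, host: Host) -> bool:
    if env.kind == "local":
        return host.platform in env.compatible_platforms
    if env.kind == "docker":
        return (
            host.docker_enabled
            and env.platform is not None
            and _arch(env.platform) == _arch(host.platform)
        )
    if env.kind == "remote":
        return host.remote_execution_enabled
    raise ValueError(f"unknown environment kind: {env.kind!r}")


def resolve(name: str, envs: Dict[str, EnvTarget], host: Host) -> Optional[str]:
    """Walk the fallback chain starting at `name`; return the first match."""
    seen: Set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"fallback cycle at {current!r}")
        seen.add(current)
        env = envs[current]
        if matches(env, host):
            return current
        current = env.fallback_environment
    return None  # nothing in the chain matched


envs = {
    "remote": EnvTarget("remote", fallback_environment="local"),
    "local": EnvTarget("local", compatible_platforms=("linux_x86_64",), fallback_environment="docker"),
    "docker": EnvTarget("docker", platform="linux_x86_64"),
}

print(resolve("remote", envs, Host("linux_x86_64", True, True)))   # remote
print(resolve("remote", envs, Host("linux_x86_64", True, False)))  # local
print(resolve("remote", envs, Host("macos_x86_64", True, False)))  # docker
```

The same environment name ("remote") resolves to three different targets depending only on the host and global settings, which is the "matching" behaviour described above.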

In future versions, environment targets will gain additional predicates to control whether they match (for example: `local_environment` will likely gain a predicate that looks for the presence of an environment variable _TODO: open ticket_). But in the meantime, it's possible to override which environments are matched for particular use cases by overriding their configured names: see the "Toggle use of an environment" workflow below for an example.

## Example workflows

### Enabling remote execution globally

`remote_environment` targets match only when the [`--remote-execution`](doc:reference-global#remote_execution) option is enabled. So to make a particular environment name use remote execution whenever it is enabled, you could define environment targets which try remote execution first, and then fall back to local execution:

```python
remote_environment(
    name="remote_busybox",
    platform="linux_x86_64",
    extra_platform_properties=["container-image=busybox:latest"],
    fallback_environment="local",
)

local_environment(
    name="local",
    compatible_platforms=[...],
)
```

You'd then give your `remote_environment` target an unassuming name like "default":
```toml
[environments.names]
default = "//:remote_busybox"
local = "//:local"
```
... and use that environment by default with all targets. Users or consumers like CI could then toggle whether remote execution was used by (un)setting `--remote-execution`.

_TODO: No speculation in `2.15.x`: it is definitely necessary for using remote execution locally, but it might not be necessary for CI use-cases._

### Execute a test in Docker, while natively cross-building it

_TODO: Give an example of using `environment=` vs `runtime_environment=` to use docker only for test execution, but not for building of thirdparty dependencies by using PEX to cross-build. This will require exposing options out of PEX to override (?) the target platform. But figuring out how that doesn't break `check` is an open question._

### Toggle use of an environment for some consumers

As mentioned above in "Environment matching", environment targets "match" based on their field values and global options. But if two environment targets would be ambiguous in some cases, or if you'd otherwise like to control what a particular environment name means (in CI, for example), you can override an environment name via options.

For example: if you'd like to use a particular `macOS` environment target locally, but override it for a particular use case in CI, you'd start by defining two `local_environment` targets which would usually match ambiguously:

```python
local_environment(
    name="macos_laptop",
    compatible_platforms=["macos_x86_64"],
)

local_environment(
    name="macos_ci",
    compatible_platforms=["macos_x86_64"],
)
```

... and then assign one of them a (generic) environment name in `pants.toml`:
```toml
[environments.names]
macos = "//:macos_laptop"
...
```

You could then _override_ that name definition in `pants.ci.toml` (note the use of the `.add` suffix, in order to preserve any other named environments):
```toml
[environments.names.add]
macos = "//:macos_ci"
```
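
The difference between redefining `[environments.names]` and using `[environments.names.add]` can be modelled as replacing a dict versus updating it. This is a hypothetical sketch (`replace_names` and `add_names` are invented helpers, and the `linux` entry is borrowed from the first `pants.toml` example), not Pants' option-parsing code:

```python
from typing import Dict, Mapping


def replace_names(base: Mapping[str, str], override: Mapping[str, str]) -> Dict[str, str]:
    # Model of `[environments.names]` in a later config file: the whole dict
    # is replaced, so `base` is intentionally ignored.
    return dict(override)


def add_names(base: Mapping[str, str], additions: Mapping[str, str]) -> Dict[str, str]:
    # Model of `[environments.names.add]`: existing names survive,
    # and names present in `additions` are (re)defined.
    merged = dict(base)
    merged.update(additions)
    return merged


pants_toml = {"macos": "//:macos_laptop", "linux": "//:local_linux"}
pants_ci_toml = {"macos": "//:macos_ci"}

print(replace_names(pants_toml, pants_ci_toml))  # {'macos': '//:macos_ci'}
print(add_names(pants_toml, pants_ci_toml))  # {'macos': '//:macos_ci', 'linux': '//:local_linux'}
```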

2 changes: 1 addition & 1 deletion docs/markdown/Using Pants/remote-caching-execution.md
@@ -8,7 +8,7 @@ updatedAt: "2021-03-19T21:39:51.235Z"
Overview
========

Ordinarily, Pants executes processes locally on the system on which it is run and also caches the results of those processes locally as well. Besides this "local execution" mode of operation, Pants also supports two distributed modes of operation:
By default, Pants executes processes in a local [environment](doc:environments) on the system on which it is run, and caches the results of those processes locally as well. Besides this "local execution" mode of operation, Pants also supports two distributed modes of operation:

1. "Remote caching" where Pants stores results from local process execution in a remote cache and also consumes results from that remote cache; and

@@ -37,23 +37,25 @@ remote_execution_address = "grpc://build.corp.example.com:8980"
remote_instance_name = "main"
```

### Platform Properties
### Environment-specific settings

The REAPI execution service selects a worker for a process by consulting the "platform properties" that are passed in a remote execution request. These platform properties are key/value pairs that are configured in the server. Generally, you will configure these in the server (or be provided them by your server's administrator), and then configure Pants to use what was configured.
The REAPI execution service selects a worker for a process by consulting the "platform properties" that are passed in a remote execution request. These platform properties are key/value pairs that are configured for particular workers in the server. Generally, you will configure these in the server (or have them provided by your server's administrator), and then configure Pants to match particular workers using their relevant platform properties.

Assume that the REAPI server is configured with `OSFamily=linux` as the only platform properties. Then building on the first example earlier, add the `remote_execution_extra_platform_properties` to `pants.toml`:
To define platform properties (as well as to configure any other settings which are specific to running on a remote worker), you should define a remote environment. Building on the first example earlier, you would add [`remote_environment` targets](doc:reference-remote_environment) (see [Environments](doc:environments) for more information) corresponding to each distinct set of workers you want to use in the server. Assuming that the REAPI server is configured with a particular worker type labeled `docker-container=busybox:latest`, that might look like a `BUILD` file containing:

```toml
[GLOBAL]
remote_execution = true
remote_store_address = "grpc://build.corp.example.com:8980"
remote_execution_address = "grpc://build.corp.example.com:8980"
remote_instance_name = "main"
remote_execution_extra_platform_properties = [
"OSFamily=linux",
]
```python
remote_environment(
    name="remote_busybox",
    platform="linux_x86_64",
    extra_platform_properties=[
        "docker-container=busybox:latest",
    ],
    ..
)
```

Your `remote_environment` will also need to override any [environment-aware options](doc:environments) which configure the relevant tools used in your repository. For example: if building Python code, a Python interpreter must be available and matched via the environment-aware options of `[python-bootstrap]`. If using protobuf support, then you may also need `unzip` available in the remote execution environment in order to unpack the protoc archive. Etc.
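
How a server picks a worker is implementation-specific, but the common case of exact key/value matching can be sketched as follows. This is a hypothetical model of server-side behaviour (the worker names and their property sets are invented), not part of Pants or of any particular REAPI server:

```python
from typing import Dict, Iterable, List, Mapping


def parse_properties(props: Iterable[str]) -> Dict[str, str]:
    """Parse "key=value" strings, as written in `extra_platform_properties`."""
    parsed: Dict[str, str] = {}
    for prop in props:
        key, sep, value = prop.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {prop!r}")
        parsed[key] = value
    return parsed


def eligible_workers(
    requested: Iterable[str], workers: Mapping[str, Mapping[str, str]]
) -> List[str]:
    """Return workers whose properties include every requested key/value pair."""
    wanted = parse_properties(requested)
    return [
        name
        for name, props in workers.items()
        if all(props.get(key) == value for key, value in wanted.items())
    ]


workers = {
    "busybox-worker": {"OSFamily": "linux", "docker-container": "busybox:latest"},
    "ubuntu-worker": {"OSFamily": "linux", "docker-container": "ubuntu:22.04"},
}

print(eligible_workers(["docker-container=busybox:latest"], workers))  # ['busybox-worker']
print(eligible_workers(["OSFamily=linux"], workers))  # ['busybox-worker', 'ubuntu-worker']
```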

### Concurrency

Finally, you should configure Pants to limit the number of concurrent execution requests that are sent to the REAPI server. The `process_execution_remote_parallelism` option controls this concurrency. For example, if `process_execution_remote_parallelism` is set to `20`, then Pants will only send a maximum of 20 execution requests at a single moment of time.
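
The effect of the limit resembles a counting semaphore around each outgoing request. The following is an illustrative asyncio model only (`send_all` is a hypothetical name and the sleep stands in for a remote call); it is not how Pants implements the limit:

```python
import asyncio
from typing import Iterable, List, Tuple


async def send_all(requests: Iterable[int], parallelism: int) -> Tuple[List[int], int]:
    """Send requests with at most `parallelism` in flight; return results and peak concurrency."""
    semaphore = asyncio.Semaphore(parallelism)
    in_flight = 0
    peak = 0

    async def send(request: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a remote execution round trip
            in_flight -= 1
            return request

    results = await asyncio.gather(*(send(r) for r in requests))
    return list(results), peak


results, peak = asyncio.run(send_all(range(50), parallelism=20))
print(len(results), peak <= 20)  # 50 True
```
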
@@ -95,13 +97,6 @@ remote_ca_certs_path = "/etc/ssl/certs/ca-certificates.crt"
Reference
=========

Run `./pants help-advanced global` or refer to [Global options](doc:reference-global). Most remote execution and caching options begin with the prefix `--remote`.

Limitations
===========

The remote execution support in Pants is still experimental and comes with several limitations:

1. The main limitation is that Pants assumes that the remote execution platform is the same as the local platform. Thus, if the remote execution service is running on Linux, then Pants must also be running on Linux in order to successfully submit remote execution requests. This limitation will eventually be fixed, but as of version 2.0.x, Pants still has the limitation.
For global options, run `./pants help-advanced global` or refer to [Global options](doc:reference-global). Most remote execution and caching options begin with the prefix `--remote`.

2. The remote execution environment will need to contain appropriate tooling expected by the Pants subsystems used in your repository. At a minimum, this means a Python interpreter must be available if building Python code. If using protobuf support, then you may also need `unzip` available in the remote execution environment in order to unpack the protoc archive. This documentation is incomplete with regards to what tooling needs to be available.
For environment-specific options, see `./pants help-advanced remote_environment` or the [`remote_environment` target](doc:reference-remote_environment).
2 changes: 1 addition & 1 deletion docs/markdown/Using Pants/troubleshooting.md
@@ -43,7 +43,7 @@ Use the option `--keep-sandboxes=always` for Pants to log the paths to these san

You can also pass `--keep-sandboxes=on_failure`, to preserve only the sandboxes of failing processes.

There is even a `__run.sh` script in the directory that will run the process using the same argv and environment that Pants would use.
There is even a `__run.sh` script in the directory that will run the process using the same argv and environment variables that Pants would use.

Cache or pantsd invalidation issues
-----------------------------------
@@ -57,7 +57,7 @@ The `ReplRequest ` will get converted into an `InteractiveProcess` that will run

The process will run in a temporary directory in the build root, which means that the script/program can access files that would normally need to be declared by adding a `file` / `files` or `resource` / `resources` target to the `dependencies` field.

The process's environment will not be hermetic, meaning that it will inherit the environment used by the `./pants process`. Any values you set in `extra_env` will add or update the specified environment variables.
The process will not be hermetic, meaning that it will inherit the environment variables used by the `./pants` process. Any values you set in `extra_env` will add or update the specified environment variables.
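
This inherit-then-override behaviour corresponds to the usual pattern of copying the parent's environment and layering `extra_env` on top. Here is a minimal standard-library sketch of that pattern. `run_non_hermetic` is a hypothetical helper, not the `InteractiveProcess` API, and the variable names are invented for illustration:

```python
import os
import subprocess
import sys
from typing import Mapping, Sequence


def run_non_hermetic(argv: Sequence[str], extra_env: Mapping[str, str]) -> str:
    """Run `argv` with the parent's environment plus `extra_env`; return stdout."""
    env = dict(os.environ)  # inherit everything from the parent process
    env.update(extra_env)   # add new variables or overwrite existing ones
    completed = subprocess.run(
        list(argv), env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout


os.environ["INHERITED_VAR"] = "from-parent"
script = "import os; print(os.environ['INHERITED_VAR'], os.environ['EXTRA_VAR'])"
print(run_non_hermetic([sys.executable, "-c", script], {"EXTRA_VAR": "from-extra-env"}))
# from-parent from-extra-env
```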

```python
from dataclasses import dataclass
@@ -85,7 +85,7 @@ The `RunRequest` will get converted into an `InteractiveProcess` that will run i

The process will run in a temporary directory in the build root, which means that the script/program can access files that would normally need to be declared by adding a `files` or `resources` target to the `dependencies` field.

The process's environment will not be hermetic, meaning that it will inherit the environment used by the `./pants process`. Any values you set in `extra_env` will add or update the specified environment variables.
The process will not be hermetic, meaning that it will inherit the environment variables used by the `./pants` process. Any values you set in `extra_env` will add or update the specified environment variables.

```python
from dataclasses import dataclass
@@ -50,7 +50,7 @@ To populate the temporary directory with files, use the parameter `input_digest:

To set environment variables, use the parameter `env: Mapping[str, str]`. `@rules` are prevented from accessing `os.environ` (it will always be empty) because this reduces reproducibility and breaks caching. Instead, either hardcode the value or add a [`Subsystem` option](doc:rules-api-subsystems) for the environment variable in question, or request the `EnvironmentVars` type in your `@rule`.

The `EnvironmentVars` type contains a subset of the environment that Pants was run in, and is requested via a `EnvironmentVarsRequest` that lists the variables to consume.
The `EnvironmentVars` type contains a subset of the environment variables that Pants was run in, and is requested via an `EnvironmentVarsRequest` that lists the variables to consume.
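
Conceptually, the request acts as an allow-list over the full environment. The following is a hypothetical model (`environment_vars_subset` is an invented name), and the choice to silently omit requested-but-unset variables is an assumption of this sketch rather than documented Pants behaviour:

```python
from typing import Dict, Iterable, Mapping


def environment_vars_subset(
    full_env: Mapping[str, str], requested: Iterable[str]
) -> Dict[str, str]:
    """Keep only the requested variables that are actually set in `full_env`."""
    return {name: full_env[name] for name in requested if name in full_env}


snapshot = {"HOME": "/home/me", "PATH": "/usr/bin", "SECRET_TOKEN": "abc"}
print(environment_vars_subset(snapshot, ["PATH", "LANG"]))  # {'PATH': '/usr/bin'}
```

Because only the listed names reach the rule, unrelated variables (such as `SECRET_TOKEN` here) cannot influence its result.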

```python
