
Add generic support for parametrizing metadata like interpreter constraints and resolves #13882

Closed
Eric-Arellano opened this issue Dec 14, 2021 · 16 comments

Comments

@Eric-Arellano
Contributor

Status quo

It's useful to run the same target with slightly different config. For example, run the same python_test with Python 2 and Python 3 to validate that it works with both. This is a major benefit of Tox, for example, so we know the feature has value.

In particular, it often makes sense to have permutations here:

  • interpreter_constraints for python_test targets to check that it's valid for all interpreters
  • compatible_resolves for java_source to check that each resolve compiles for the target
  • shell for shunit2_test to check that the script works with multiple shells (Zsh, Bash, etc.)

To do that, you currently have to define two targets:

python_test(name="tests_py2", source="tests.py", interpreter_constraints=["==2.7.*"])
python_test(name="tests_py3", source="tests.py", interpreter_constraints=["==3.6.*"])

Then, you can tell Pants to run :tests_py2, :tests_py3, or use globs like a file spec to run both.

A major downside of this approach is duplication. Another major downside is that it creates ambiguity for dependency inference, because two targets export the same file.

First-class parametrization

Instead, @benjyw @stuhood and I discussed a syntax like this:

shunit2_test(name="tests", source="tests.sh", shells=["bash", "zsh"])

This would run the same tests once per shell. Or, for interpreter constraints, one test run per "named" interpreter constraint (#12652).

This would require reworking the interaction between the Target API and Rules API, which is currently premised on one target -> one "build". For example, a FieldSet is uniquely identified by its target's Address, and our generic core/goals/test.py cannot create two PytestFieldSets for the same Target.
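As a toy illustration of that premise (using hypothetical, simplified stand-ins for the real `Address` and `FieldSet` types, not the actual Pants API):

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for the real Target API types.
@dataclass(frozen=True)
class Address:
    spec: str

@dataclass(frozen=True)
class PytestFieldSet:
    address: Address  # a FieldSet's identity is solely its target's Address
    interpreter_constraints: tuple[str, ...]

# Two desired runs of the same target collide on that identity:
fs_py2 = PytestFieldSet(Address("tests/foo:tests"), ("==2.7.*",))
fs_py3 = PytestFieldSet(Address("tests/foo:tests"), ("==3.6.*",))

# Anything that keys work by address collapses the two intended runs into one.
by_address = {fs.address: fs for fs in (fs_py2, fs_py3)}
assert len(by_address) == 1
```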

"Configurations"?

Each target type may have multiple parametrizable fields, e.g. for java_test: compatible_resolves, scala_versions, java_versions. One idea is to have users create a single config that sets all of these values into a single addressable(?) entity. Then the target simply points to which config to use.

(I'm not sure what the benefit of doing this would be?)

@benjyw
Contributor

benjyw commented Dec 14, 2021

(I'm not sure what the benefit of doing this would be?)

The benefit would be that partitioning logic works without having to know about all the axes it has to partition on.

Alternatively, you'd have to partition according to the cross-product of all parameters, which can have a cardinality much greater than the actual set of used configurations. Especially since in many cases there may be correlations between the parameters (e.g., the interpreter constraint for Python2 is strongly correlated with the resolve for Python2).
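A toy illustration of that cardinality gap (the axes and values below are made up):

```python
from itertools import product

# Hypothetical parameter axes for a java_test-like target; the axis
# names and values are invented for illustration.
resolves = ["jvm-default", "jvm-legacy"]
scala_versions = ["2.12", "2.13"]
java_versions = ["8", "11", "17"]

# Partitioning on the full cross-product of the axes:
cross_product = list(product(resolves, scala_versions, java_versions))
assert len(cross_product) == 12

# ...versus the configurations actually in use, where the axes are
# correlated (e.g. the legacy resolve only ever builds with Scala 2.12
# on Java 8):
used_configs = [
    ("jvm-default", "2.13", "11"),
    ("jvm-default", "2.13", "17"),
    ("jvm-legacy", "2.12", "8"),
]
assert len(used_configs) < len(cross_product)
```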

@stuhood
Member

stuhood commented Dec 14, 2021

In the engine experiment, @jsirois had syntax for selecting configurations from dependees: address@key1=value1,key2=value2. That could also be used here to uniquely identify one configuration of a target for the purposes of running multiple configurations in parallel: i.e.:

✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py2 succeeded.
✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py3 succeeded.

Where the various configurations are actually defined is an open question for me though, and is very related to #13767.

@kaos
Member

kaos commented Dec 15, 2021

Just throwing this out there....

Thinking about what it could look like, taking inspiration from what pytest.mark.parametrize does for test functions:

shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))

# above generates the equiv of => 

shunit2_test(name="tests#shell-1", source="tests.sh", shell="bash")
shunit2_test(name="tests#shell-2", source="tests.sh", shell="zsh")

That is, a new parametrize construct, that turns any target into a target generator for the same, but with a certain field replaced for each instance.

And for the python_test case it could look like:

python_test(name="tests", source="tests.py", interpreter_constraints=parametrize(["==2.7.*"], ["==3.6.*"]))

# ==>

python_test(name="tests#interpreter_constraints-1", source="tests.py", interpreter_constraints=["==2.7.*"])
python_test(name="tests#interpreter_constraints-2", source="tests.py", interpreter_constraints=["==3.6.*"])

Not sure how to support parametrizing multiple fields, or what happens if the target itself is already a target generator, etc.
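A minimal sketch of how such a `parametrize` marker could expand a declaration under the naming scheme above (illustrative only, not the implementation that eventually landed):

```python
from __future__ import annotations
from itertools import product

class parametrize:
    """Hypothetical marker wrapping the alternative values for one field."""
    def __init__(self, *values):
        self.values = values

def expand(name: str, **fields) -> dict[str, dict]:
    """Expand one declaration into one target per combination of
    parametrized field values, suffixing names as sketched above."""
    fixed = {k: v for k, v in fields.items() if not isinstance(v, parametrize)}
    varied = {k: v.values for k, v in fields.items() if isinstance(v, parametrize)}
    if not varied:
        return {name: fields}
    expanded = {}
    for i, combo in enumerate(product(*varied.values()), start=1):
        suffix = "-".join(varied)  # the parametrized field name(s)
        expanded[f"{name}#{suffix}-{i}"] = {**fixed, **dict(zip(varied, combo))}
    return expanded

targets = expand("tests", source="tests.sh", shell=parametrize("bash", "zsh"))
assert list(targets) == ["tests#shell-1", "tests#shell-2"]
assert targets["tests#shell-1"] == {"source": "tests.sh", "shell": "bash"}
```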

@thejcannon
Member

This matters less for internal monorepo company code, which usually isn't published. But most published libraries are tested over multiple versions of Python (including multiple versions of Python 3). At my last company we wanted to test across all major supported Python 3 versions, so this would help simplify that.

However, v2 macros do solve the duplication issue, and although macros are the Pants step-child, the approach raises no issues beyond "yuck, macros" 😉 (and all that entails). Users get to control the name, which is nice.


I like @stuhood's proposal of a different naming scheme for this, as well as @kaos's for parametrization. Although, just to throw out this wild syntax:

@parameterize("shell", ["bash", "zsh"])
shunit2_test(name="tests", source="tests.sh")

Additionally, also throwing out there that the naming scheme could mirror pytest's (using square brackets): src/python/pants/backend/scala/goals/tailor_test.py:tests[interpreter=py2]
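The two proposed renderings can be sketched side by side (both styles, and the parameter names below, are proposals only, not a shipped Pants API):

```python
def format_address(path: str, name: str, params: dict, style: str = "at") -> str:
    """Render a parametrized address in either proposed style."""
    kv = ",".join(f"{k}={v}" for k, v in params.items())
    if style == "at":
        return f"{path}:{name}@{kv}"   # @key=value, per the comment above
    return f"{path}:{name}[{kv}]"      # pytest-style square brackets

path = "src/python/pants/backend/scala/goals/tailor_test.py"
assert format_address(path, "tests", {"interpreter": "py2"}) == (
    path + ":tests@interpreter=py2"
)
assert format_address(path, "tests", {"interpreter": "py2"}, style="brackets") == (
    path + ":tests[interpreter=py2]"
)
```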

@tdyas
Contributor

tdyas commented Dec 21, 2021

Would these parameters include implicit parameters like the execution and target platforms?

I've been playing around in my mind with the concept of a "build context" to make the platform explicit in build rules. An initial use case would be cross-compilation of Linux Go binaries from macOS for use in Docker images (which I saw on Slack a user had tried to do).

I'd love your feedback on how much a "build context" concept overlaps with the parameters idea in this issue.

A few thoughts:

  1. Should there be an explicit BuildContext type in the standard rules which is then a field in every rule type that does builds? BuildContext at the very least would have execution_platform and target_platform fields.
  2. Could the engine support scoped overrides of singletons, so that a rule could just request the current BuildContext as a parameter if it differed from some outer BuildContext? An example use case is a go_binary with an explicit (yet to be created) target_platform field to set an explicit target platform to build for. (This could actually be a form of multi-type dispatch for Get selectors, e.g. await Get(BuiltPackage, PackageFieldSet, some_field_set, build_context=BuildContext(...)).)
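For concreteness, the BuildContext being proposed might look something like this (a hypothetical sketch, not a real Pants type; the platform names are made up):

```python
from dataclasses import dataclass
from enum import Enum

class Platform(Enum):
    LINUX_X86_64 = "linux_x86_64"
    MACOS_ARM64 = "macos_arm64"

@dataclass(frozen=True)
class BuildContext:
    """Hypothetical type from the sketch above; not a real Pants API."""
    execution_platform: Platform   # where build processes run
    target_platform: Platform      # what the artifact is built for

# Cross-compiling a Linux Go binary from macOS for a Docker image:
ctx = BuildContext(
    execution_platform=Platform.MACOS_ARM64,
    target_platform=Platform.LINUX_X86_64,
)
assert ctx.execution_platform != ctx.target_platform
```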

@tdyas
Contributor

tdyas commented Dec 21, 2021

Is there a design doc for this issue?

@kaos
Member

kaos commented Dec 21, 2021

I love @tdyas's idea of a BuildContext to gather together relevant information about what to build for.
Currently there are disconnected fields and options hinting at what is required in terms of interpreter constraints (for Python) and platforms.

I think it could make sense to provide these details as a form of "build input", where you can say: I want to build for "IC pypy>=3.8 and platform x" and have that propagate to all considered targets, where applicable.

Akin to named resolves, this could be with named build contexts, so you can setup ahead of time the set of combinations you want to support, then selecting one or more of them to build for. Also, a target could specify compatible build contexts, so it doesn't build/use something that is known not to work.

Say, a docker_image could perhaps provide something like compatible_build_platform="linux-x86-64-cp-38-cp38" or compatible_named_build_context="build-context-3.8". Then, when building the image, if no build context is explicitly selected, it could select one that is compatible with what the upstream targets it depends on will require.

@stuhood
Member

stuhood commented Dec 21, 2021

2. Could the engine support scoped overrides of singletons so that a rule could just call for the current BuildContext as a parameter if it differed from some outer BuildContext?

If it will be overridden at some points in the graph, then it shouldn't be a singleton: rather, a Param specified at the root of the graph and overridden in subgraphs. But yea: it's effectively equivalent, and that is what has been proposed for the Platform: see the Multi-Platform Speculation design doc.

But to your first question: "no": I don't think that you need a BuildContext "$deity object" that holds all of the context fields. The engine allows for injection of as many or as few parameters as are actually needed in some scope (by design, precisely for this case), and so if you use one less param than some other subgraph (say, you don't consume the Platform), then you can have a smaller cache key and be platform independent, even though your callers weren't.

I think that the implementation sketch on Multi-Platform Speculation is still up to date, and I put a fair amount of thought into it at the time. Having multiple Params that make up your "build context" in your subgraph requires one (hopefully small) engine feature, which would reduce/remove the need for multi-field Request objects (or something like a BuildContext): #7490.


So two things to consider here:

  1. what the UX of this unification of config is for end users, and how it makes things easier for them (which is really the primary goal of this ticket I think). In that case, having a named context/"configuration" might make sense.
  2. how callers provide parameters to callees in the engine (which could be completely independent from the label syntax, by e.g., eagerly converting labels into typed Params which are then used or not used by callees). Global/named contexts probably won't make sense here.
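A toy model of the cache-key point above: a rule's key covers only the Params it actually consumes, so a platform-independent callee gets a smaller key than its platform-dependent caller. All names here are illustrative, not engine internals.

```python
def cache_key(rule_name: str, available_params: dict, consumed: set) -> tuple:
    """Key a rule on only the Params it actually consumes."""
    return (rule_name,) + tuple(sorted(
        (k, v) for k, v in available_params.items() if k in consumed
    ))

params = {"platform": "linux_x86_64", "resolve": "jvm-default"}

# A caller that consumes the Platform is keyed on it...
caller_key = cache_key("package_binary", params, {"platform", "resolve"})
# ...but a callee that never asks for the Platform is not, so its result
# can be shared across platforms even though its caller's cannot.
callee_key = cache_key("resolve_sources", params, {"resolve"})

assert ("platform", "linux_x86_64") in caller_key
assert all(k != "platform" for k, _ in callee_key[1:])
```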

@tdyas
Contributor

tdyas commented Dec 31, 2021

Another use for parameters: specifying which codegen backend to use: see (4) at #14041

@stuhood
Member

stuhood commented Jan 5, 2022

Prior art similar to any sort of local parametrized decorator or syntax: Bazel's select macro.

But very relevant is that parameterization or select are local-only solutions (see "You can improve on this solution..." here, which motivates bazel's toolchains): I'm not sure how they could easily be applied via #13767.


"Toolchains" to define the bulk of the arguments to a target seem closer to what we want, possibly with a local-only "multiplex across all of these toolchains" syntax (à la parametrization) exclusive to the named/addressed toolchain. But toolchains are quite a bit like a "$deity object" (BuildContext, as @tdyas described it)... exploding the toolchain out into a bunch of component Fields while constructing the multiplexed targets could solve that...?

@stuhood
Member

stuhood commented Jan 27, 2022

Here is a draft design for this feature: https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit ... use a pants-devel@ member email for comment access.

Feedback welcome!

Eric-Arellano added a commit that referenced this issue Jan 29, 2022
…resolves` (#14299)

Read the `help` message in `python/setup.py` for an explanation of how this works + what we mean by "disjoint". 

This wires up `experimental_resolve` to `python_source`, `python_awslambda`, and `python_google_cloud_function`. Thanks to the implementation in `pex_from_targets.py`, that means that they automatically support this new feature. The only target not yet supported is `python_distribution`.

Key note: `python_source` has the field `resolve`, rather than what we originally envisioned with `compatible_resolves`. This is because we realized in #13882 and its design doc at https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit# that the way to support a `python_source` target working with multiple resolves is via parametrization, which will create a distinct target for each resolve. 

However, `python_requirement` still uses `compatible_resolves` (for now at least) because we don't directly run tools like MyPy and Pytest on those. More concretely, they are never the "root" used to determine which resolve to use for a build.

[ci skip-rust]
stuhood added a commit that referenced this issue Feb 3, 2022
To prepare to make the address syntax more complicated as part of #13882, move to parsing addresses using a generated `peg` parser (using the same parser crate that @jsirois introduced in #11922).

Additionally, simplify `Address.spec` and `Address.path_safe_spec` slightly by removing non-trivial early returns and reducing the total number of f-strings.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Feb 3, 2022
Adds parameters to addresses in order to support the parameterized targets that are coming in #13882.

Although this change could technically be consumed now, until it is used in `Target` construction (coming in the next few days), it is `[internal]` only.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Feb 10, 2022
#13882 will require adjusting how many instances of a target are created based on how many times the `parametrize` keyword appears in its fields (including in the `overrides` field, in the case of target generators). The `parametrize` keyword is not itself a valid field value (and would also be challenging to render as such in `peek`), and so target generation must be updated to consume something other than the instantiated target generator to decide which fields to generate with.

To prepare for these changes, the base ("`template`") and overridden fields to use in target generation are added to the `GenerateTargetsRequest` class, and made required in `generate_file_level_targets`. To reduce the total amount of churn in callers of `generate_file_level_targets`, this change introduces the `TargetGenerator` and `TargetFilesGenerator` base classes for `Target`. The latter allows file-target generation to be requested declaratively, and both allow specifying which fields should be "moved" and "copied" from the target generator to the generated targets. These moved/copied declarations are used to construct the `template` for generated targets.

[ci skip-rust]
[ci skip-build-wheels]
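The moved/copied split described in that commit can be illustrated with a toy sketch (names and data are made up; this is not the real Pants implementation):

```python
# Hypothetical generator fields and moved/copied declarations.
GENERATOR = {"name": "libs", "sources": ["*.py"], "resolve": "a", "tags": ["lib"]}
MOVED_FIELDS = {"resolve"}    # removed from the generator, set on generated targets
COPIED_FIELDS = {"tags"}      # kept on the generator AND set on generated targets

def split_generator(fields: dict) -> tuple:
    """Return (remaining generator fields, template for generated targets)."""
    template = {k: v for k, v in fields.items() if k in MOVED_FIELDS | COPIED_FIELDS}
    remaining = {k: v for k, v in fields.items() if k not in MOVED_FIELDS}
    return remaining, template

remaining, template = split_generator(GENERATOR)
assert "resolve" not in remaining      # moved off the generator
assert remaining["tags"] == ["lib"]    # copied fields stay on the generator too
assert template == {"resolve": "a", "tags": ["lib"]}
```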
stuhood added a commit that referenced this issue Feb 11, 2022
As designed in #13882 (and elaborated on in #14420 in the context of Python), the `parametrize` builtin means that targets which want to support multiple resolves (or Scala versions) should use `resolve=parametrize(..)`, so that a different target is created per resolve.

This change replaces the `compatible_resolves` field with `resolve` for the JVM. And in order to allow the `resolve` field to be `parametrize`d, it also moves it from the `core_fields` of JVM target generators (which cannot be `parametrize`d) to the `moved_fields`.
stuhood added a commit that referenced this issue Feb 11, 2022
…copies of a target (#14408)

As introduced in #13882 and designed in https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit#, this change adds the `parametrize` builtin.

Examples:
```python
# Create two copies of each generated target, with addresses qualified with `@resolve=a` and `@resolve=b`.
python_sources(
  ..,
  resolve=parametrize("a", "b"),
)

# ...alternatively, define an alias for some or all of the values:
python_sources(
  ..,
  resolve=parametrize("a", b="bread and butter"),
)

# Create four copies of each generated target, with addresses qualified with both parametrized fields.
scala_sources(
  ..,
  resolve=parametrize("a", "b"),
  jdk=parametrize("c", "d"),
)

# Apply parametrization to only some of the generated targets:
scala_sources(
  ..,
  overrides={
    "JdkDependent.scala": {"jdk": parametrize("c", "d")},
  },
)
```

----

`parametrize` may be used in any field (other than `name`) of non-generator targets, but is limited to `moved` fields of generator targets (i.e., those which will be moved from the generator to the generated target).

Using `parametrize` in the `overrides` of a target generator requires adjustments to the target generator's implementation (see #14430): this change only adjusts the implementation for target file generators (`TargetFilesGenerator`).
@stuhood
Member

stuhood commented Feb 11, 2022

Some preliminary docs were added here: https://www.pantsbuild.org/v2.11/docs/targets#parametrizing-targets ... edits very welcome!

I'm out for a few days, but when I get back I'll add a list of open items here from #14408 that block closing this ticket.

@kaos
Member

kaos commented Feb 11, 2022

Can't help but notice that my original idea would actually be valid now (cool 😁 )

shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))

@stuhood
Member

stuhood commented Feb 17, 2022

A few issues encountered in pantsbuild/example-jvm#12, which probably block resolving this:

Followup work would be:

  • scala: create scala_artifact target type #12969 (OR finish implementing "Configuration") - Both the Scala-version-templated artifact name and the resolve need to be parametrized. "Configuration" would resolve this, but #12969 would also remove the need by templating the version into the artifact.

@stuhood
Member

stuhood commented Mar 8, 2022

Have updated the blockers in the above comment. We'll want to close this issue for 2.11.x, since it's important for multiple-resolves.

@stuhood stuhood added this to the 2.11.x milestone Mar 8, 2022
stuhood added a commit that referenced this issue Mar 16, 2022
During design for #13882, we agreed that parameters should come after the generated name for targets, but the initial version landed with the opposite order. This change swaps them to the agreed upon order.

[ci skip-build-wheels]
@stuhood
Member

stuhood commented Mar 28, 2022

#14521 is the last blocker for 2.11.x: should be able to start it tomorrow. Let's switch to tracking that one.

Thanks to everyone for the feedback and design assistance!
