
Add generic support for parametrizing metadata like interpreter constraints and resolves #13882

Closed
Eric-Arellano opened this issue Dec 14, 2021 · 16 comments

Comments

@Eric-Arellano
Contributor

Status quo

It's useful to run the same target with slightly different config. For example, run the same python_test with Python 2 and Python 3 to validate that it works with both. This is a major benefit of Tox, for example, so we know the feature has value.

In particular, it often makes sense to have permutations here:

  • interpreter_constraints for python_test targets to check that it's valid for all interpreters
  • compatible_resolves for java_source to check that each resolve compiles for the target
  • shell for shunit2_test to check that the script works with multiple shells (Zsh, Bash, etc.)

To do that, you currently have to define two targets:

python_test(name="tests_py2", source="tests.py", interpreter_constraints=["==2.7.*"])
python_test(name="tests_py3", source="tests.py", interpreter_constraints=["==3.6.*"])

Then, you can tell Pants to run :tests_py2, :tests_py3, or use globs like a file spec to run both.

A major downside of this approach is duplication. Another major downside is that it creates ambiguity for dependency inference, because two targets export the same file.

First-class parametrization

Instead, @benjyw @stuhood and I discussed a syntax like this:

shunit2_test(name="tests", source="tests.sh", shells=["bash", "zsh"])

This would run the same tests once per shell. Or, for interpreter constraints, one test run per "named" interpreter constraint (#12652).

This would require reworking the interaction between the Target API and Rules API, which is currently premised on one target -> one "build". For example, a FieldSet is uniquely identified by its target's Address, and our generic core/goals/test.py cannot create two PytestFieldSets for the same Target.
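As a toy illustration of that premise (using hypothetical, simplified stand-ins for the real `Address` and `FieldSet` types, not the actual Pants API):

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for the real Target API types.
@dataclass(frozen=True)
class Address:
    spec: str

@dataclass(frozen=True)
class PytestFieldSet:
    address: Address  # a FieldSet's identity is solely its target's Address
    interpreter_constraints: tuple[str, ...]

# Two desired runs of the same target collide on that identity:
fs_py2 = PytestFieldSet(Address("tests/foo:tests"), ("==2.7.*",))
fs_py3 = PytestFieldSet(Address("tests/foo:tests"), ("==3.6.*",))

# Anything that keys work by address collapses the two intended runs into one.
by_address = {fs.address: fs for fs in (fs_py2, fs_py3)}
assert len(by_address) == 1
```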

"Configurations"?

Each target type may have multiple parametrizable fields, e.g. for java_test: compatible_resolves, scala_versions, java_versions. One idea is to have users create a single config that sets all of these values into a single addressable(?) entity. Then the target simply points to which config to use.

(I'm not sure what the benefit of doing this would be?)

@benjyw
Contributor

benjyw commented Dec 14, 2021

(I'm not sure what the benefit of doing this would be?)

The benefit would be that partitioning logic works without having to know about all the axes it has to partition on.

Alternatively, you'd have to partition according to the cross-product of all parameters, which can have a cardinality much greater than the actual set of used configurations. Especially since in many cases there may be correlations between the parameters (e.g., the interpreter constraint for Python2 is strongly correlated with the resolve for Python2).
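A toy illustration of that cardinality gap (the axes and values below are made up):

```python
from itertools import product

# Hypothetical parameter axes for a java_test-like target; the axis
# names and values are invented for illustration.
resolves = ["jvm-default", "jvm-legacy"]
scala_versions = ["2.12", "2.13"]
java_versions = ["8", "11", "17"]

# Partitioning on the full cross-product of the axes:
cross_product = list(product(resolves, scala_versions, java_versions))
assert len(cross_product) == 12

# ...versus the configurations actually in use, where the axes are
# correlated (e.g. the legacy resolve only ever builds with Scala 2.12
# on Java 8):
used_configs = [
    ("jvm-default", "2.13", "11"),
    ("jvm-default", "2.13", "17"),
    ("jvm-legacy", "2.12", "8"),
]
assert len(used_configs) < len(cross_product)
```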

@stuhood
Member

stuhood commented Dec 14, 2021

In the engine experiment, @jsirois had syntax for selecting configurations from dependees: address@key1=value1,key2=value2. That could also be used here to uniquely identify one configuration of a target for the purposes of running multiple configurations in parallel: i.e.:

✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py2 succeeded.
✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py3 succeeded.

Where the various configurations are actually defined is an open question for me though, and is very related to #13767.

@kaos
Member

kaos commented Dec 15, 2021

Just throwing this out there....

Thinking about what it could look like, taking inspiration from what pytest.mark.parametrize does for test functions:

shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))

# above generates the equiv of => 

shunit2_test(name="tests#shell-1", source="tests.sh", shell="bash")
shunit2_test(name="tests#shell-2", source="tests.sh", shell="zsh")

That is, a new parametrize construct, that turns any target into a target generator for the same, but with a certain field replaced for each instance.

And for the python_test case it could look like:

python_test(name="tests", source="tests.py", interpreter_constraints=parametrize(["==2.7.*"], ["==3.6.*"]))

# ==>

python_test(name="tests#interpreter_constraints-1", source="tests.py", interpreter_constraints=["==2.7.*"])
python_test(name="tests#interpreter_constraints-2", source="tests.py", interpreter_constraints=["==3.6.*"])

Not sure how to support parametrizing multiple fields, or what happens if the target itself is already a target generator, etc.
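A minimal sketch of how such a `parametrize` marker could expand a declaration under the naming scheme above (illustrative only, not the implementation that eventually landed):

```python
from __future__ import annotations
from itertools import product

class parametrize:
    """Hypothetical marker wrapping the alternative values for one field."""
    def __init__(self, *values):
        self.values = values

def expand(name: str, **fields) -> dict[str, dict]:
    """Expand one declaration into one target per combination of
    parametrized field values, suffixing names as sketched above."""
    fixed = {k: v for k, v in fields.items() if not isinstance(v, parametrize)}
    varied = {k: v.values for k, v in fields.items() if isinstance(v, parametrize)}
    if not varied:
        return {name: fields}
    expanded = {}
    for i, combo in enumerate(product(*varied.values()), start=1):
        suffix = "-".join(varied)  # the parametrized field name(s)
        expanded[f"{name}#{suffix}-{i}"] = {**fixed, **dict(zip(varied, combo))}
    return expanded

targets = expand("tests", source="tests.sh", shell=parametrize("bash", "zsh"))
assert list(targets) == ["tests#shell-1", "tests#shell-2"]
assert targets["tests#shell-1"] == {"source": "tests.sh", "shell": "bash"}
```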

@thejcannon
Member

This matters less for internal monorepo company code, which usually isn't published. But most published libraries are tested over multiple versions of Python (including multiple versions of Python 3). At my last company we wanted to test across all major supported Python 3 versions, so this would help simplify that.

However, v2 macros do solve the duplication issue, and although macros are the Pants step-child, the approach raises no issues beyond "yuck, macros" 😉 (and all that entails). Users get to control the name, which is nice.


I like @stuhood's proposal of a different naming scheme for this, as well as @kaos's for parametrization. Although, just to throw out this wild syntax:

@parameterize("shell", ["bash", "zsh"])
shunit2_test(name="tests", source="tests.sh")

Additionally, also throwing out there that the naming scheme could mirror pytest's (using square brackets): src/python/pants/backend/scala/goals/tailor_test.py:tests[interpreter=py2]
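The two proposed renderings can be sketched side by side (both styles, and the parameter names below, are proposals only, not a shipped Pants API):

```python
def format_address(path: str, name: str, params: dict, style: str = "at") -> str:
    """Render a parametrized address in either proposed style."""
    kv = ",".join(f"{k}={v}" for k, v in params.items())
    if style == "at":
        return f"{path}:{name}@{kv}"   # @key=value, per the comment above
    return f"{path}:{name}[{kv}]"      # pytest-style square brackets

path = "src/python/pants/backend/scala/goals/tailor_test.py"
assert format_address(path, "tests", {"interpreter": "py2"}) == (
    path + ":tests@interpreter=py2"
)
assert format_address(path, "tests", {"interpreter": "py2"}, style="brackets") == (
    path + ":tests[interpreter=py2]"
)
```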

@tdyas
Contributor

tdyas commented Dec 21, 2021

Would these parameters include implicit parameters like the execution and target platforms?

I've been playing around in my mind with the concept of a "build context" to make the platform explicit in build rules. An initial use case would be cross-compilation of Linux Go binaries from macOS for use in Docker images (which I saw on Slack a user had tried to do).

I'd love your feedback on how much a "build context" concept overlaps with the parameters idea in this issue.

A few thoughts:

  1. Should there be an explicit BuildContext type in the standard rules which is then a field in every rule type that does builds? BuildContext at the very least would have execution_platform and target_platform fields.
  2. Could the engine support scoped overrides of singletons, so that a rule could just request the current BuildContext as a parameter if it differed from some outer BuildContext? An example use case is a go_binary with an explicit (yet to be created) target_platform field to set an explicit target platform to build for. (This could actually be a form of multi-type dispatch for Get selectors, e.g. await Get(BuiltPackage, PackageFieldSet, some_field_set, build_context=BuildContext(...)).)
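For concreteness, the BuildContext being proposed might look something like this (a hypothetical sketch, not a real Pants type; the platform names are made up):

```python
from dataclasses import dataclass
from enum import Enum

class Platform(Enum):
    LINUX_X86_64 = "linux_x86_64"
    MACOS_ARM64 = "macos_arm64"

@dataclass(frozen=True)
class BuildContext:
    """Hypothetical type from the sketch above; not a real Pants API."""
    execution_platform: Platform   # where build processes run
    target_platform: Platform      # what the artifact is built for

# Cross-compiling a Linux Go binary from macOS for a Docker image:
ctx = BuildContext(
    execution_platform=Platform.MACOS_ARM64,
    target_platform=Platform.LINUX_X86_64,
)
assert ctx.execution_platform != ctx.target_platform
```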

@tdyas
Contributor

tdyas commented Dec 21, 2021

Is there a design doc for this issue?

@kaos
Member

kaos commented Dec 21, 2021

I love @tdyas's idea of a BuildContext to gather together relevant information about what to build for.
Currently there are disconnected fields and options hinting at what is required in terms of interpreter constraints (for Python) and platforms.

I think it could make sense to provide these details as a form of "build input", where you can say: I want to build for "IC pypy>=3.8 and platform x" and have that propagate to all considered targets, where applicable.

Akin to named resolves, this could be with named build contexts, so you can setup ahead of time the set of combinations you want to support, then selecting one or more of them to build for. Also, a target could specify compatible build contexts, so it doesn't build/use something that is known not to work.

Say, a docker_image could perhaps provide something like compatible_build_platform="linux-x86-64-cp-38-cp38" or compatible_named_build_context="build-context-3.8". Then, when building the image, if no build context is explicitly selected, it could select one that is compatible with what the upstream targets it depends on will require.

@stuhood
Member

stuhood commented Dec 21, 2021

2. Could the engine support scoped overrides of singletons so that a rule could just call for the current BuildContext as a parameter if it differed from some outer BuildContext?

If it will be overridden at some points in the graph, then it shouldn't be a singleton: rather, a Param specified at the root of the graph and overridden in subgraphs. But yea: it's effectively equivalent, and that is what has been proposed for the Platform: see the Multi-Platform Speculation design doc.

But to your first question: "no": I don't think that you need a BuildContext "$deity object" that holds all of the context fields. The engine allows for injection of as many or as few parameters as are actually needed in some scope (by design, precisely for this case), and so if you use one less param than some other subgraph (say, you don't consume the Platform), then you can have a smaller cache key and be platform independent, even though your callers weren't.

I think that the implementation sketch on Multi-Platform Speculation is still up to date, and I put a fair amount of thought into it at the time. Having multiple Params that make up your "build context" in your subgraph requires one (hopefully small) engine feature, which would reduce/remove the need for multi-field Request objects (or something like a BuildContext): #7490.


So two things to consider here:

  1. what the UX of this unification of config is for end users, and how it makes things easier for them (which is really the primary goal of this ticket I think). In that case, having a named context/"configuration" might make sense.
  2. how callers provide parameters to callees in the engine (which could be completely independent from the label syntax, by e.g., eagerly converting labels into typed Params which are then used or not used by callees). Global/named contexts probably won't make sense here.
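A toy model of the cache-key point above: a rule's key covers only the Params it actually consumes, so a platform-independent callee gets a smaller key than its platform-dependent caller. All names here are illustrative, not engine internals.

```python
def cache_key(rule_name: str, available_params: dict, consumed: set) -> tuple:
    """Key a rule on only the Params it actually consumes."""
    return (rule_name,) + tuple(sorted(
        (k, v) for k, v in available_params.items() if k in consumed
    ))

params = {"platform": "linux_x86_64", "resolve": "jvm-default"}

# A caller that consumes the Platform is keyed on it...
caller_key = cache_key("package_binary", params, {"platform", "resolve"})
# ...but a callee that never asks for the Platform is not, so its result
# can be shared across platforms even though its caller's cannot.
callee_key = cache_key("resolve_sources", params, {"resolve"})

assert ("platform", "linux_x86_64") in caller_key
assert all(k != "platform" for k, _ in callee_key[1:])
```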

@tdyas
Contributor

tdyas commented Dec 31, 2021

Another use for parameters: specifying which codegen backend to use: see (4) at #14041

@stuhood
Member

stuhood commented Jan 5, 2022

Prior art similar to any sort of local parametrized decorator or syntax: Bazel's select macro.

But very relevant is that parameterization or select are local-only solutions (see "You can improve on this solution..." here, which motivates bazel's toolchains): I'm not sure how they could easily be applied via #13767.


"Toolchains" to define the bulk of the arguments to a target seem closer to what we want, possibly with a local-only "multiplex across all of these toolchains" syntax (à la parametrization) exclusive to the named/addressed toolchain. But toolchains are quite a bit like a "$deity object" (BuildContext, as @tdyas described it)... exploding the toolchain out into a bunch of component Fields while constructing the multiplexed targets could solve that...?

@stuhood
Member

stuhood commented Jan 27, 2022

Here is a draft design for this feature: https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit ... use a pants-devel@ member email for comment access.

Feedback welcome!

Eric-Arellano added a commit that referenced this issue Jan 29, 2022
…resolves` (#14299)

Read the `help` message in `python/setup.py` for an explanation of how this works + what we mean by "disjoint". 

This wires up `experimental_resolve` to `python_source`, `python_awslambda`, and `python_google_cloud_function`. Thanks to the implementation in `pex_from_targets.py`, that means that they automatically support this new feature. The only target not yet supported is `python_distribution`.

Key note: `python_source` has the field `resolve`, rather than what we originally envisioned with `compatible_resolves`. This is because we realized in #13882 and its design doc at https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit# that the way to support a `python_source` target working with multiple resolves is via parametrization, which will create a distinct target for each resolve. 

However, `python_requirement` still uses `compatible_resolves` (for now at least) because we don't directly run tools like MyPy and Pytest on those. More concretely, they are never the "root" used to determine which resolve to use for a build.

[ci skip-rust]
stuhood added a commit that referenced this issue Feb 3, 2022
To prepare to make the address syntax more complicated as part of #13882, move to parsing addresses using a generated `peg` parser (using the same parser crate that @jsirois introduced in #11922).

Additionally, simplify `Address.spec` and `Address.path_safe_spec` slightly by removing non-trivial early returns and reducing the total number of f-strings.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Feb 3, 2022
Adds parameters to addresses in order to support the parameterized targets that are coming in #13882.

Although this change could technically be consumed now, until it is used in `Target` construction (coming in the next few days), it is `[internal]` only.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Feb 10, 2022
#13882 will require adjusting how many instances of a target are created based on how many times the `parametrize` keyword appears in its fields (including in the `overrides` field, in the case of target generators). The `parametrize` keyword is not itself a valid field value (and would also be challenging to render as such in `peek`), and so target generation must be updated to consume something other than the instantiated target generator to decide which fields to generate with.

To prepare for these changes, the base ("`template`") and overridden fields to use in target generation are added to the `GenerateTargetsRequest` class, and made required in `generate_file_level_targets`. To reduce the total amount of churn in callers of `generate_file_level_targets`, this change introduces the `TargetGenerator` and `TargetFilesGenerator` base classes for `Target`. The latter allows file-target generation to be requested declaratively, and both allow specifying which fields should be "moved" and "copied" from the target generator to the generated targets. These moved/copied declarations are used to construct the `template` for generated targets.

[ci skip-rust]
[ci skip-build-wheels]
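The moved/copied split described in that commit can be illustrated with a toy sketch (names and data are made up; this is not the real Pants implementation):

```python
# Hypothetical generator fields and moved/copied declarations.
GENERATOR = {"name": "libs", "sources": ["*.py"], "resolve": "a", "tags": ["lib"]}
MOVED_FIELDS = {"resolve"}    # removed from the generator, set on generated targets
COPIED_FIELDS = {"tags"}      # kept on the generator AND set on generated targets

def split_generator(fields: dict) -> tuple:
    """Return (remaining generator fields, template for generated targets)."""
    template = {k: v for k, v in fields.items() if k in MOVED_FIELDS | COPIED_FIELDS}
    remaining = {k: v for k, v in fields.items() if k not in MOVED_FIELDS}
    return remaining, template

remaining, template = split_generator(GENERATOR)
assert "resolve" not in remaining      # moved off the generator
assert remaining["tags"] == ["lib"]    # copied fields stay on the generator too
assert template == {"resolve": "a", "tags": ["lib"]}
```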
stuhood added a commit that referenced this issue Feb 11, 2022
As designed in #13882 (and elaborated on in #14420 in the context of Python), the `parametrize` builtin means that targets which want to support multiple resolves (or Scala versions) should use `resolve=parametrize(..)`, so that a different target is created per resolve.

This change replaces the `compatible_resolves` field with `resolve` for the JVM. And in order to allow the `resolve` field to be `parametrize`d, it also moves it from the `core_fields` of JVM target generators (which cannot be `parametrize`d) to the `moved_fields`.
stuhood added a commit that referenced this issue Feb 11, 2022
…copies of a target (#14408)

As introduced in #13882 and designed in https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit#, this change adds the `parametrize` builtin.

Examples:
```python
# Create two copies of each generated target, with addresses qualified with `@resolve=a` and `@resolve=b`.
python_sources(
  ..,
  resolve=parametrize("a", "b"),
)

# ...alternatively, define an alias for some or all of the values:
python_sources(
  ..,
  resolve=parametrize("a", b="bread and butter"),
)

# Create four copies of each generated target, with addresses qualified with both parametrized fields.
scala_sources(
  ..,
  resolve=parametrize("a", "b"),
  jdk=parametrize("c", "d"),
)

# Apply parametrization to only some of the generated targets:
scala_sources(
  ..,
  overrides={
    "JdkDependent.scala": {"jdk": parametrize("c", "d")},
  },
)
```

----

`parametrize` may be used in any field (other than `name`) of non-generator targets, but is limited to `moved` fields of generator targets (i.e., those which will be moved from the generator to the generated target).

Using `parametrize` in the `overrides` of a target generator requires adjustments to the target generator's implementation (see #14430): this change only adjusts the implementation for target file generators (`TargetFilesGenerator`).
@stuhood
Member

stuhood commented Feb 11, 2022

Some preliminary docs were added here: https://www.pantsbuild.org/v2.11/docs/targets#parametrizing-targets ... edits very welcome!

I'm out for a few days, but when I get back I'll add a list of open items here from #14408 that block closing this ticket.

@kaos
Member

kaos commented Feb 11, 2022

Can't help but notice that my original idea would actually be valid now (cool 😁 )

shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))

@stuhood
Member

stuhood commented Feb 17, 2022

A few issues encountered in pantsbuild/example-jvm#12, which probably block resolving this:

Followup work would be:

  • scala: create scala_artifact target type #12969 (OR finish implementing "Configuration") - Both the Scala-version-templated artifact name and the resolve need to be parametrized. "Configuration" would resolve this, but #12969 would also remove the need by templating the version into the artifact.

@stuhood
Member

stuhood commented Mar 8, 2022

Have updated the blockers in the above comment. We'll want to close this issue for 2.11.x, since it's important for multiple-resolves.

@stuhood stuhood added this to the 2.11.x milestone Mar 8, 2022
stuhood added a commit that referenced this issue Mar 16, 2022
During design for #13882, we agreed that parameters should come after the generated name for targets, but the initial version landed with the opposite order. This change swaps them to the agreed upon order.

[ci skip-build-wheels]
@stuhood
Member

stuhood commented Mar 28, 2022

#14521 is the last blocker for 2.11.x: should be able to start it tomorrow. Let's switch to tracking that one.

Thanks to everyone for the feedback and design assistance!
