Add generic support for parametrizing metadata like interpreter constraints and resolves #13882
The benefit would be that partitioning logic works without having to know about all the axes it has to partition on. Alternatively, you'd have to partition according to the cross-product of all parameters, which can have a cardinality much greater than the actual set of used configurations. Especially since in many cases there may be correlations between the parameters (e.g., the interpreter constraint for Python 2 is strongly correlated with the resolve for Python 2).
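To make the cardinality argument concrete, here is a toy sketch (not Pants code; the parameter values are made up) contrasting the blind cross product with the set of configurations actually in use:

```python
import itertools

# Hypothetical parameter axes, with made-up values purely for illustration.
interpreter_constraints = ["==2.7.*", "==3.6.*", "==3.7.*", "==3.8.*"]
resolves = ["py2-lock", "py3-lock"]
shells = ["bash", "zsh"]

# Partitioning on the blind cross product considers every combination...
cross_product = list(itertools.product(interpreter_constraints, resolves, shells))
print(len(cross_product))  # 16

# ...but the configurations actually in use are correlated, so the real set is
# far smaller (the Python 2 constraint only ever pairs with the Python 2 resolve).
used_configurations = {
    ("==2.7.*", "py2-lock", "bash"),
    ("==3.8.*", "py3-lock", "bash"),
    ("==3.8.*", "py3-lock", "zsh"),
}
print(len(used_configurations))  # 3
```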
In the engine experiment, @jsirois had syntax for selecting configurations from dependees:

```
✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py2 succeeded.
✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py3 succeeded.
```

Where the various configurations are actually defined is an open question for me though, and is very related to #13767.
Just throwing this out there.... Thinking about what it could look like, taking inspiration from existing parametrization approaches:

```python
shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))
# above generates the equiv of =>
shunit2_test(name="tests#shell-1", source="tests.sh", shell="bash")
shunit2_test(name="tests#shell-2", source="tests.sh", shell="zsh")
```

That is, a new `parametrize` construct. And for the `interpreter_constraints` case:

```python
python_test(name="tests", source="tests.py", interpreter_constraints=parametrize(["==2.7.*"], ["==3.6.*"]))
# ==>
python_test(name="tests#interpreter_constraints-1", source="tests.py", interpreter_constraints=["==2.7.*"])
python_test(name="tests#interpreter_constraints-2", source="tests.py", interpreter_constraints=["==3.6.*"])
```

Not sure how to support parametrizing on multiple fields, or if the target itself is already a target generator, etc.
This matters less for monorepo, internal, company code, which usually isn't published. But most published libraries are tested over multiple versions of Python (including multiple versions of Python 3). At my last company we wanted to test across all major supported Python 3 versions, so this would help simplify that. However, v2 macros do solve the duplication issue, and although macros are the Pants step-child, it raises no further issues other than "yuck macros" 😉 (and all that entails). Users get to control the naming.

I like @stuhood's proposal of a different naming scheme for this, as well as @kaos's for parameterization. Although, just to throw out this wild syntax:

```python
@parameterize("shell", ["bash", "zsh"])
shunit2_test(name="tests", source="tests.sh")
```

Additionally, also throwing out there that the naming scheme could mirror ...
Would these parameters include implicit parameters like the execution and target platforms? I've been playing around in my mind with the concept of a "build context" to make the platform explicit in build rules. An initial use case would be cross-compilation of Linux Go binaries from macOS for use in Docker images (which I saw on Slack a user had tried to do). I'd love your feedback on how much a "build context" concept overlaps with the parameters idea in this issue. A few thoughts:
Is there a design doc for this issue?
I love @tdyas' idea with a build context. I think it could make sense to be able to provide these details as a form of "build input", where you can say: I want to build for "IC pypy>=3.8 and platform x" and have that propagate to all considered targets, where applicable. Akin to named resolves, this could be with named build contexts, so you can set up ahead of time the set of combinations you want to support, then select one or more of them to build for. Also, a target could specify compatible build contexts, so it doesn't build/use something that is known not to work. Say, a ...
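As a purely hypothetical sketch of the "named build contexts" idea (none of these target types or fields exist in Pants; the names are invented only to make the shape concrete):

```python
# Hypothetical: declare the supported combinations up front, by name.
build_context(
    name="py38-linux-x86",
    interpreter_constraints=["CPython>=3.8,<3.9"],
    platform="linux_x86_64",
)

build_context(
    name="pypy38-macos",
    interpreter_constraints=["PyPy>=3.8"],
    platform="macos_arm64",
)

# Hypothetical: a target opts in to the contexts it is known to work with.
python_tests(
    name="tests",
    compatible_build_contexts=["py38-linux-x86", "pypy38-macos"],
)
```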
If it will be overridden at some points in the graph, then it shouldn't be a singleton: rather, a ...

But to your first question: "no": I don't think that you need a ...

I think that the implementation sketch on Multi-Platform Speculation is still up to date, and I put a fair amount of thought into it at the time. Having multiple ...

So two things to consider here:
Another use for parameters: specifying which codegen backend to use; see (4) at #14041.
Prior art which is similar to any sort of local ... But very relevant is that parameterization or "Toolchains" to define the bulk of the arguments to a target seem closer to what we want, possibly with a local-only "multiplex across all of these toolchains" syntax (à la parameterization) exclusive to the named/addressed toolchain. But toolchains are quite a bit like a ...
Here is a draft design for this feature: https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit ... use a pants-devel@ member email for comment access. Feedback welcome!
…resolves` (#14299)

Read the `help` message in `python/setup.py` for an explanation of how this works + what we mean by "disjoint".

This wires up `experimental_resolve` to `python_source`, `python_awslambda`, and `python_google_cloud_function`. Thanks to the implementation in `pex_from_targets.py`, that means that they automatically support this new feature. The only target not yet supported is `python_distribution`.

Key note: `python_source` has the field `resolve`, rather than what we originally envisioned with `compatible_resolves`. This is because we realized in #13882 and its design doc at https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit# that the way to support a `python_source` target working with multiple resolves is via parametrization, which will create a distinct target for each resolve. However, `python_requirement` still uses `compatible_resolves` (for now at least) because we don't directly run tools like MyPy and Pytest on those. More concretely, they are never the "root" used to determine which resolve to use for a build.

[ci skip-rust]
To prepare to make the address syntax more complicated as part of #13882, move to parsing addresses using a generated `peg` parser (using the same parser crate that @jsirois introduced in #11922). Additionally, simplify `Address.spec` and `Address.path_safe_spec` slightly by removing non-trivial early returns and reducing the total number of f-strings. [ci skip-build-wheels]
Adds parameters to addresses in order to support the parameterized targets that are coming in #13882. Although this change could technically be consumed now, until it is used in `Target` construction (coming in the next few days), it is `[internal]` only. [ci skip-build-wheels]
#13882 will require adjusting how many instances of a target are created, based on how many times the `parametrize` keyword appears in its fields (including in the `overrides` field, in the case of target generators). The `parametrize` keyword is not itself a valid field value (and would also be challenging to render as such in `peek`), and so target generation must be updated to consume something other than the instantiated target generator to decide which fields to generate with.

To prepare for these changes, the base ("`template`") and overridden fields to use in target generation are added to the `GenerateTargetsRequest` class, and made required in `generate_file_level_targets`. To reduce the total amount of churn in callers of `generate_file_level_targets`, this change introduces the `TargetGenerator` and `TargetFilesGenerator` base classes for `Target`. The latter allows file-target generation to be requested declaratively, and both allow specifying which fields should be "moved" and "copied" from the target generator to the generated targets. These moved/copied declarations are used to construct the `template` for generated targets.

[ci skip-rust] [ci skip-build-wheels]
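For plugin authors, the resulting pattern looks roughly like the sketch below. The target and field names are invented, and the exact class attributes (`generated_target_cls`, `copied_fields`, `moved_fields`) are an assumption based on the description above rather than a verified API reference:

```python
from pants.engine.target import (
    COMMON_TARGET_FIELDS,
    Dependencies,
    MultipleSourcesField,
    SingleSourceField,
    Target,
    TargetFilesGenerator,
)


# Invented example fields/targets, just to show the moved/copied split.
class ExampleSourceField(SingleSourceField):
    pass


class ExampleSourcesField(MultipleSourcesField):
    default = ("*.example",)


class ExampleSourceTarget(Target):
    alias = "example_source"
    core_fields = (*COMMON_TARGET_FIELDS, Dependencies, ExampleSourceField)


class ExampleSourcesGeneratorTarget(TargetFilesGenerator):
    alias = "example_sources"
    core_fields = (*COMMON_TARGET_FIELDS, ExampleSourcesField)
    generated_target_cls = ExampleSourceTarget
    # Copied: present on both the generator and every generated target.
    copied_fields = COMMON_TARGET_FIELDS
    # Moved: declared on the generator in the BUILD file, but ends up only on
    # the generated targets; these are the fields that can be `parametrize`d.
    moved_fields = (Dependencies,)
```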
As designed in #13882 (and elaborated on in #14420 in the context of Python), the `parametrize` builtin means that targets which want to support multiple resolves (or Scala versions) should use `resolve=parametrize(..)`, so that a different target is created per resolve. This change replaces the `compatible_resolves` field with `resolve` for the JVM. And in order to allow the `resolve` field to be `parametrize`d, it also moves it from the `core_fields` of JVM target generators (which cannot be `parametrize`d) to the `moved_fields`.
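For example, under this change a JVM target that needs to build against more than one resolve would be declared along these lines (the resolve names here are illustrative):

```python
# One declaration, expanded into one target per resolve.
java_sources(
    name="lib",
    resolve=parametrize("jvm-default", "jvm-legacy"),
)
```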
…copies of a target (#14408)

As introduced in #13882 and designed in https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit#, this change adds the `parametrize` builtin. Examples:

```python
# Create two copies of each generated target, with addresses qualified with `@resolve=a` and `@resolve=b`.
python_sources(
    ..,
    resolve=parametrize("a", "b"),
)

# ...alternatively, define an alias for some or all of the values:
python_sources(
    ..,
    resolve=parametrize("a", b="bread and butter"),
)

# Create four copies of each generated target, with addresses qualified with both parametrized fields.
scala_sources(
    ..,
    resolve=parametrize("a", "b"),
    jdk=parametrize("c", "d"),
)

# Apply parametrization to only some of the generated targets:
scala_sources(
    ..,
    overrides={
        "JdkDependent.scala": {"jdk": parametrize("c", "d")},
    },
)
```

----

`parametrize` may be used in any field (other than `name`) of non-generator targets, but is limited to `moved` fields of generator targets (i.e., those which will be moved from the generator to the generated target). Using `parametrize` in the `overrides` of a target generator requires adjustments to the target generator's implementation (see #14430): this change only adjusts the implementation for target file generators (`TargetFilesGenerator`).
Some preliminary docs were added here: https://www.pantsbuild.org/v2.11/docs/targets#parametrizing-targets ... edits very welcome! I'm out for a few days, but when I get back I'll add a list of open items here from #14408 that block closing this ticket.
Can't help but notice that my original idea would actually be valid now (cool 😁 )
A few issues encountered in pantsbuild/example-jvm#12, which probably block resolving this:
Followup work would be:
Have updated the blockers in the above comment. We'll want to close this issue for ...
During design for #13882, we agreed that parameters should come after the generated name for targets, but the initial version landed with the opposite order. This change swaps them to the agreed upon order. [ci skip-build-wheels]
#14521 is the last blocker for ... Thanks to everyone for the feedback and design assistance!
Status quo
It's useful to run against the same target but with slightly different config. For example, run the same `python_test` with Python 2 and Python 3 to validate it works with both. This feature is a major benefit of Tox, for example - we know it has value.

In particular, it often makes sense to have permutations here:

- `interpreter_constraints` for `python_test` targets, to check that it's valid for all interpreters
- `compatible_resolves` for `java_source`, to check that each resolve compiles for the target
- `shell` for `shunit2_test`, to check that the script works with multiple shells (Zsh, Bash, etc.)

To do that, you currently have to define two targets:
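A minimal sketch of that duplication in a BUILD file, using the `:tests_py2` / `:tests_py3` names referenced below (the exact interpreter constraints are illustrative):

```python
# The same test file registered twice, once per interpreter.
python_test(
    name="tests_py2",
    source="tests.py",
    interpreter_constraints=["==2.7.*"],
)

python_test(
    name="tests_py3",
    source="tests.py",
    interpreter_constraints=["==3.6.*"],
)
```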
Then, you can tell Pants to either run `:tests_py2` or `:tests_py3`, or use globs like a file spec to run over both.

A major downside of this approach is duplication. Another major downside is that it results in ambiguity with dependency inference, because two targets export the same file.
First-class parametrization
Instead, @benjyw @stuhood and I discussed a syntax like this:
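A sketch of the kind of syntax under discussion, modeled on the `parametrize` examples elsewhere in this thread (the exact spelling was still being worked out at this point):

```python
# One target definition, expanded to one test run per `shell` value.
shunit2_test(
    name="tests",
    source="tests.sh",
    shell=parametrize("bash", "zsh"),
)
```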
Which would result in running the same tests one time per `shell`. Or, for interpreter constraints, one test run per "named" interpreter constraint (#12652).

This would require reworking the Target API and Rules API interaction, which is currently very premised on one target -> one "build". For example, a `FieldSet` is uniquely identified by its target's `Address`, and our generic `core/goals/test.py` is not capable of creating two `PytestFieldSet`s for the same `Target`.

"Configurations"?
Each target type may have multiple parametrizable fields, e.g. for `java_test`: `compatible_resolves`, `scala_versions`, `java_versions`. One idea is to have users create a single config that sets all of these values into a single addressable(?) entity. Then the target simply points to which config to use.

(I'm not sure what the benefit of doing this would be?)
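A purely hypothetical sketch of that "single config" idea (neither the `configuration` target type nor the `config` field exists; this only illustrates the shape of the proposal):

```python
# Hypothetical: one addressable entity bundling the parametrizable values.
configuration(
    name="jdk11-scala213",
    compatible_resolves=["jvm-default"],
    scala_versions=["2.13"],
    java_versions=["11"],
)

# Hypothetical: the target points at the configuration instead of setting
# each field itself.
java_test(
    name="tests",
    source="ExampleTest.java",
    config=":jdk11-scala213",
)
```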