Use "named interpreter constraints" #12652

Open
Eric-Arellano opened this issue Aug 25, 2021 · 20 comments
Labels
backend: Python Python backend-related issues

Comments

@Eric-Arellano
Contributor

Right now, anyone can add an arbitrary interpreter constraint to any Python target, like interpreter_constraints=["==2.7.*"]. As explained in https://blog.pantsbuild.org/python-3-migrations/, it's powerful that setting interpreter constraints is decentralized. But it has downsides:

  1. It's too easy for a user to unintentionally use un-blessed interpreter constraints. For example, an org might be trying to wait to use Python 3.10, but a developer accidentally starts using it without the build team knowing.
  2. Performance. We have to scan the entire repository to determine all possible interpreter constraints. There are some performance improvements we can make if we know the repository only ever uses a single interpreter constraint, like no longer partitioning MyPy/Flake8/Bandit/Pylint by interpreter constraints. If the list of globally blessed constraints has only one element, we know we can short-circuit that.
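The short-circuit in point 2 could be sketched like this. This is a hypothetical helper, not a Pants API; the target shape (dicts with an "interpreter_constraints" key) is assumed for illustration:

```python
from collections import defaultdict

def partition_targets_by_ics(targets, blessed_ics):
    # Hypothetical sketch, not Pants code. `targets` is a list of dicts with
    # an "interpreter_constraints" key; `blessed_ics` is the globally blessed
    # mapping of name -> constraint list.
    if len(blessed_ics) == 1:
        # Single blessed constraint set: every target must share it, so tools
        # like MyPy/Flake8/Bandit/Pylint can run in one partition without
        # scanning any targets.
        (only,) = blessed_ics.values()
        return {tuple(only): list(targets)}
    # Otherwise, group targets by their constraints as before.
    partitions = defaultdict(list)
    for tgt in targets:
        partitions[tuple(tgt["interpreter_constraints"])].append(tgt)
    return dict(partitions)
```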

This is an improvement over #12495.

Modeling

There will be an option like this:

[python-setup]
interpreter_constraints = [
   "default: ['>=3.6,<3.10']",
   "py2: ['==2.7.*']",
]

Then targets will set:

python_library(interpreter_constraints="py2")

We also probably still want to support using only a single global interpreter constraint like this, in which case you would not be able to set the interpreter_constraints field:

[python-setup]
interpreter_constraints = ['>=3.6,<3.10']

It's not clear what the best way is to model the [python-setup] option. We want a mapping of names to constraints. In TOML, that suggests a nested table, but that does not work well with our options system, which requires using list or dict types. dict types are awkward in Pants, and using one would mean we can't keep allowing the simple case of setting a single global interpreter constraint:

[python-setup]
interpreter_constraints = "{'default': ['>=3.6,<3.10'], 'py2': ['==2.7.*']}"

I think a list[str] makes sense, where it can either be a single element w/ just the global constraints, or a list of key: value elements to create a mapping of blessed constraints. If you are doing the blessed constraints approach, we should probably special case the "default" key to be the default. (Is it an error to not set a default?)
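A minimal sketch of parsing that list[str] shape, assuming every named entry looks like "name: ['constraint', ...]". The helper name and exact error behavior are hypothetical:

```python
import ast

def parse_ics_option(values):
    # Hypothetical parser for the proposed list[str] option shape.
    # Case 1: plain constraint strings -> a single global constraint set.
    if not any(":" in v and "[" in v for v in values):
        return {"default": list(values)}
    # Case 2: "name: ['c1', 'c2']" entries -> a mapping of blessed names.
    mapping = {}
    for v in values:
        name, _, raw = v.partition(":")
        mapping[name.strip()] = ast.literal_eval(raw.strip())
    # Assumed policy: a "default" key is required (the issue leaves this open).
    if "default" not in mapping:
        raise ValueError("named interpreter constraints must include a 'default' key")
    return mapping
```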

How to handle deprecation

This is tricky because interpreter_constraints is already claimed and is a good name to use.

I propose we keep the same option and field names, and teach our code to handle both the old and new semantics. For example, if the interpreter_constraints field is set to a list[str] rather than a str, assume the old semantics and emit a deprecation warning. If it's a str, assume the new semantics and look up the named interpreter constraints.
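That dual handling might look roughly like this. A sketch with hypothetical names, not Pants code:

```python
def interpret_ics_field(value, named_ics):
    # Hypothetical sketch of the proposed dual semantics; not Pants code.
    # `named_ics` maps blessed names to constraint lists.
    if isinstance(value, (list, tuple)):
        # Old semantics: raw constraint strings set directly on the target.
        # Real code would emit a deprecation warning here.
        return list(value)
    # New semantics: a single string naming a blessed constraint set.
    if value not in named_ics:
        raise KeyError(f"unknown named interpreter constraints: {value!r}")
    return named_ics[value]
```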

@benjyw
Contributor

benjyw commented Aug 25, 2021

I like this!

Why isn't the option dict-valued?

@Eric-Arellano
Contributor Author

Eric-Arellano commented Aug 25, 2021

Why isn't the option dict-valued?

Because:

dict types are awkward in Pants and it would mean we can't keep allowing the simple case of setting a single global interpreter constraint:

[python-setup]
# Awkward string escaping
interpreter_constraints = "{'default': ['>=3.6,<3.10'], 'py2': ['==2.7.*']}"

[python-setup]
# Note how clunky it is to set a single default for the whole repo
interpreter_constraints = "{'default': ['>=3.6,<3.10']}"

It's not possible for this option to be sometimes a list (if you just have one global default) vs. sometimes a dict.

Wdyt?

@benjyw
Contributor

benjyw commented Aug 25, 2021

That doesn't seem particularly clunky to me. It's just a normal Python dict literal.

And avoiding dict-valued options because we haven't provided good support for them in pants.toml seems like a big smell...

@benjyw
Contributor

benjyw commented Aug 25, 2021

Probably we can start supporting toml nested tables, now that option subscopes are no longer a thing?

@Eric-Arellano
Contributor Author

Probably we can start supporting toml nested tables, now that option subscopes are no longer a thing?

That would be pretty sweet.

[python-setup]
interpreter_constraints = { default = [">3.6,<4"], py2 = ["==2.7.*"] }

aka

[python-setup.interpreter_constraints]
default = [">3.6,<4"]
py2 = ["==2.7.*"]

Much much more natural. And then if you want to use the CLI or env vars, you'd have to use the Python syntax. Same with the add/remove syntax.

Note that "named resolves" will be implemented this same way, so getting this right will be useful.

--

If we have to add back subscopes, we can redesign a more TOML-tolerant mechanism like using | as the separator. cc @stuhood , sg?

@benjyw
Contributor

benjyw commented Aug 26, 2021

I can take a look at the toml thing. Unless, Eric, you already know how to do it and it's easy.

@Eric-Arellano
Contributor Author

It's not super complex fwict and I wrote the original TOML parser, so I'd be happy to do it. But feel free if it sounds fun and you're looking for things to do! I wouldn't be able to get to it till next week with tool lockfiles being stable.

Before embarking on it, though: are we confident we won't add back subscopes that use . as the delimiter? Are we comfortable committing to some alternative like |?

@benjyw
Contributor

benjyw commented Aug 26, 2021

Subscopes were very complicated for both Pants developers and users, and I would be reluctant to add them back, especially as a general name-based mechanism. But yeah, if we do, we can use something other than . as the delimiter, since I guess that has special meaning in TOML.

@Eric-Arellano
Contributor Author

FYI a subscope still exists in 2.7: python-protobuf.mypy-plugin. We're so close to 2.8 that it may make sense to hold off on implementing this new TOML feature. Or, special case it.

@benjyw
Contributor

benjyw commented Aug 26, 2021

Yeah, good point. Let's wait until we can do that one deprecation.

@Eric-Arellano
Contributor Author

Native TOML dictionaries are now supported! Based on developments with named resolves, I'm now thinking this modeling for named interpreter constraints:

[python-setup]
possible_interpreter_constraints = { py2 = "==2.7.*", py3 = ">=3.6,<4" }
default_interpreter_constraints = "py3"

python_library(interpreter_constraints="py2")

(Or known_interpreter_constraints, valid_interpreter_constraints, etc)

See #12742 for the two competing proposals for how to model the default ICs. From there:

I think option 2 gives better modeling when an org wants to change their default to a new value. Rather than having to retcon default, they can simply point to the [interpreter constraints] they want...If you start with --default=py2, then you can update it to --default=py3. That's better than changing default to have a different value.

This is more boilerplate than the status quo when you only have a single interpreter constraint in your repository. That's a bummer, but I think it's minimal enough and worth it. Reminder that you usually only have to set these two options once, typically by a codebase admin.
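Resolution under this modeling might be sketched as follows. The helper is hypothetical; the option names follow the proposal above:

```python
def resolve_interpreter_constraints(target_field, options):
    # Hypothetical helper, not Pants code. `options` mirrors the proposed
    # [python-setup] options: a mapping of blessed names to constraint
    # strings, plus the name to use when a target doesn't set the field.
    possible = options["possible_interpreter_constraints"]
    name = target_field if target_field is not None else options["default_interpreter_constraints"]
    if name not in possible:
        raise KeyError(f"unknown interpreter constraints name: {name!r}")
    return possible[name]
```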

@stuhood
Member

stuhood commented Sep 2, 2021

@Eric-Arellano : Two completely wildcard ideas that have been floating around in my mind (and which I'm very uncertain about).

  1. There is the possibility that you would want to configure resolves and ICs at once. Then this might be something like:
class PythonConfiguration:
    name: str
    lockfile: str
    interpreter_constraints: str
  2. I wonder how naming resolves/ICs interplays with https://docs.google.com/document/d/1sr1O7W9517w8YbdzfE56xvDb72LafK94Fwxm0juIRFc/edit?disco=AAAAHXZOocI ... essentially, whether, if you could define the resolve/ICs for an entire subdirectory by globbing it with a target and having additive Fields (i.e., what we have previously called "multiple owning targets"), you would still want to name the resolve/ICs vs. defining it somewhere that globbed in the matching targets.

@Eric-Arellano
Contributor Author

I remain very concerned about the performance hit of calculating interpreter constraints in a single-IC repo, and our users have pointed it out, prompting changes like #15141.

I'm not sure how I feel about named interpreter constraints still. My concern is you have to create this custom naming mechanism and have all your internal users know what names to use. Consider this:

python_sources(
    interpreter_constraints="py35-39",
)

python_tests(
    interpreter_constraints=parametrize("py35", "py36", "py37", "py38", "py39"),
)

Is that confusing to users to discover those names?

The major benefit I see still is that admins have tight blessing over which ICs are valid in a repo. That's desirable.

--

Reminder that if we don't use named interpreter constraints, we can do something like #12495 to fix performance.

Thoughts @thejcannon @stuhood @benjyw ?

@stuhood
Member

stuhood commented May 6, 2022

For the purposes of resolving the performance issue, I feel very strongly that we should implement #15241. If we want to add the naming of interpreter constraints, then we should do it for a usability reason rather than a performance reason, because I'm confident that #15241 would resolve the performance concern.

@Eric-Arellano
Contributor Author

I'm confident that #15241 would resolve the performance concern.

We would still need to iterate over all targets, O(n) where n is the number of targets, vs. the O(1) of checking len(python.interpreter_constraints). So while I agree it's no longer as mission critical, named interpreter constraints do still have a performance benefit.

@Eric-Arellano
Contributor Author

@benjyw you spent a lot of time with interpreter constraints & parametrize when recently adopting Pants in a repo. Purely from UX perspective, what do you think about the constraints being named? (#12652 (comment) for example)

@benjyw
Contributor

benjyw commented May 11, 2022

Purely from a UX perspective, this would be excellent to have. Right now I'm writing macros to get around this.

Imagine a repo that builds a wheel for 5 different interpreter versions, and therefore also has to run tests on those versions, for example. Being able to DRY and then parametrize on the interpreter constraint name instead of the actual constraint string would be a great step.

@Eric-Arellano
Contributor Author

Awesome! So then there's sufficient motivation:

  • ergonomics Benjy talks about w/ DRY
  • tight-control from codebase admins on when new ICs are introduced
  • minor performance enhancements, can short-circuit

Now just to figure out how to deprecate...

@stuhood
Member

stuhood commented May 11, 2022

IMO, we should probably wait until we know what #13767 will look like. If the configuration idea pans out, then it is roughly equivalent to named ICs, but with a different UX and no deprecation cycle.

@benjyw
Contributor

benjyw commented May 11, 2022

Yeah, I don't know if I would single out interpreter constraints for special UX consideration over using named configurations in general.
