Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lockfiles: what resolve to use with MyPy and Pylint operating on python_library targets? #12714

Closed
Eric-Arellano opened this issue Aug 31, 2021 · 9 comments

Comments

@Eric-Arellano
Copy link
Contributor

Thus far, multiple user lockfiles has been following the approach of https://docs.google.com/document/d/1sr1O7W9517w8YbdzfE56xvDb72LafK94Fwxm0juIRFc/edit: only "root" targets like python_tests and pex_binary have a resolve field to choose which resolve/lockfile to use. Dependencies are irrelevant to choosing a resolve, e.g. if a python_tests target depends on another python_tests target (weird to do, but legal), we simply go with the resolve of the target currently being built with ./pants test.

This scheme works great for ./pants package / pex_binary and ./pants test / python_tests. But it's foiled by MyPy and Pylint, which both require resolving 3rdparty dependencies and can treat python_library targets as the root, e.g. ./pants typecheck src/python/pants/util/strutil.py.

Naive, unaccpetable solution: python_library has resolve field

It would be unambiguous what resolve to use. However, we'd need to validate that the entire closure is using a consistent resolve, meaning all dependencies and dependees. (It's only possible to use a single resolve in a build)

This means that a python_library can only ever work with a single resolve, which is extremely limiting. For example, if strutil.py is set to default, then anything that uses strutil.py transitively must use the same resolve. Especially given that we will require pex_binary targets w/ platforms set to use a platform-aware lockfile (#12612), that is a non-starter.

Possible solution? Use dependees roots

Find all root targets that depend on the python_library and use their resolve.

What to do if no roots

Use the default resolve.

What to do if >1 resolves

If >1 roots, and those use different resolves, then we "should" be able to use any of those resolves. All those resolves, by definition, should be compatible with the library and its deps. If not, our invalidation check will warn/error.

We probably want to avoid choosing platform-aware resolves (#12612) when possible, as those might only work with platforms different than the host.

Performance concern

Computing dependees can be slow, as it requires scanning the whole repository. I don't know of a way to improve this.

A simple hack is that if the # of resolves (from [python-setup]) is <= 1, then we can short circuit.

@jsirois
Copy link
Contributor

jsirois commented Aug 31, 2021

Naive, unaccpetable solution: python_library has resolve field

For completeness, clearly you could just pluralize the field and then check any given resolve has all file dependencies whose resolves contain it. This reflects libraries in-practice. They generally need to work with many resolves when you ship to PyPI. Pants is a good anti-example since it ships an API alongside a binary and really only works with 1 resolve since its all pinned out on PyPI. Plugin authors are then forced into that 1 resolve FAPP.

@Eric-Arellano
Copy link
Contributor Author

Oh, interesting! Something like a compatible_resolves field for python_library, which is a list. But pex_binary and python_tests still choose a single resolve via resolve field.

Thanks for that metaphor of libraries vs. binaries in general, that makes sense.

I want to think more about that, but this seems like a solid improvement from the "find dependees" proposal. Thank you!

@jsirois
Copy link
Contributor

jsirois commented Aug 31, 2021

To be clear, the metaphor is not mine. This is standard parlance in all lockfile communities I've encountered and the language is uniformly, lockfiles are for binaries and never libraries. Here resolves ~= locks and the same considerations are in play.

@stuhood
Copy link
Member

stuhood commented Aug 31, 2021

As discussed in the thread on this topic, there might be an alternative to the dependees case (or perhaps just an optimization?): if we have a list of known resolves for a repository, we might be able to skip the dependees search by filtering the resolves to see which are relevant. In effect: the lockfiles for resolves might contain lists of their member 3rdparty targets, and act like a cache for resolve lookups.

@Eric-Arellano
Copy link
Contributor Author

In effect: the lockfiles for resolves might contain lists of their member 3rdparty targets, and act like a cache for resolve lookups.

Ah, yeah, that type of check is possible via #12610. Interesting.

It does require a lot of magic fwict, though, such as a user figuring out how to force a particular resolve to be used. For example, if the target doesn't have any roots and you don't want it to use default, you'd have to introduce a root purely for the sake of changing the resolve. That's not very intuitive.

--

So far, John's proposal of compatible_resolves makes a lot of sense to me as being conceptually simple, and thus easy to reason about in what can be a very confusing topic. It does risk more boilerplate, but that boilerplate may be justified for the clarity it brings.

(We can also reduce the boilerplate with the bigger project of re-envisioning targets, so it's easier to say e.g. "this whole source tree uses these resolves")

@jsirois
Copy link
Contributor

jsirois commented Aug 31, 2021

Presumably you reduce boilerplate by having a default_compatible_resolves option for defining up in pants.toml. By default say that's ALL. Then you only need to ad metadata to library targets that are not compatible with all resolves. Presumably, that cuts down on boilerplate significantly in many cases?

@chrisjrn
Copy link
Contributor

I definitely like the idea of specifying multiple compatible resolutions here.

I'm trying to wrap my head around how we deal with a library being used as a dependency for another target here. If the library has a requirement A>=3,<=5, and the dependent target has requirement A==4, how do we deal with the case where our supplied resolutions for the library do not match the dependent target? Is this a case where we just fail and have to generate another resolution and add the that to the list of compatible resolutions for the library?

@Eric-Arellano
Copy link
Contributor Author

If the library has a requirement A>=3,<=5, and the dependent target has requirement A==4, how do we deal with the case where our supplied resolutions for the library do not match the dependent target? Is this a case where we just fail and have to generate another resolution and add the that to the list of compatible resolutions for the library?

Requirements are modeled as python_requirement_library targets. The lockfile would need to have been generated with the target for both A>=3,<=5 and the target for A==4. During lockfile generation, that will fail with "double requirement", unless they used env markers like ; python_version > '3' (think back to your PR that supports this in requirements.txt files).

That is, the lockfile would either need to include both targets; or if that's not possible with the double requirement, use two distinct resolves.

@Eric-Arellano
Copy link
Contributor Author

I think compatible_resolves is compelling enough to implement. Its conceptual simplicity is the key highlight for me. After a proof of concept, we can confirm it does the job effectively enough.

Thanks again for the feedback everyone!

Eric-Arellano added a commit that referenced this issue Sep 7, 2021
Part of #11165 and builds off of #12703.

Rather than having a single option `[python-setup].experimental_lockfile`, users set `[python-setup].experimental_resolves_to_lockfiles` to define 0-n "named resolves" that associate a lockfile with a name:

```toml
[python-setup]
experimental_resolves_to_lockfiles = { lock1 = "lock1.txt", lock2 = "lock2.txt" }
```

Then, individual `pex_binary` targets can specify which resolve to use:

```python
pex_binary(name="reversion", entry_point="reversion.py", experimental_resolve="lock1")
```

In a followup, we'll add a mechanism to set the default resolve.

Users can generate that lockfile with `./pants generate-lockfiles` (all resolves) or `./pants generate-lockfiles --resolve=<name>`:

```
❯ ./pants generate-lockfiles --resolve=lock1 --resolve=lock2
15:55:56.60 [INFO] Completed: Generate lockfile for lock1
15:55:56.61 [INFO] Completed: Generate lockfile for lock2
15:55:57.02 [INFO] Wrote lockfile for the resolve `lock1` to lock1.txt
15:55:57.02 [INFO] Wrote lockfile for the resolve `lock2` to lock2.txt
```

Then, it will be consumed with `./pants package` and `./pants run`. Pants will extract the proper subset from that lockfile, meaning that the lockfile can safely be a superset of what is used for the particular build.

```
❯ ./pants package build-support/bin:
...
15:56:33.87 [INFO] Completed: Installing lock1.txt for the resolve `lock1`
15:56:34.39 [INFO] Completed: Installing lock2.txt for the resolve `lock2`
15:56:34.48 [INFO] Completed: Extracting 1 requirement to build build-support.bin/generate_user_list.pex from lock1_lockfile.pex: pystache==0.5.4
...
```

If the lockfile is incompatible, we will (soon) warn or error with instructions to either use a new resolve or regenerate the lockfile.

In followups, this field will be hooked up to other targets like `python_awslambda` and `python_tests`. 

We will likely also add a new field `compatible_resolves` to `python_library`, per #12714, which is a list of resolves. "Root targets" like `python_tests` and `pex_binary` will validate that all their dependencies are compatible. When you operate directly on a `python_library` target, like running MyPy on it, we will choose any of the possible resolves. You will be able to set your own default for this field.
 
[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Sep 8, 2021
As explained in #12734, this allows you to set the resolve when running a specific test target.

The resolve will not yet be consumed when running Pylint and MyPy on the `python_tests` code, which is blocked by implementing #12714.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Dec 15, 2021
… instead of `compatible_resolves` (#13870)

Test targets are "roots" - we do not expect them to be depended upon by anything. Thus, they should choose which specific resolve gets used for their test run.

See #12714 for more on `compatible_resolves` vs `resolve`.

Closes #13369.

[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants