Lockfiles: check if context's requirements are compatible with the lockfile's #12610
Comments
Counterpoint to this idea: sometimes removing a dependency does change the behavior of the build. For example, Flake8 plugins: when a user removes a Flake8 plugin from the tool's requirements, the build's behavior changes even though the remaining requirements are still a subset of the lockfile's. For user requirements, I do think we need to use the less precise "is compatible" check. Even though it has the same risk of not handling removed dependencies, it's necessary for performance so that we don't have to compute the entire resolve's requirement strings. And there's a workaround: users can manually regenerate the lockfile; we just won't automatically tell them to. So I think the takeaway is: for tool lockfiles, stick to exact matching. For user requirements, switch to compatibility checking.
Fwiw, if tool resolves also subsetted their locks, everything would work the same and be correct in the dep-remove case.
Ah, true. Thanks John!
For now, we should expect that the lockfile metadata is well-formed and present. This allows us to simplify some of the code. Prework for #12610. [ci skip-rust] [ci skip-build-wheels]
This ensures that callers can't mess up the determinism of the lockfile's invalidation digest. It will also make #12610 more presentable for users, when we store the requirement strings in the lockfile rather than just a hash. [ci skip-rust] [ci skip-build-wheels]
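To illustrate the determinism point above, here is a minimal sketch of what a deterministic invalidation digest could look like. This is not Pants's actual implementation; the function name and the sort/de-duplicate canonicalization are assumptions made for the example.

```python
import hashlib
import json


def invalidation_digest(requirements: list[str]) -> str:
    """Sketch of a deterministic digest over requirement strings.

    Sorting and de-duplicating before hashing means callers cannot
    change the digest by reordering or repeating entries, which is the
    property the commit message above is after.
    """
    canonical = json.dumps(sorted(set(requirements)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this canonicalization, `invalidation_digest(["b>=1", "a==1.2"])` and `invalidation_digest(["a==1.2", "b>=1", "b>=1"])` produce the same hex digest.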
I am going to start approaching this work today. My current interpretation is that, for now, "compatible" means the context's requirement strings are a subset of the requirement strings specified in the lockfile. In the future, we may be able to parse out version strings and do the requisite set math on requirement strings (which will also help simplify our interpreter constraints work), but right now that is out of scope. I currently do not have plans to verify that the header is unmodified, but that seems like a reasonable fix too.
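The "subset of requirement strings" interpretation described above can be sketched in a few lines. The function name is hypothetical; the point is that this check compares whole strings, with no version-range set math:

```python
def is_compatible(context_requirements: set[str], lockfile_requirements: set[str]) -> bool:
    # "Compatible" here means every requirement string the context uses
    # appears verbatim among the lockfile's input requirement strings.
    # Parsing specifiers and doing real version-set math is out of scope.
    return context_requirements.issubset(lockfile_requirements)
```

So `is_compatible({"requests==2.0"}, {"requests==2.0", "django==3.2"})` holds, while a context requirement absent from the lockfile fails the check.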
Sounds good! I agree with exact matches, ideally impervious to whitespace differences, etc. As discussed in DM, tool lockfiles should continue using a hash and exact matches.
Here's what I intend to do then:
What's the motivation for embedding that? I suspect an even simpler modeling is something like this:

{
  "requirements": ["foo==1.2", "bar>=1"],
  "requirements_hash": null,
  "interpreter_constraints": ["==3.6.*"],
  "platforms": null,
}

I know you also proposed some nesting:

{
  "requirements": ["foo==1.2", "bar>=1"],
  "requirements_hash": null,
  "env": {
    "interpreter_constraints": ["==3.6.*"],
    "platforms": null,
  },
}
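For the flat modeling above, parsing the header might look like the following sketch. The class and method names are invented for illustration and are not Pants's actual types:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LockfileHeader:
    """Hypothetical flat modeling of the proposed header fields."""

    requirements: Tuple[str, ...]
    requirements_hash: Optional[str]
    interpreter_constraints: Tuple[str, ...]
    platforms: Optional[Tuple[str, ...]]

    @classmethod
    def from_json_dict(cls, data: dict) -> "LockfileHeader":
        # Tuples keep the parsed header hashable/immutable; missing or
        # null optional fields simply stay None.
        return cls(
            requirements=tuple(data["requirements"]),
            requirements_hash=data.get("requirements_hash"),
            interpreter_constraints=tuple(data["interpreter_constraints"]),
            platforms=tuple(data["platforms"]) if data.get("platforms") else None,
        )
```

A flat layout like this keeps deserialization trivial; the nested `env` variant would just add one level of dictionary lookup.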
The motivation for embedding …
That expectation comes from our code itself. If it's a tool lockfile, we need …
What do you mean? Like a schema number? As discussed in #12683, I think we would possibly benefit from that, but it's probably not necessary because of the deprecation plan to not support schemas older than the prior release's. We can likely keep it simple and leave that off, rather than needing to add and test a new mechanism.
I meant hashing the data in the header |
Hm, I'm not sure that is important to address right now. We already have a big warning not to change the metadata. If you do, that's at your own peril, and it's easy to fix by regenerating the lockfile. I don't think we need an extra mechanism to validate that they didn't tamper with the header.
Seems reasonable.
Did you mean to close this @chrisjrn? This hasn't been merged to main. |
No, I must have accidentally clicked a button |
…ally specified requirements (#12782) This factors out versioning capabilities into `LockfileMetadata` so that it's possible to easily change the set of validation requirements for a lockfile. V1 represents the original lockfile version (where constraints and an invalidation digest are set). V2 allows for the old behaviour, but also allows specifying the input requirements for a lockfile, and verifying that the user requirements are ~a non-strict subset of~ identical to the input requirements. We decided to replace the requirements hex digest with requirement strings to allow us to test whether the lockfile produces a _compatible_ environment rather than an _identical_ environment, which will be useful for user lockfile support when we eventually enable that. In the meantime, tool lockfiles still test for an identical environment, but the extra data in the lockfile will allow for more fine-grained error messages in a future version. The implementations of `_from_json_dict` and `is_valid_for` are a bit repetitive; I could factor out the common behaviour with a bit of work, but given that we expect to delete the V1 implementation before too long, it's probably not worth it. Currently this _does not_ add the `platforms` capability to the header, but now it's going to be easy enough to bump the version number if we want to add more fields. Closes #12610
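The V1/V2 split described in the PR summary can be sketched roughly as below. The class names, the digest scheme, and the `exact` flag are assumptions for illustration, not the real `LockfileMetadata` implementation:

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LockfileMetadataV1:
    """V1: stores only an invalidation digest; validation is exact-match."""

    invalidation_digest: str

    def is_valid_for(self, requirements: list[str]) -> bool:
        # Recompute the digest from the context's requirements and compare.
        digest = hashlib.sha256("".join(sorted(requirements)).encode()).hexdigest()
        return digest == self.invalidation_digest


@dataclass(frozen=True)
class LockfileMetadataV2:
    """V2: stores the input requirement strings themselves.

    Keeping the strings (instead of a hash) lets validation choose between
    an identical-environment check and a compatible-environment check, and
    enables more fine-grained error messages.
    """

    requirements: frozenset[str]

    def is_valid_for(self, requirements: list[str], exact: bool = True) -> bool:
        reqs = frozenset(requirements)
        return reqs == self.requirements if exact else reqs <= self.requirements
```

Tool lockfiles would call `is_valid_for(..., exact=True)`; future user lockfiles could relax to the subset check.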
Currently, we expect that the consumer of a lockfile has identical input requirements to what was used to generate the lockfile. We do this by hashing the input requirements and saving that hash in the lockfile.
This works great for tool lockfiles, where the context's requirements should always be the same as what the tool lockfile was generated with. But it does not work well with user requirements, where the context is often a subset of a bigger universe of requirements. For example, a test only uses `requests`, which is 1 of 20 requirements in the lockfile. We should not error, so long as the version of `requests` is compatible with what was used to generate the lockfile.

We sort of do this right now with constraints files:
pants/src/python/pants/backend/python/util_rules/pex_from_targets.py
Lines 246 to 251 in c2f2eb0
But this compatibility check is not robust enough. It's only checking that the `project_name` is contained in the lockfile, whereas we need to validate that the entire requirement string is compatible. In a world of multiple user lockfiles, it will be valid to have one lockfile with `Django==2` and another lockfile with `Django==3`. We need to make sure the whole requirement is compatible.
To implement, we should probably start preserving the original input requirement strings in the Pants metadata header at lockfile generation time. We can then check that the context's set of `Requirement` objects is a subset (`set.issubset()`) of the lockfile's `Requirement` objects.
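A stdlib-only sketch of that subset check follows. Real code would parse the strings into proper `Requirement` objects (e.g. via the `packaging` library) before comparing; here a crude whitespace/case normalization stands in for that parsing, and the function name is invented:

```python
def _normalize(req: str) -> str:
    # Crude stand-in for parsing into a Requirement object: strip
    # whitespace so "foo == 1.2" and "foo==1.2" compare equal, and
    # lowercase since distribution names are case-insensitive.
    return req.replace(" ", "").lower()


def context_is_subset(context_reqs: list[str], lockfile_reqs: list[str]) -> bool:
    """Check that every context requirement appears in the lockfile header."""
    return {_normalize(r) for r in context_reqs} <= {_normalize(r) for r in lockfile_reqs}
```

Under this check, a test that only uses `requests==2.0` validates against a 20-requirement lockfile containing that pin, while `requests==2.1` would not.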