Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata API: Store signatures as dict #1424

Merged
merged 3 commits into from
Jun 11, 2021

Conversation

jku
Copy link
Member

@jku jku commented May 31, 2021

store signatures in a Dict of keyid to Signature. This ensures
signature uniqueness. Raise in from_dict() if input contains multiple
different signatures for a keyid.

This changes Metadata object API, and makes it slightly different from
the file format: this is justified by making the API safer to use and
easier to validate.

Signed-off-by: Jussi Kukkonen jkukkonen@vmware.com

Fixes #1422


  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@jku
Copy link
Member Author

jku commented May 31, 2021

The reasons this is a draft is that it's a prime example of the clash between trying to to keep the API "close to the file format" and making it safe and easy to use:

  • sigs are now stored in a dictionary signatures_by_keyid -- this seems like the right thing to do
  • I kept signatures attribute almost as it is to "stay close to file format" -- now there's two ways of accessing sigs (signatures_by_keyid: Dict[str, Signature] is the real variable and signatures: Sequence[Signature] is an alternative view to the same data, generated on request)
  • the annoying part is that if you don't pay attention it's possible to make mistakes like this:
    md.signatures.append(new_sig)
    this creates a new list and appends to the new list, it does not modify the actual data store

I think I'll manage to live with this design: At least static type check will complain about appending to signatures: signatures is annotated as Sequence which is not mutable).

The alternative I guess would be to just have signatures: OrderedDict[str, Signature] as the only attribute (we don't really care about the order here but we want it to be deterministic). This looks easier to understand and makes the implementation simpler.

@jku
Copy link
Member Author

jku commented May 31, 2021

Similar design is very likely to be a solution to #1426 (delegation roles should be unique WRT role name). So the decision on "do we want to preserve List-type access to attributes that have a matching json array in the file format" is a generic one.

I do wonder, does the List-access really help anyone or are we just making things more complex for no benefit...

@jku
Copy link
Member Author

jku commented Jun 2, 2021

Based on feedback I was bold and just changed the API keeping it simple: signatures is now a dict of keyid to Signature: this ensures we will always only have one signature per keyid (and makes signature lookups much nicer looking).

It does mean a slight difference between file format and the API but I believe positives outweigh that.

This will slightly conflict with #1423 but in the end they will fit nicely together: the resulting Key.verify() will be easier to read and safer.

@jku jku marked this pull request as ready for review June 2, 2021 07:43
@jku jku force-pushed the enforce-unique-sigs branch 2 times, most recently from 760d453 to 388d296 Compare June 3, 2021 08:29
Copy link
Contributor

@sechkova sechkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -89,7 +92,7 @@ def from_dict(cls, metadata: Dict[str, Any]) -> "Metadata":
raise ValueError(f'unrecognized metadata type "{_type}"')

# Make sure signatures are unique
signatures: Dict[str, Signature] = {}
signatures: "OrderedDict[str, Signature]" = OrderedDict()
for sig_dict in metadata.pop("signatures"):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not really related to my commit here but if we really want to be deterministic, I think we 'd have to handle the original json->dict case as well (I don't think anything is currently guaranteeing in python 3.6 that metadata["signatures"] will iterate in order). json does allow that with
json.loads(data, object_pairs_hook=collections.OrderedDict)

(of course in practice It Just Works because cpython does order dict items in 3.6)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's likely worth being explicit here and changing how we load the JSON in the deserialiser.

@joshuagl
Copy link
Member

joshuagl commented Jun 7, 2021

As indicated, this somewhat dilutes the mapping from the metadata API to the specification and on-disk format.

The container format in the specification and on-disk have signatures as a list of dictionaries, which is how we have implemented Metadata before this PR.
This PR changes signatures from a list of Signature objects to a dictionary of keyid -> Signature objects. This makes the metadata API a bit safer to use, as one can not accidentally add the same signature to a Metadata container multiple times.

I think the trade-off here makes sense, and think it's the same decision @trishankatdatadog came to for tuf-on-a-plane, but would appreciate hearing from others cc @mnm678 @trishankatdatadog

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm in favour of this change but would like to hear from others too.

Copy link
Contributor

@mnm678 mnm678 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this choice. Using a dictionary will make some of the mistakes with unique signatures a lot easier to avoid. This is something I'd eventually like to see in the specification as well (ie theupdateframework/specification#155).

Jussi Kukkonen added 2 commits June 9, 2021 11:21
store signatures in a Dict of keyid to Signature. This ensures
signature uniqueness. Raise in from_dict() if input contains multiple
different signatures for a keyid.

This changes Metadata object API, and makes it slightly different from
the file format: this is justified by making the API safer to use and
easier to validate.

Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Dict ordering is part of regular Dict from Python 3.7: Use OrderedDict
for signatures to make sure signatures are serialized in a reproducible
order even on 3.6.

The added benefit is that reader will immediately understand that the
order has some significance.

The actual type annotations are a bit convoluted because:
* typing does not include OrderedDict before 3.7 so can't use that
* Annotating inner types does not work for collections.OrderedDict
  in older pythons (so have to use the "stringified annotations")

Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
@jku
Copy link
Member Author

jku commented Jun 9, 2021

rebased on develop (the key lookup is now much more readable after the keyid change in #1417)

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase LGTM, thanks.

Clearing the OrderedDict makes it easier to see what happens and
avoids having to call OrderedDict() again.

Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
@jku
Copy link
Member Author

jku commented Jun 11, 2021

Added a small tweak which makes code easier to read and avoids calling OrderedDict constructor twice. This is very minor and the appending / not appending is tested quite well so I'm not waiting for yet another review: will merge when tests are good.

@jku jku merged commit 2a5bfb9 into theupdateframework:develop Jun 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metadata API: enforce signature uniqueness by keyid
4 participants