Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New metadata API: add MetadataInfo and TargetFile classes #1223

Closed

Conversation

MVrachev
Copy link
Collaborator

@MVrachev MVrachev commented Nov 25, 2020

This pr addresses #1139, but doesn't fix it yet.

I will add a separate pr which will handle Roles and Keys classes.

Description of the changes being introduced by the pull request:
It includes the following changes:

  • Add a missing root use case in a couple of our tests
  • Make length and hashes optional in Timestamp as it is in our specification
  • Add MetaFile class and use it in the Timestamp and Snapshot classes
  • Add TargetFile class and use it in the Targets class
  • Rename a few instances of the update method in our classes and use a more verbose name.

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@MVrachev MVrachev force-pushed the complex-metadata-classes branch 2 times, most recently from 1f074f8 to 5745088 Compare November 25, 2020 16:58
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work, @MVrachev and many thanks for taking a stab at this! Don't let the many comments discourage you, they are mostly minor nit picks that should be easy to fix. Let me know when this is ready for another round for review.

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
@MVrachev MVrachev force-pushed the complex-metadata-classes branch 5 times, most recently from e517a86 to fd340ad Compare December 1, 2020 14:55
@MVrachev
Copy link
Collaborator Author

MVrachev commented Dec 1, 2020

It took me longer than expected, but I updated my pr.
I had to rebase because I dropped the commit in which I renamed update methods.
Additionally, I

  1. added a new commit New API: Make sure we are not changing dict arg
  2. added a new commit Fix documentation indentation as proposed by Lukas.
  3. Addressed all comments made by Lukas.

@lukpueh
Copy link
Member

lukpueh commented Dec 2, 2020

@MVrachev and @jku have rightfully pointed out that in my hierarchic from_dict design, which recursively instantiates and assigns nested TUF metadata objects from their json/dict representation, the input data (i.e. the json/dict) is mutated while being parsed, which might be surprising to the caller.

Note that this mutation only occurs in classes that use inheritance, i.e. in Signed and its subclasses Targets/Snapshot/Timestamp. Here is a simplified/abstract demonstration of the issue:

class Parent:
    def __init__(self, some_parent_attr: SomeParentAttr):
        self.some_parent_attr = some_parent_attr

    @classmethod
    def from_dict(cls, data: Dict):
        # Write back into passed 'data' dict so that we can pass it as '**data'
        # to constructor, with parsed value for 'data["some_parent_attr"]',
        # alongside other 'data' values that were parsed in subclasses.
        # FIXME: Mutation of 'data' (passed by reference) might surprise caller
        data["some_parent_attr"] = SomeParentAttr.from_dict(data["some_parent_attr"])
        return cls(**data)

class Child(Parent):
    def __init__(self,
                 some_parent_attr: SomeParentAttr,
                 some_child_attr: SomeChildAttr):
        super().__init__(some_parent_attr)
        self.some_child_attr = some_child_attr

    @classmethod
    def from_dict(cls, data: Dict):
        # Parse and write back into passed 'data' dict so that we can hand it
        # on to the parent class for any additional parsing of 'data' values.
        # FIXME: Mutation of 'data' (passed by reference) might surprise caller
        data["some_child_attr"] = SomeChildAttr.from_dict([data["some_child_attr"]])
        return super().from_dict(data)

@MVrachev proposes to copy (probably rather deepcopy) the input data in the relevant from_dict methods. While this seems like a fair fix, it introduces unnecessary overhead. Here is an alternative solution:

class Parent:
    @classmethod
    def from_dict(cls, data: Dict):
        # Create empty object with default or parametrized constructor with
        # default arguments.
        obj = cls()

        # Parse 'data["some_parent_attr"]' and assign it to the empty object.
        # Any child attributes are parsed and assigned later in the child
        # class.
        obj.some_parent_attr = SomeParentAttr.from_dict(data["some_parent_attr"])
        return obj

class Child(Parent):
    @classmethod
    def from_dict(cls, data: Dict):
        # Call parent parse to get object with parent attributes already
        # assigned and ...
        obj = super().from_dict(data)

        # ... parse and assign any child attributes here.
        obj.some_child_attr = SomeChildAttr.from_dict([data["some_child_attr"]])

        return obj

@trishankatdatadog
Copy link
Member

trishankatdatadog commented Dec 2, 2020

Just my 0.02 BTC as an observer:

One thing I regretted adding in my original Metadata API was native JSON decoding/encoding support as class methods. While it works well, it does not not decouple our internal Metadata objects from the data transport format (which need not be JSON).

In my new design, I use an abstract parser which can be overriden to implement different parsers for different data transport formats that nevertheless return the same type of internal Metadata objects we expect.

My new design does not yet account for how to write those internal Metadata objects back into JSON, but I don't believe it would be too difficult while keeping separation of concerns.

Anyway, please carry on, don't let me derail you, I was just making an observation based on something I caused.

@lukpueh
Copy link
Member

lukpueh commented Dec 2, 2020

Good point, @trishankatdatadog. This would be a third, more procedural option, for our considerations in secure-systems-lab/securesystemslib#272 (comment). And it indeed completely decouples internal Metadata objects from external transport format.

What I liked about my proposal is that serialization/deserialization logic is organized in small chunks that are encapsulated in the related class.

But I'm really fine either way. Curious to hear what others think.

@MVrachev
Copy link
Collaborator Author

MVrachev commented Dec 7, 2020

@MVrachev and @jku have rightfully pointed out that in my hierarchic from_dict design, which recursively instantiates and assigns nested TUF metadata objects from their json/dict representation, the input data (i.e. the json/dict) is mutated while being parsed, which might be surprising to the caller.

Note that this mutation only occurs in classes that use inheritance, i.e. in Signed and its subclasses Targets/Snapshot/Timestamp. Here is a simplified/abstract demonstration of the issue:

class Parent:
    def __init__(self, some_parent_attr: SomeParentAttr):
        self.some_parent_attr = some_parent_attr

    @classmethod
    def from_dict(cls, data: Dict):
        # Write back into passed 'data' dict so that we can pass it as '**data'
        # to constructor, with parsed value for 'data["some_parent_attr"]',
        # alongside other 'data' values that were parsed in subclasses.
        # FIXME: Mutation of 'data' (passed by reference) might surprise caller
        data["some_parent_attr"] = SomeParentAttr.from_dict(data["some_parent_attr"])
        return cls(**data)

class Child(Parent):
    def __init__(self,
                 some_parent_attr: SomeParentAttr,
                 some_child_attr: SomeChildAttr):
        super().__init__(some_parent_attr)
        self.some_child_attr = some_child_attr

    @classmethod
    def from_dict(cls, data: Dict):
        # Parse and write back into passed 'data' dict so that we can hand it
        # on to the parent class for any additional parsing of 'data' values.
        # FIXME: Mutation of 'data' (passed by reference) might surprise caller
        data["some_child_attr"] = SomeChildAttr.from_dict([data["some_child_attr"]])
        return super().from_dict(data)

@MVrachev proposes to copy (probably rather deepcopy) the input data in the relevant from_dict methods. While this seems like a fair fix, it introduces unnecessary overhead. Here is an alternative solution:

class Parent:
    @classmethod
    def from_dict(cls, data: Dict):
        # Create empty object with default or parametrized constructor with
        # default arguments.
        obj = cls()

        # Parse 'data["some_parent_attr"]' and assign it to the empty object.
        # Any child attributes are parsed and assigned later in the child
        # class.
        obj.some_parent_attr = SomeParentAttr.from_dict(data["some_parent_attr"])
        return obj

class Child(Parent):
    @classmethod
    def from_dict(cls, data: Dict):
        # Call parent parse to get object with parent attributes already
        # assigned and ...
        obj = super().from_dict(data)

        # ... parse and assign any child attributes here.
        obj.some_child_attr = SomeChildAttr.from_dict([data["some_child_attr"]])

        return obj

I agree with your proposal @lukpueh and will change it.
This is a more optimized way to fix that problem.

@MVrachev
Copy link
Collaborator Author

I implemented the suggestions from @lukpueh.
I had to give default function arguments for all class arguments assigned in __init__
for Root, Snapshot, Targets, and Timestamp classes.
I chose None as the easiest solution as a default argument, but
we definitely want to add proper validation which will ensure
we are not creating empty or partially populated objects.
I didn't want to create a discussion for sensible defaults
and argument validation here.
There is already an existing issue for that:
#1140

It will be awesome if we can hear the opinions of others.
@joshuagl, @sechkova?

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, @MVrachev! Speaking of updates, should we add TODO comments to mark the update methods for revision as discussed in #1223 (comment) pp? Otherwise this looks quite good to me. :)

tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Show resolved Hide resolved
@MVrachev MVrachev force-pushed the complex-metadata-classes branch 2 times, most recently from 4e26995 to 194a907 Compare December 10, 2020 15:43
Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really neat, great work @MVrachev and thanks for the superb review @lukpueh.

tuf/api/metadata.py Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Outdated Show resolved Hide resolved
tuf/api/metadata.py Show resolved Hide resolved
@MVrachev
Copy link
Collaborator Author

I addressed all comments made by @joshuagl and rebased to squash some of the commits and prepare this pr for merging.

@joshuagl
Copy link
Member

joshuagl commented Jan 5, 2021

Closing to try and re-trigger GHA.

@joshuagl joshuagl closed this Jan 5, 2021
@joshuagl joshuagl reopened this Jan 5, 2021
@joshuagl
Copy link
Member

joshuagl commented Jan 5, 2021

There are linter failures on the new changes, could you please take a look @MVrachev? Note: it's likely that we need to update pylint configuration for the new coding style.

@MVrachev
Copy link
Collaborator Author

MVrachev commented Jan 5, 2021

I added one commit where I disable inline some of the irrelevant warnings and for warning E1101 I decided to just disable it because it was creating a lot of false-positives.

# cls is most likely a child class of "Signed" and we need to setup
# the appropriate fields.
obj = cls() # pylint: disable=E1120
obj._type = signed_dict['_type'] # pylint: disable=W0212
Copy link
Member

@joshuagl joshuagl Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W0212 is protected-access, we're accessing a protected member outside of the class or a descendent. This makes sense given the pattern of creating an empty object, updating the fields, then returning the populated objet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU _type is not actually meant to be "protected" in the Python OOP sense (the naming dates back before our transition to more OOP), but rather is a workaround to not shadow the builtin type name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, let's add a comment to that effect?

# Warnings about rules E1120 and W0212 are not relevant here because
# cls is most likely a child class of "Signed" and we need to setup
# the appropriate fields.
obj = cls() # pylint: disable=E1120
Copy link
Member

@joshuagl joshuagl Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E1120 is no-value-for-parameter, which is that we're creating an instance of the class without passing values for __init__'s arguments.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orthogonal to this PR, given the number of linter warnings we're having to disable related to our from_dict pattern, I wonder whether we should document this (and consider other options?) with an ADR. cc @lukpueh

tuf/api/pylintrc Outdated
@@ -1,5 +1,5 @@
[MESSAGE_CONTROL]
disable=fixme
disable=fixme,E1101
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E1101 is no-member, a variable has been accessed which the type checker doesn't think is a valid member of that type. That makes sense, we're upcasting to the base Signed type, then accessing members of the subtype.

Rather than disabling this globally, it would probably make more sense to disable it for only the tuf.api.metadata module (by including the #pylint: disable= comment at the top of the module's file).

@lukpueh
Copy link
Member

lukpueh commented Jan 11, 2021

Orthogonal to this PR, given the number of linter warnings we're having to disable related to our from_dict pattern, I wonder whether we should document this (and consider other options?) with an ADR. cc @lukpueh

Agreed. I'll put something together.

@MVrachev
Copy link
Collaborator Author

Updated the pr.
Added a more detailed comment about E1120 and W0212 and disabled E1101 only for tuf/api/metadata.py.
Also, added a commit with a comment for _type.

Had to rebase in order to return pylintrc back to its original state.

@MVrachev MVrachev changed the title New metadata API: add MetaFile and TargetFile classes New metadata API: add MetadataInfo and TargetFile classes Jan 20, 2021
Add a use case for the root class to be tested in test_generic_read
and test_read_write_read_compare tests in test_apy.py

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
As per the specification (v1.0.1) length and hashes fields
in timestamp metadata are optional.
We have implement this in the older API
(see theupdateframework#1031) and we should
implement it in the new API.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
From the reST/sphinx docs:
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#literal-blocks

I added new lines and an identation where it was missed.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Add MetaMetaFileFile class to tuf.api.metadata module.
This class is be used for the "meta" field in Timestamp and
Snapshot metadata files described the specification.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Add a new TargetFile class to tuf.api.metadata module make Targets
class to use it.
This class will contain information about the "targets" field
from targets.json

Also, update the tests for that change.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
In the Signed.from_dict function we are passing signed - a dictionary.
Dictionaries are passed by reference and thus we want to make sure
we are not changing the dictionary passed as a function argument.

I had to give default function arguments for all class arguments
assigned in __init__ for Root, Snapshot, Targets, and Timestamp classes.
I chose "None" as the easiest solution as a default argument, but
we definitely want to add proper validation which will ensure
we are not creating empty or partially populated objects.
I didn't want to create a discussion for sensible defaults
and argument validation here.
There is already existing issue for that:
theupdateframework#1140

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Warnings E1120 and W0212 are irrelevant in one specific case, but
rule E1101 creates a lot of false-positives in tuf/metadata/api.

For example, there are 4 false-positives in this file related
to "meta" and "targets" fields not existing in "signed_dict" in
"from_dict" functions in Timestamp, Snapshot and Targets classes.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev
Copy link
Collaborator Author

I updated this pr and had to rebase in order to replace the ** key-value syntax I was using when creating class instances with
directly passing the arguments needed for the __init__ functions.

@joshuagl joshuagl requested a review from lukpueh January 25, 2021 15:37
Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your continued attention on this @MVrachev. I left a few comments, only one of which feels like a blocker – you removed a comment about converting expires so that it can be passed to the constructor as **signed_dict, but you've changed the code away from using that pattern. Should the conversion of expires to a datetime object be removed in favour of it being handled elsewhere?

Nice catch on the sphinx doc strings, can we add some kind of CI/linter check for these in future to prevent them being merged in an incorrect state? This feels issue worthy, rather than something we need to resolve here.

@lukpueh I'd appreciate your review on this, I don't feel quite up to speed on the new metadata API yet.

tuf/api/metadata.py Show resolved Hide resolved
Comment on lines +452 to +453
hashes: A optional dictionary containing hash algorithms and the
hashes resulting from applying them over the metadata file.::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if something like the following might be clearer? Or am I nitpicking?

Suggested change
hashes: A optional dictionary containing hash algorithms and the
hashes resulting from applying them over the metadata file.::
hashes: An optional dictionary mapping hash algorithms to the
hashes resulting from applying them over the metadata file
contents.::



def __eq__(self, other: 'MetadataInfo') -> bool:
"""Compare objects by their values instead of by their addresses."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: by default Python compares objects by their identity, which is like the address but may not be. Can we use the Pythonic term identity here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, didn't know about that.

Comment on lines +624 to +625
hashes: A dictionary containing hash algorithms and the
hashes resulted from applying them over the target file::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above suggestion, I'd welcome feedback on whether this is clearer or not?

Suggested change
hashes: A dictionary containing hash algorithms and the
hashes resulted from applying them over the target file::
hashes: A dictionary mapping hash algorithms to the
hashes resulting from applying them over the metadata file
contents::



def __eq__(self, other: 'TargetInfo') -> bool:
"""Compare objects by their values instead of by their addresses."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about identity here.

@@ -289,22 +291,27 @@ def __init__(
def from_dict(cls, signed_dict: JsonDict) -> 'Signed':
"""Creates Signed object from its JSON/dict representation. """

# Create empty object with default or parametrized constructor with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Create empty object with default or parametrized constructor with
# Create empty object with default or parameterized constructor with

signed_dict['expires'])
# NOTE: We write the converted 'expires' back into 'signed_dict' above
# so that we can pass it to the constructor as '**signed_dict' below,
# along with other fields that belong to Signed subclasses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the removal of this comment indicate we should no longer be doing the conversion of the expires string to a datetime object here?

@MVrachev
Copy link
Collaborator Author

MVrachev commented Feb 22, 2021

On hold, until we merge #1270 and #1279.

@joshuagl joshuagl marked this pull request as draft March 17, 2021 11:46
@joshuagl
Copy link
Member

I've converted this PR to draft until it is ready for review. Mostly to save myself from checking the discussion each time to see what the status is.

@MVrachev
Copy link
Collaborator Author

I feel like this pr has a lot of discussions and changes that are outdated compared to our current code.

I made a new branch migrating the changes from this pr and found out that: c37bdd2 and the discussion we had with @lukpueh starting from here are no longer relevant to our code, commits
c37bdd2 and b9c70aa are different and
I dropped commit 383e260.

I feel it makes more sense to close this pr and open a new one.

@MVrachev MVrachev closed this Mar 31, 2021
@MVrachev MVrachev deleted the complex-metadata-classes branch March 31, 2021 17:59
@MVrachev
Copy link
Collaborator Author

Replaced by #1329.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants