Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Client refactor #1291

Merged
merged 11 commits into from
Mar 24, 2021
Merged

Conversation

sechkova
Copy link
Contributor

@sechkova sechkova commented Feb 24, 2021

Description of the changes being introduced by the pull request:

This is an in-progress draft for the client code refactor.

Proposes a new Updater.refresh() implementation:
- based on metadata API
- no longer dependent on keydb/roledb
- follows the TUF specification's client workflow
- reusing the network abstraction work #1250

Introduces a MetadataWrapper class with the goal of providing functionality which is at this point missing in metadata API.
I imagine this code would find a proper place with the advancement of the refactor.

The code related to the actual target files download is mostly a transfer from the old code. Needs to be further
reworked.

A non-extensive list of next steps:

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@sechkova sechkova added the client Related to the client (updater) implementation label Feb 26, 2021
@sechkova sechkova changed the title [WIP]: Client refactor WIP: Client refactor Feb 26, 2021
@sechkova sechkova added this to the Client Refactor milestone Feb 26, 2021
@jku
Copy link
Member

jku commented Mar 8, 2021

Thanks, this is a great start. There is already a discussion on how to continue (longer living branch or PRs or what) on slack so I'll just leave some thoughts on where we should try to move next.

I think this the right time to try splitting the Updater functionality into meaningful components -- the Metadata API does a good bit of that already but I think Updater is still going to be too large and should be split if possible (and I think it is possible, also tuf-on-a-plane has some useful ideas). Listing some Updater functionality below for inspiration:

  • implement the TUF update process in a way that is easy to follow and verify
  • keep track of currently valid set of metadata, verify new metadata
  • read/write local metadata files
  • traverse target delegation tree
  • download metadata/target files, while handling mirrors

To me tracking the currently valid set of metadata is one potentially separate thing here: an object could hold the currently valid set of metadata (and verify new metadata and maybe handle reading/writing the local metadata cache) without knowing about all the steps that the TUF spec requires or about the API that Updater wants to provide. Maybe this leads to most code being in the metadata handler but this might be worth testing -- it's also something a bit like what @trishankatdatadog did in tuf-on-a-plane (Repository vs models,writers,readers).

On download.py: maybe download helper methods are still useful but I don't think you should keep using download.py as it is. What might make sense is a redesign of mirrors.py+download.py where a "MirrorDownloader" knows about the fetcher, knows about all the mirrors and gives the Updater an easy way to download metadata and targets without worrying about the details?

On MetadataWrapper: I think you should start aggressively filing issues for any individual functionality that you think should be in Metadata so we can discuss them one-by-one... unless there's a wider discussion that you think should be had about it? -- in any case I think a separate bug works in that case too.

On public API changes: I think no-one has talked about major changes to the client API so I think fine-tuning (if needed) can wait until we have the architecture looking reasonable? The only major change I can imagine is cleaning up the mirrors configuration somehow but even there I have no practical ideas.

Reading local metadata vs downloading metadata: currently these are intermingled in the update process, I'm not sure they should be? This also relates to the API as well: refresh() can be called multiple times (even though it's not something e.g. pip intends to do), it seems odd to reload the local files every time. On the other hand the delegated target metadata must anyway be loaded only when needed... so maybe my point isn't that well made.

Some random comments from reading the code:

  • refresh() is really the real init(): before that the object is not properly initialized -- you don't even know if your local root metadata exists before refresh(). Old Updater loads local metadata in init() already so it either is initialized or failed loudly
  • looping through a mirror iterator is a bit weird: we only want one valid result
  • _load _targets('targets', 'root') ends up being done twice
  • I'd like to get rid of comments like Check for an arbitrary software attack, instead we should describe what's checked and why it needs to be done
  • the contextmanager change looks like a good idea but is unrelated to the PR

@lukpueh
Copy link
Member

lukpueh commented Mar 9, 2021

Big thanks for pioneering this, @sechkova, and for your great high-level review, @jku! Let me add some quick thoughts to the latter:

  • implement the TUF update process in a way that is easy to follow and verify

💯 Let's once more stress the importance of the word reference in reference implementation. :)

  • keep track of currently valid set of metadata [...]

I agree with @jku that it would be good to implement this separately (i.e. in a re-usable way). We will need similar VCS-like functionality for a repository tool too.

Although, other than the repository tool, the client does not need to worry about differences between metadata in memory and on disk within a given version number. (related issues: #955, #958, #964)

  • read/write local metadata files

Note that simple read/write to file storage is implemented on the Metadata class see from/to_file + external storage and serialization facilities (#1279). I think the crucial part here is the book keeping (of versions) and decision making (when to persist metadata) as discussed above.

  • traverse target delegation tree
  • download metadata/target files, while handling mirrors

Will need to take a closer look at @sechkova's diff in order to comment.

On MetadataWrapper: ...

I agree that we should evaluate, whether there is an architectural need for a client-side MetadataWrapper, or if these things could be on the Metadata class.

--

So much for now. Will comment more after having taken a closer look.

@joshuagl joshuagl changed the base branch from develop to experimental-client March 9, 2021 14:25
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a closer look, and here are more comments. :) (Note: they are not related to the lines where I posted them, only to the modules).

tuf/download.py Outdated
@@ -195,42 +195,42 @@ def _download_file(url, required_length, fetcher, STRICT_REQUIRED_LENGTH=True):
average_download_speed = 0
number_of_bytes_received = 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to recall that there has been discussion about download.py, where it should live, why it shouldn't be in the client subdirectory, or even part of updater.py, etc... I can't remember details. Can someone else? Maybe from the #1250 team?

Also, I suggest we get rid of safe_download and unsafe_download. The names are unspecific and the latter sounds scarier than it is. AFAICS they only add an unnecessary level to the call stack.

Maybe we can even merge _download_file and _check_downloaded_length. The latter seems to be mostly docstring and logging statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to recall that there has been discussion about download.py, where it should live, why it shouldn't be in the client subdirectory, or even part of updater.py, etc... I can't remember details. Can someone else? Maybe from the #1250 team?

This is at least one of the places it has been mentioned: #1250 (comment)

Also, I suggest we get rid of safe_download and unsafe_download. The names are unspecific and the latter sounds scarier than it is. AFAICS they only add an unnecessary level to the call stack.

Maybe we can even merge _download_file and _check_downloaded_length. The latter seems to be mostly docstring and logging statements.

I think we are all on the same page about totally reworking download.py + mirrors.py.

# Copyright 2020, New York University and the TUF contributors
# SPDX-License-Identifier: MIT OR Apache-2.0

"""TUF client 1.0.0 draft
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff, @sechkova! Is this as feature-complete as the existing updater.py? If so, nice job condensing it to 1/3 of LOC! :)

Some high-level comments:

  • _mirror_meta_download and _mirror_target_download look a lot a like. Do we need both?
  • Same goes for _get_full_meta_name and _get_relative_meta_name. Maybe the former could be a wrapper around the latter?
  • The 5 top-level functions at the bottom of the module feel a bit lost. Should we add a client.util module? Or make them part of the Updater class, e.g. as staticmethods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff, @sechkova! Is this as feature-complete as the existing updater.py? If so, nice job condensing it to 1/3 of LOC! :)

I believe it implements the same features as the existing updater but most likely it misses to address corner cases. Anyway, in terms of volume I think this will be approximately the end result :)

Some high-level comments:

  • _mirror_meta_download and _mirror_target_download look a lot a like. Do we need both?

I consider them part of the "totally rework download and mirrors" plan and I hope to see them gone!

  • Same goes for _get_full_meta_name and _get_relative_meta_name. Maybe the former could be a wrapper around the latter?
  • The 5 top-level functions at the bottom of the module feel a bit lost. Should we add a client.util module? Or make them part of the Updater class, e.g. as staticmethods?

Yes, you are right, staticmethod would have helped describe them better. They are actually copy-paste transfer of the old code which I needed to make a full update cycle work. They need a better place (same as _get_*_name methods)

# SPDX-License-Identifier: MIT OR Apache-2.0

"""Metadata wrapper
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit paradox that MetadataWrapper is another container on top of Metadata, adding another level of nesting, when most of the methods/properties are actually shortcuts to attributes contained somewhere in Metadata. If we want a client-variant of Metadata, I'd maybe use inheritance instead of composition. 🤷

But maybe we don't even need it. Let's quickly go through the individual methods/properties of here:

On MetadataWrapper:

All the other methods here are either pure shortcuts or filters of attributes contained in Metadata objects (or contained objects).

@sechkova sechkova added the experimental-client Items related to the development of a new client (see milestone/8 and theexperimental-client branch) label Mar 10, 2021
@joshuagl
Copy link
Member

Some really nice work happening here, thank you all!

An a more procedural point, we agreed that we want to do this experimental client in a separate branch (experimental-client) and use a label ('experimental-client') for related issues and PRs. What we did not figure out is what the barrier is for landing this PR? Shall we land this series as is? Should we capture the discussion points as Issues first? Something else?

@lukpueh
Copy link
Member

lukpueh commented Mar 11, 2021

I'm not opposed to merge this PR as is. The only disadvantage I can see is that it will close this discussion board. So let's make sure to capture all relevant items in new or existing issues first. @sechkova, would you be willing to do that?

@sechkova
Copy link
Contributor Author

Agree, let me work on this a couple of days and I will notify you when I think it is ready to be merged.

@sechkova
Copy link
Contributor Author

Rebased on the latest changes to include the new metadata serialization.

@sechkova
Copy link
Contributor Author

I opened the following list of issues to capture the discussion, most of them are marked as experimental-client but there are a several requests for enhancements of the current implementation:

Let me know if you disagree or you find something missing. Otherwise I think we can merge this PR and continue working on the list.

@joshuagl
Copy link
Member

That's a healthy list of issues, thanks @sechkova.

I'd like CI to pass for PRs agains the experimental-client branch before we merge pull requests. I think it would be reasonable to add a temporary revertible commit which prevents coverage failing on the new client, either by ignoring it or by lowering coveralls' --fail-under.

With that change, we should be able to see genuine problems in PRs to the experimental-client branch, such as the fact that test_refresh is failing on Windows. Lets fix this before we land this PR:

2021-03-12T10:40:51.1038651Z ======================================================================
2021-03-12T10:40:51.1039652Z ERROR: test_refresh (test_updater_rework.TestUpdater)
2021-03-12T10:40:51.1040813Z ----------------------------------------------------------------------
2021-03-12T10:40:51.1042007Z Traceback (most recent call last):
2021-03-12T10:40:51.1044302Z   File "d:\a\tuf\tuf\tuf\api\serialization\json.py", line 34, in deserialize
2021-03-12T10:40:51.1045513Z     json_dict = json.loads(raw_data.decode("utf-8"))
2021-03-12T10:40:51.1046692Z   File "c:\hostedtoolcache\windows\python\3.6.8\x64\lib\json\__init__.py", line 354, in loads
2021-03-12T10:40:51.1047681Z     return _default_decoder.decode(s)
2021-03-12T10:40:51.1048677Z   File "c:\hostedtoolcache\windows\python\3.6.8\x64\lib\json\decoder.py", line 339, in decode
2021-03-12T10:40:51.1050103Z     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2021-03-12T10:40:51.1051756Z   File "c:\hostedtoolcache\windows\python\3.6.8\x64\lib\json\decoder.py", line 355, in raw_decode
2021-03-12T10:40:51.1052672Z     obj, end = self.scan_once(s, idx)
2021-03-12T10:40:51.1053669Z json.decoder.JSONDecodeError: Expecting ',' delimiter: line 22 column 26 (char 515)
2021-03-12T10:40:51.1054530Z 
2021-03-12T10:40:51.1055265Z The above exception was the direct cause of the following exception:
2021-03-12T10:40:51.1055873Z 
2021-03-12T10:40:51.1056514Z Traceback (most recent call last):
2021-03-12T10:40:51.1057376Z   File "D:\a\tuf\tuf\tests\test_updater_rework.py", line 154, in test_refresh
2021-03-12T10:40:51.1058707Z     self.repository_updater.refresh()
2021-03-12T10:40:51.1060220Z   File "d:\a\tuf\tuf\tuf\client_rework\updater_rework.py", line 90, in refresh
2021-03-12T10:40:51.1061393Z     self._load_snapshot()
2021-03-12T10:40:51.1062635Z   File "d:\a\tuf\tuf\tuf\client_rework\updater_rework.py", line 372, in _load_snapshot
2021-03-12T10:40:51.1063643Z     verified_snapshot = self._verify_snapshot(temp_obj)
2021-03-12T10:40:51.1064742Z   File "d:\a\tuf\tuf\tuf\client_rework\updater_rework.py", line 484, in _verify_snapshot
2021-03-12T10:40:51.1065871Z     intermediate_snapshot = SnapshotWrapper.from_json_object(temp_obj)
2021-03-12T10:40:51.1068732Z   File "d:\a\tuf\tuf\tuf\client_rework\metadata_wrapper.py", line 33, in from_json_object
2021-03-12T10:40:51.1070186Z     _meta = deserializer.deserialize(raw_data)
2021-03-12T10:40:51.1071338Z   File "d:\a\tuf\tuf\tuf\api\serialization\json.py", line 38, in deserialize
2021-03-12T10:40:51.1072315Z     raise DeserializationError from e
2021-03-12T10:40:51.1073459Z tuf.api.serialization.DeserializationError
2021-03-12T10:40:51.1074346Z 
2021-03-12T10:40:51.1075083Z ----------------------------------------------------------------------

@sechkova sechkova force-pushed the client-rework branch 3 times, most recently from e5c9502 to fa4065f Compare March 18, 2021 16:19
@sechkova
Copy link
Contributor Author

sechkova commented Mar 18, 2021

Temporary reduced the coverage tool failure boundary to 90%, as suggested, in order to uncover other test failures.

The Windows tests were failing due to DOS-style line ending conversion in the test metadata files which on its turn was leading to a file size not matching the one defined in the metadata. This is fixed by adding a .gitattributes file declaring that test/repository_data files' EOL should be the Unix syle LF, regardless of the platform.

Now with tests passing, probably I should do something with the linter too or at least temporary ignore the new files.

@lukpueh
Copy link
Member

lukpueh commented Mar 19, 2021

Now with tests passing, probably I should do something with the linter too or at least temporary ignore the new files.

Looks like ~95% of the warnings are "bad-indentation". I suggest to run black (see #1314) over client_rework/*, which should fix indentation and maybe even some of the other issues.

It might be worth to rebase on top of #1314 once it's merged and configure linting and auto-formatting for your new files akin to how we do for new files in api/*.

@joshuagl
Copy link
Member

Completely agree, let's wait for the merge of #1314 (which is blocked on #1261), then run black over this new code and update the linter configuration to keep us honest here.

@lukpueh
Copy link
Member

lukpueh commented Mar 19, 2021

FYI: #1314 (and #1261) have been merged.

@sechkova
Copy link
Contributor Author

sechkova commented Mar 23, 2021

Applied your suggestions and now black and isort are run over the client_rework/* files in the tox lint environment.

Fixed most of the linter issues but still had to temporary disable some checks that required less trivial changes. I suggest fixing them when working on the related issues (see last two commits).

@sechkova sechkova marked this pull request as ready for review March 23, 2021 14:48
@sechkova sechkova changed the base branch from experimental-client to develop March 23, 2021 15:34
@sechkova sechkova changed the base branch from develop to experimental-client March 23, 2021 15:34
@sechkova
Copy link
Contributor Author

Opened another issue to track "temporary" commits: #1320

@sechkova sechkova force-pushed the client-rework branch 2 times, most recently from 182ed12 to 0172753 Compare March 23, 2021 16:15
@sechkova
Copy link
Contributor Author

@jku asked to apply the .gitattributes commit to the main branch too so I opened another PR with it only: #1321
p.s. I also saw a typo and amended the commit ...

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything I could think of is now filed as an issue, will merge this (to experimental-client) today if there are no more objections.

A proposal of a new Updater.refresh() implementation:
    - based on metadata API
    - no longer dependent on keydb/roledb
    - follows the TUF specification's client workflow

Introduces a MetadataWrapper class with the goal of
providing functionality which is at this point missing
in metadata API.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Mostly a transfer of the current client code related to
the actual target files download. Needs to be further
reworked.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Adds a basic test case for Updater.
Applies the linter config used in api/metadata.py to all files
under client_rework.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Coverage failures may hide other failing tests in the CI.
Configure coverage to fail under 90 percent during the
ongoing experimental-client development.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
For compatibility with Windows systems, declare repository_data
files to always have LF line endings on checkout.

A trailing "/**" matches everything inside, with infinite depth.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Apply the updated api/pylintrc config to the client_rework
directory.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Run manually the black and isort code formatters over
the client_rework code.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Configure tox to run black and isort over the files
under client_rework directory.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Fix linter issues after applying the api/pylintrc config
over the client_rework/* code.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Temporary disable (inline) try-except-raise and broad-except
warnings in the new Updater code until client exception handling
is revised (theupdateframework#1312).

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Temporary disable (inline) undefined-loop-variable pylint
checks in the new Updater code until the download functionality
is revised (theupdateframework#1307).

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@sechkova
Copy link
Contributor Author

The commits are only reworded, no code changes.
I don't plan anything else for this PR, thanks all for the reviews.

@jku jku merged commit 2b9f838 into theupdateframework:experimental-client Mar 24, 2021
@sechkova sechkova deleted the client-rework branch March 25, 2021 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client Related to the client (updater) implementation experimental-client Items related to the development of a new client (see milestone/8 and theexperimental-client branch)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants