Make length and hashes optional for timestamp and snapshot roles #1031

Merged
merged 12 commits into from
Jul 30, 2020


@MVrachev (Collaborator) commented on May 13, 2020


Fixes issue #996

Description of the changes being introduced by the pull request:
As per the TUF specification (v1.0.1), the length and hashes fields are optional in the METAFILES entries of timestamp.json and snapshot.json.

The choice of default is open for discussion.
My opinion is that we should apply the principle of security by default here and use hashes and length by default.
That way, anyone who decides not to use them has to make that explicit by passing use_length=False and use_hashes=False when calling the respective functions.
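A minimal sketch of the proposed default, assuming a hypothetical helper `build_metafile_entry()` (not part of the tuf API): both flags default to `True`, so a METAFILES entry without length and hashes is only produced when the caller opts out explicitly.

```python
def build_metafile_entry(version, length, hashes, use_length=True, use_hashes=True):
    """Hypothetical helper: build a METAFILES entry for timestamp/snapshot.

    'version' is always present; 'length' and 'hashes' are included unless
    the caller explicitly opts out (security by default).
    """
    entry = {'version': version}
    if use_length:
        entry['length'] = length
    if use_hashes:
        entry['hashes'] = hashes
    return entry


# Default call: all fields present.
full = build_metafile_entry(3, 1024, {'sha256': 'ab12'})
# Explicit opt-out: only the version remains.
minimal = build_metafile_entry(3, 1024, {'sha256': 'ab12'},
                               use_length=False, use_hashes=False)
print(full)     # {'version': 3, 'length': 1024, 'hashes': {'sha256': 'ab12'}}
print(minimal)  # {'version': 3}
```

Because the opt-out has to be spelled out at the call site, dropping these fields is a visible, reviewable decision rather than a silent default.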

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@MVrachev changed the title from "Length hashes optional" to "Make length and hashes optional for timestamp and snapshot roles" on May 13, 2020
@trishankatdatadog (Member) left a review

This is great, thanks, Martin!

Inline review comments on: tests/test_repository_lib.py, tuf/repository_lib.py
@MVrachev (Collaborator, Author) commented on May 14, 2020

Thank you, @trishankatdatadog, for your comprehensive review!

I decided to rebase the relevant commits, address your comments, and make them ready to merge.
The changes I made are the following:

  1. Used length = (use_length and length) or None when calculating the hashes and length in both generate_snapshot_metadata and generate_timestamp_metadata.

  2. Added code to the generate_snapshot_metadata function to optionally calculate the hashes and length of the targets metadata file.

  3. After step 2 the tests were failing, and I had to start using the targets_filename variable as a full filename including its .json suffix; that's why I created an additional commit.

  4. Created the _setup_generate_snapshot_metadata_test and _setup_generate_timestamp_metadata_test functions to reuse setup logic in the generate_snapshot_metadata and generate_timestamp_metadata tests.
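The expression `(use_length and length) or None` from step 1 folds the flag and the computed value into one optional value. A small sketch of how it evaluates (the wrapper name `optional_value` is hypothetical); note that because `or` tests truthiness, a falsy computed value such as a length of `0` or an empty hashes dict also becomes `None`:

```python
def optional_value(use_value, value):
    # Same expression as in the PR: returns value only when the flag is set
    # and the value itself is truthy; otherwise returns None.
    return (use_value and value) or None


print(optional_value(True, 1024))   # 1024
print(optional_value(False, 1024))  # None
print(optional_value(True, 0))      # None (0 is falsy)
print(optional_value(True, {}))     # None (an empty dict is falsy)
```

If a zero-length file ever needed its length recorded, the conditional form `length if use_length else None` would avoid that edge case.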

@trishankatdatadog (Member) left a review

This is great, thanks Martin!

I have one major reservation: FILEINFO_SCHEMA should be split into distinct schemas for timestamp/snapshot metadata and for targets metadata.

Inline review comments on: tuf/formats.py
@MVrachev (Collaborator, Author) commented:

I will fix my merge conflicts when the tests on the develop branch are fixed and pass.

@MVrachev (Collaborator, Author) commented:

I just found out that the tests on the develop branch are actually passing; the problem was with running tox -e py37 locally.
It seems that tox had cached an older version of securesystemslib (0.14.2) and was using it instead of the latest securesystemslib version, 0.15.0, when running tox -e py37.

Sorry, it was my problem.

@MVrachev (Collaborator, Author) commented on May 21, 2020

I fixed the merge conflicts.

This commit is big and contains the following changes:

  1. Rename FILEINFO_SCHEMA to TARGETS_FILEINFO_SCHEMA

  2. Add METADATA_FILEINFO_SCHEMA, used by snapshot and timestamp

  3. Rename LOOSE_FILEINFO_SCHEMA to LOOSE_TARGETS_FILEINFO_SCHEMA

  4. Rename to make_targets_fileinfo and use it when targets metadata has to be calculated and length and hashes are mandatory

  5. Create the make_metadata_fileinfo function for snapshot and timestamp, and fix all places that call it.

  6. Rename get_metadata_fileinfo to get_targets_metadata_fileinfo, because it's only used in _generate_targets_fileinfo, where hashes and length are mandatory.

  7. Create unit tests for make_metadata_fileinfo

The code is ready for another review.
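The split described in the list above can be pictured with two plain-Python stand-ins. They are hypothetical (note the `_sketch` suffix) and are not tuf's implementation; only the `(version, length, hashes)` argument order of the metadata variant is taken from this PR's diff, and the optional `custom` argument of the targets variant is an assumption based on the review discussion of the `custom` field.

```python
def make_targets_fileinfo_sketch(length, hashes, custom=None):
    # Target files: length and hashes are mandatory; custom is optional.
    if length is None or hashes is None:
        raise ValueError('targets fileinfo requires both length and hashes')
    fileinfo = {'length': length, 'hashes': hashes}
    if custom is not None:
        fileinfo['custom'] = custom
    return fileinfo


def make_metadata_fileinfo_sketch(version, length=None, hashes=None):
    # Snapshot/timestamp METAFILES entries: version is mandatory,
    # length and hashes are optional.
    fileinfo = {'version': version}
    if length is not None:
        fileinfo['length'] = length
    if hashes is not None:
        fileinfo['hashes'] = hashes
    return fileinfo


print(make_metadata_fileinfo_sketch(2))
# {'version': 2}
print(make_targets_fileinfo_sketch(31, {'sha256': 'cd34'}))
# {'length': 31, 'hashes': {'sha256': 'cd34'}}
```

Keeping two constructors means each call site states which contract it relies on, instead of one function accepting every optional field.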

@joshuagl (Member) left a review

Thank you for your continued diligent work on this, Martin. There are two important changes we should make (see below) and some minor stylistic changes that would be good to see (comments in-line).

Because FILEINFO_SCHEMA was versatile in its use for both target files and metadata files, it included the optional fields used by both the METAFILES and TARGETS objects defined in the specification.

As we are wisely separating duties here, we can remove the fields from our new objects that don't apply. Specifically:

  1. TARGETS_FILEINFO_SCHEMA doesn't need a version field
  2. METADATA_FILEINFO_SCHEMA doesn't need a custom field
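The two points above can be illustrated with plain key-set checks instead of securesystemslib schema objects. `SCHEMAS` and `check_fileinfo` are hypothetical, and the key sets are assumptions derived from those two points and the TUF v1.0.1 METAFILES/TARGETS definitions:

```python
# Required and optional keys for each fileinfo flavour (illustrative only).
SCHEMAS = {
    'targets': {'required': {'length', 'hashes'}, 'optional': {'custom'}},
    'metadata': {'required': {'version'}, 'optional': {'length', 'hashes'}},
}


def check_fileinfo(kind, fileinfo):
    """Return True if fileinfo has all required keys and no unknown keys."""
    schema = SCHEMAS[kind]
    keys = set(fileinfo)
    missing = schema['required'] - keys
    unknown = keys - schema['required'] - schema['optional']
    return not missing and not unknown


print(check_fileinfo('metadata', {'version': 1}))                            # True
print(check_fileinfo('metadata', {'version': 1, 'custom': {}}))              # False
print(check_fileinfo('targets', {'length': 5, 'hashes': {}}))                # True
print(check_fileinfo('targets', {'length': 5, 'hashes': {}, 'version': 1}))  # False
```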

Inline review comments on: tests/test_formats.py, tests/test_repository_lib.py, tuf/repository_lib.py, tuf/repository_tool.py
@trishankatdatadog (Member) commented:

Thanks for all your hard work, guys! Please let me know when it's ready to review, and I'll take another look.

@MVrachev (Collaborator, Author) commented:

I think the PR should be in good shape now and ready for another round of review.
I addressed all your suggestions, @joshuagl.

@trishankatdatadog (Member) commented:

I'm happy if @joshuagl is happy

@MVrachev (Collaborator, Author) commented:

I only edited one code comment that Joshua had pointed out but that I had forgotten to change.

I don't have a clue why the Travis CI build for python 2.7 fails when locally it doesn't...

@lukpueh (Member) commented on May 28, 2020, quoting @MVrachev:

I don't have a clue why the Travis CI build for python 2.7 fails when locally it doesn't...

Could be some timeout. I've seen this every now and then in the TUF tests. Just restarted the job.

@joshuagl (Member) left a review

Thanks for your continued diligence on this PR @MVrachev. I have a few minor nits (indentation related and some clarifications) and suggest a couple of clean-ups that would be great additions to this PR.

Inline review comments on: tests/test_formats.py, tests/test_repository_lib.py, tests/test_repository_tool.py, tuf/formats.py
Review thread on the generate_snapshot_metadata() signature change:

  def generate_snapshot_metadata(metadata_directory, version, expiration_date,
      targets_filename, storage_backend, consistent_snapshot=False,
-     repository_name='default'):
+     repository_name='default', use_length=True, use_hashes=True):
Member:

Note to maintainers: this is a behaviour change that we should document for the next release. In the current release, snapshot metadata always excludes length and hashes of (delegated) targets metadata (from #996).

cc: @lukpueh

Inline review comments on: tuf/repository_lib.py
@MVrachev (Collaborator, Author) commented on Jun 8, 2020

I addressed Joshua's comments and added two commits: "Remove redundant targets_filename argument" and "Fix snapshot_filename inconsistency usage".

The PR is ready for another round of review.

@joshuagl (Member) left a review

Thanks for sticking with this, Martin. This is looking really solid. I have only one non-trivial concern, which is the change to generate_timestamp_metadata: I really don't think the snapshot_file_path argument is necessary. All we really want is the filename, which you get in your changes here by calling os.path.split(), and that is defined by the constant SNAPSHOT_FILENAME. Let's just keep using the constant as the code does today and drop the argument to generate_timestamp_metadata().

Inline review comments on: tests/test_repository_lib.py, tuf/formats.py
Comment on lines 1601 to 1764
+ _, snapshot_filename = os.path.split(snapshot_file_path)
  # Retrieve the versioninfo of the Snapshot metadata file.
  snapshot_version = get_metadata_versioninfo('snapshot', repository_name)
- snapshot_fileinfo[SNAPSHOT_FILENAME] = \
-   tuf.formats.make_fileinfo(length, hashes, version=snapshot_version['version'])
+ snapshot_fileinfo[snapshot_filename] = \
+   tuf.formats.make_metadata_fileinfo(snapshot_version['version'],
+       length, hashes)
Member:

I don't believe this is necessary. The function worked before by using the constant SNAPSHOT_FILENAME, so why do we need to change away from that? I don't think there's any need for the snapshot_file_path argument. AFAICS the one place that wasn't passing in 'snapshot.json' was passing 'path/to/snapshot.json', which your change here will strip down to 'snapshot.json'. Am I misunderstanding?

@MVrachev (Collaborator, Author) replied on Jun 15, 2020:

snapshot_file_path is needed to calculate the hashes and length of the snapshot file using get_file_details from securesystemslib: https://github.com/secure-systems-lab/securesystemslib/blob/9b3a78e998412c39418efb29f411591b20cdd236/securesystemslib/util.py#L45
Its filepath argument is the absolute path to the file, in our case the snapshot file.

We can't remove snapshot_file_path here the way we removed targets_filename from generate_snapshot_metadata, because generate_snapshot_metadata has the extra metadata_directory argument, which we can use to build the absolute path needed by get_file_details.
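To make the argument concrete, here is a stdlib-only sketch of why the full path matters: computing length and hashes means opening and reading the file, while the key written into timestamp metadata is only the base filename. `file_details` is a hypothetical stand-in for securesystemslib's get_file_details, not its actual code, and the temporary directory exists only for the demo.

```python
import hashlib
import os
import tempfile


def file_details(filepath, hash_algorithms=('sha256',)):
    """Hypothetical stand-in: return (length, {algorithm: hexdigest})."""
    hashers = {name: hashlib.new(name) for name in hash_algorithms}
    length = 0
    with open(filepath, 'rb') as fileobj:
        # Read in chunks so large metadata files are not loaded at once.
        for chunk in iter(lambda: fileobj.read(8192), b''):
            length += len(chunk)
            for hasher in hashers.values():
                hasher.update(chunk)
    return length, {name: h.hexdigest() for name, h in hashers.items()}


with tempfile.TemporaryDirectory() as metadata_directory:
    snapshot_file_path = os.path.join(metadata_directory, 'snapshot.json')
    with open(snapshot_file_path, 'wb') as f:
        f.write(b'{}')
    # The full path is needed to read the file ...
    length, hashes = file_details(snapshot_file_path)
    # ... but only the base name is used as the key in timestamp metadata.
    _, snapshot_filename = os.path.split(snapshot_file_path)
    print(snapshot_filename, length)  # snapshot.json 2
```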

Inline review comments on: tuf/repository_lib.py
@MVrachev force-pushed the length-hashes-optional branch 2 times, most recently from dbd8242 to e93dda8, on June 15, 2020 14:30
@MVrachev (Collaborator, Author) commented:

I had to rebase a second time to remove a commit that was not made by me and had been introduced into the changes.

@MVrachev force-pushed the length-hashes-optional branch 2 times, most recently from be3c0af to 52f2fc6, on June 15, 2020 17:19
@joshuagl (Member) left a review

Thanks for your perseverance with these changes @MVrachev.

I realised that in generate_snapshot_metadata() we are computing lengths and hashes for all targets metadata files, regardless of whether we are using lengths and hashes or not. This doesn't seem like a major issue until you consider that the PEP 458 implementation will have over 16k delegations (and therefore over 16k targets metadata files to compute the hashes for).

I anticipate that Warehouse will want to use length and hashes for targets metadata, but we should make sure that if they opt out we aren't doing a lot of unnecessary work. Happy to see that as a patch here or we can file it as an Issue to be implemented in a separate PR before the next release. What do you think?
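One way to avoid that unnecessary work, sketched with hypothetical names (`snapshot_meta_entries`, `compute_details`) rather than the actual generate_snapshot_metadata() code, is to read and hash a targets metadata file only when at least one of the flags asks for its length or hashes:

```python
def snapshot_meta_entries(versions, compute_details,
                          use_length=True, use_hashes=True):
    """Build snapshot METAFILES entries for {filename: version}.

    compute_details(filename) -> (length, hashes) is only called when the
    caller actually wants length and/or hashes, so opting out skips all
    file I/O and hashing.
    """
    entries = {}
    for filename, version in versions.items():
        entry = {'version': version}
        if use_length or use_hashes:
            length, hashes = compute_details(filename)
            if use_length:
                entry['length'] = length
            if use_hashes:
                entry['hashes'] = hashes
        entries[filename] = entry
    return entries


calls = []


def fake_details(filename):
    # Stand-in for reading and hashing a metadata file; records each call.
    calls.append(filename)
    return 100, {'sha256': 'ef56'}


versions = {'targets.json': 1, 'bins.json': 4}
snapshot_meta_entries(versions, fake_details, use_length=False, use_hashes=False)
print(len(calls))  # 0: no file was read or hashed
snapshot_meta_entries(versions, fake_details)
print(len(calls))  # 2
```

With both flags off the cost is independent of file sizes, which is what matters once there are thousands of delegated targets metadata files.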

Comment on lines 128 to 131
{'length': 1024,
'hashes': {'sha256': 'A4582BCF323BCEF'},
'version': 1}),

Member:

Nit: indentation is not right here

@MVrachev (Collaborator, Author):

I am not sure what you mean by "not right".

Member:

The dictionary was originally indented to align with the opening delimiter on the line above (PEP 8 style). With the new longer variable name the indentation no longer matches the existing implicit coding style, nor does it match the explicit coding style of the project.
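For reference, PEP 8's "aligned with opening delimiter" style looks like this; the variable name and values are illustrative, not the actual test code:

```python
# Continuation lines start in the column just after the opening delimiter,
# so renaming the variable means re-indenting every line below it.
metadata_fileinfo = ('snapshot.json',
                     {'length': 1024,
                      'hashes': {'sha256': 'A4582BCF323BCEF'},
                      'version': 1})
```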

@MVrachev (Collaborator, Author):

Yes, now I understand.
I was confused because you left the comment on what looked like a random line, where I had added a blank line.

Inline review comments on: tuf/formats.py, tuf/repository_lib.py
@MVrachev (Collaborator, Author) commented on Jul 24, 2020

I will add your suggestions but, if you don't mind, without the abbreviations "B/W" and "w/o", because they are a little ambiguous to me as a non-native English speaker.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev (Collaborator, Author) commented:

I updated the last commit to apply your suggestion in all places where we use flags to control the computation of snapshot length and hashes.

@trishankatdatadog (Member) commented:

One minor comment: please don't force-push unless absolutely necessary; it makes it hard to see new changes...

@joshuagl merged commit b9f7cb3 into theupdateframework:develop on Jul 30, 2020
@joshuagl (Member) commented:

Thanks for all of your work on this @MVrachev and the thorough review @trishankatdatadog !

@MVrachev deleted the length-hashes-optional branch on August 4, 2020 10:27
@MVrachev added a commit to MVrachev/tuf that referenced this pull request on Nov 25, 2020:
As per the specification (v1.0.1) length and hashes fields
in timestamp metadata are optional.
We have implemented this in the older API
(see theupdateframework#1031) and we should
implement it in the new API.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev added further commits with the same message to MVrachev/tuf that referenced this pull request: three more on Nov 25, 2020, two on Dec 1, 2020, and one each on Dec 14, 2020, Jan 5, 2021, Jan 18, 2021 and Jan 20, 2021.

From Mar 31, 2021 the commit message was extended to cover snapshot metadata as well:
As per the specification (v1.0.1) length and hashes fields
in timestamp and snapshot metadata are optional.
We have implemented this in the older API
(see theupdateframework#1031) and we should
implement it in the new API.

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>

Commits with this message referenced this pull request on Mar 31, 2021, Apr 6, 2021 (two), Apr 28, 2021 (three, one of which added "This is a possible breaking change."), Apr 29, 2021 (two) and May 10, 2021.

5 participants