Metadata API: Document serialization "repro" issue #1800

Merged

@jku (Member) commented Jan 26, 2022

It's not obvious to a casual reader that reading metadata and then
writing it might not always produce the same file. It's also not
immediately obvious why this matters.

Document both concepts.

Fixes #1392

Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>

It's a little wordy but I think it may be useful.
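A minimal sketch of why a read-then-write cycle matters, using only Python's standard `json` and `hashlib` modules as a stand-in for the Metadata API. The metadata content and the indented output layout are illustrative assumptions, not python-tuf's actual output. A hash recorded for the original file bytes (as a MetaFile hash would be) stops matching once the same content is written back with a different, equally valid layout.

```python
import hashlib
import json

# Hypothetical metadata file exactly as it was received (illustrative content).
original_bytes = b'{"signed": {"_type": "timestamp", "version": 1}, "signatures": []}'

# Hash recorded for the original file, e.g. in a MetaFile "hashes" entry.
recorded = hashlib.sha256(original_bytes).hexdigest()

# Deserialize, then serialize again with a different (but valid) JSON layout.
data = json.loads(original_bytes)
rewritten_bytes = json.dumps(data, indent=1).encode("utf-8")

# The content is identical...
print(json.loads(rewritten_bytes) == data)  # True
# ...but the bytes, and therefore the hash, are not.
print(rewritten_bytes == original_bytes)  # False
print(hashlib.sha256(rewritten_bytes).hexdigest() == recorded)  # False
```

This is why any check against a recorded content hash needs the original bytes rather than a re-serialized copy.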

@coveralls commented Jan 26, 2022

Pull Request Test Coverage Report for Build 1805825853

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 97.89%

Totals
Change from base Build 1805452455: 0.0%
Covered Lines: 1112
Relevant Lines: 1132

💛 - Coveralls

@MVrachev (Collaborator) left a comment

The comments look good.

I wonder whether we want to add a comment somewhere noting that, even though deserializing and serializing doesn't give byte-for-byte equivalence, we do get a consistent signature regardless of whitespace?

@jku (Member Author) commented Feb 1, 2022

I can see that making the distinction could help... I'll try that (even if the paragraph is a bit long already)

@jku force-pushed the document-serialization-hash-issue branch 2 times, most recently from eb121f6 to ca9300c, February 1, 2022 15:04
@jku (Member Author) commented Feb 1, 2022

force pushed:

  • rebased to get recent build fix
  • added a note that signatures are guaranteed to stay valid in a deserialize-serialize cycle even if the file content itself is not guaranteed to stay the same.
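Signatures survive such a cycle because TUF signs a canonical encoding of the `signed` object, not the file bytes. A minimal sketch, assuming `json.dumps(..., sort_keys=True, separators=(",", ":"))` as a simplified stand-in for the canonical JSON encoding the Metadata API actually signs (real canonical JSON has further rules, e.g. for string escaping and allowed number types). The helper `canonical_bytes` is hypothetical.

```python
import json
from typing import Any, Dict


def canonical_bytes(signed: Dict[str, Any]) -> bytes:
    """Hypothetical, simplified canonical form: sorted keys, no insignificant whitespace."""
    return json.dumps(signed, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Two files with the same content but different whitespace and key order.
file_a = b'{"signed": {"_type": "timestamp", "version": 1}, "signatures": []}'
file_b = b'{\n "signatures": [],\n "signed": {\n  "version": 1,\n  "_type": "timestamp"\n }\n}'

signed_a = json.loads(file_a)["signed"]
signed_b = json.loads(file_b)["signed"]

print(file_a == file_b)  # False: the file bytes differ
# The bytes a signature is computed over are identical, so a signature
# made over one remains valid for the other.
print(canonical_bytes(signed_a) == canonical_bytes(signed_b))  # True
```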

@MVrachev (Collaborator) left a comment

LGTM!

Comment on lines 237 to 238
same because of whitespace issues, even if the signatures are
guaranteed to stay valid. If byte-for-byte equivalence is required
Collaborator:

A smart way to mention that signatures are guaranteed to stay valid.

@lukpueh (Member) left a comment

I like the clarification, modulo two nits:

  • the claim "not guaranteed to be the same" depends on the serializer, which is configurable for both to_file and to_bytes. Maybe it's worth mentioning that?
  • the reason "because of whitespace issues" is also specific to the default json serializer, and also only part of the problem (e.g. non-guaranteed dict key order is another problem). Maybe it's not necessary to mention a particular reason?
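To illustrate the point that the result depends on the serializer (configurable through the serializer argument of `to_file` and `to_bytes`), the sketch below uses two hypothetical serializer functions, not python-tuf's serializer classes. They make different but equally valid choices about whitespace and key order: both round-trip to the same object, yet their bytes differ.

```python
import json
from typing import Any, Callable, Dict

Serializer = Callable[[Dict[str, Any]], bytes]


def compact_serializer(obj: Dict[str, Any]) -> bytes:
    # Hypothetical: keeps the parsed key order and drops extra whitespace.
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def pretty_sorted_serializer(obj: Dict[str, Any]) -> bytes:
    # Hypothetical: sorts keys and indents by one space.
    return json.dumps(obj, indent=1, sort_keys=True).encode("utf-8")


def roundtrip(raw: bytes, serialize: Serializer) -> bytes:
    """Deserialize raw JSON, then serialize it again with the given serializer."""
    return serialize(json.loads(raw))


raw = b'{"version": 1, "_type": "timestamp"}'

out_compact = roundtrip(raw, compact_serializer)
out_pretty = roundtrip(raw, pretty_sorted_serializer)

print(out_compact)  # b'{"version":1,"_type":"timestamp"}'
print(out_compact == out_pretty)  # False
print(json.loads(out_compact) == json.loads(out_pretty))  # True
```

Neither output matches `raw` byte for byte, which is why the documentation can only promise equivalent content, not identical files.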

@jku (Member Author) commented Feb 3, 2022

Agreed on both comments, thanks, will update.

@jku force-pushed the document-serialization-hash-issue branch from ca9300c to 3f3b921, February 7, 2022 10:18
@jku (Member Author) commented Feb 7, 2022

I solved this by

  • not mentioning whitespace as a reason
  • just saying that serialization is not required to produce byte-for-byte identical output (the implication is that different serializers may make different decisions, but that's not stated explicitly)

@lukpueh (Member) left a comment

Thanks for the update, @jku!

@lukpueh merged commit 70c7358 into theupdateframework:develop on Feb 7, 2022
Successfully merging this pull request may close these issues.

Document Metadata API: MetaFile hash matching issue with serialization
5 participants