Manifest Format #321
Looks pretty good to me! Here are some of my initial thoughts. On the manifest format, CBOR seems to have many options to choose from, while MessagePack is more straightforward. Since they seem to achieve the same thing at the end of the day, maybe MessagePack would be a better choice because it's simpler? For what it's worth, from a quick glance MessagePack's implementations seem to be a little more up-to-date than CBOR's. And given CBOR's number of options, I don't trust that all implementations support the same set of features. On the hash function, I think BLAKE3 would definitely make a lot more sense if its security is good enough. I haven't read the paper, but given BLAKE3's speed, it'd be a great fit for this manifest, since you'd probably hash a lot of data.
Yeah, tree is probably the way to go.
Thanks for the feedback! I definitely agree that going with something simpler is better. I should look at CBOR and see if any of the bells and whistles it provides make any sense for intermodal.
Is there a particular reason for preferring binary formats over text?
I think a few reasons:
Text formats definitely have the benefit of being human-readable, and losing that is unfortunate. To make up for it, one of the things we could do is auto-generate a human-readable readme or .nfo file containing all the information that would be useful to a human, and put it in the root of intermodal-created releases.
I think for the use case of a content manifest, which is basically just a bunch of file hashes, binary is probably reasonable: the space savings are nontrivial, and the human-readable information in this file is relatively limited anyway. This should probably also be as stupid a format as possible; it strikes me that something as simple as the text output from sha256sum would be functional here, and anything more complicated would need good returns on the extra complexity. What use case do you envision for data files embedded in manifests? I'm somewhat confused as to what the benefit would be there. For the other file components, the calculus is potentially different. I think there's a better case for metadata being in a text format, maybe TOML with a specified schema or similar, since it's information that's fundamentally designed for humans (rather than crypto algorithms) to consume.
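To make the sha256sum comparison concrete, here is a minimal Rust sketch of reading that format, assuming the GNU coreutils layout: a 64-character hex digest, a space, a mode character (a space for text mode or `*` for binary mode), then the path. The names `Entry`, `parse_line`, and `parse_manifest` are hypothetical, and backslash-escaped filenames are deliberately not handled.

```rust
/// One entry in a sha256sum-style text manifest: a SHA-256 digest and a path.
#[derive(Debug, PartialEq)]
struct Entry {
    digest: [u8; 32],
    path: String,
}

/// Decodes a single hex digit, accepting either case.
fn hex_val(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Parses "<64 hex digits><space><space or '*'><path>".
/// Escaped names (coreutils prefixes such lines with a backslash) are not handled.
fn parse_line(line: &str) -> Option<Entry> {
    let bytes = line.as_bytes();
    if bytes.len() <= 66 || bytes[64] != b' ' || !(bytes[65] == b' ' || bytes[65] == b'*') {
        return None;
    }
    let mut digest = [0u8; 32];
    for (i, pair) in bytes[..64].chunks(2).enumerate() {
        digest[i] = hex_val(pair[0])? << 4 | hex_val(pair[1])?;
    }
    // Bytes 0..66 are all ASCII at this point, so index 66 is a char boundary.
    Some(Entry { digest, path: line[66..].to_string() })
}

/// Parses a whole manifest; any malformed line rejects the entire file.
fn parse_manifest(text: &str) -> Option<Vec<Entry>> {
    text.lines().map(parse_line).collect()
}
```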
I'd like to have recursive maps and lists; otherwise, extensions will be hard. I'd like to keep things simple but still leave the door open for future extensions, and a flat list of hashes wouldn't leave room for them.
Digital signatures are one example, another is the inner nodes of a merkle tree, which would allow fast, secure random access into large data files.
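As a rough illustration of the random-access point, the sketch below builds a Merkle tree over chunks of a file and checks a single chunk against the root using only its sibling hashes. It is not BLAKE3's or BitTorrent v2's actual tree: `toy_hash` is an insecure 64-bit FNV-1a placeholder for a real 256-bit hash, the 0x00/0x01 leaf and inner-node prefixes are borrowed from Certificate Transparency (RFC 6962), promoting an unpaired last node unchanged is an arbitrary choice, and all function names are hypothetical.

```rust
/// Placeholder 64-bit FNV-1a hash. NOT secure: a stand-in for a real
/// 256-bit hash (e.g. BLAKE3) so this sketch needs only std.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Leaves and inner nodes use different prefix bytes so one can't pose as the other.
fn leaf_hash(chunk: &[u8]) -> u64 {
    let mut buf = vec![0x00u8];
    buf.extend_from_slice(chunk);
    toy_hash(&buf)
}

fn node_hash(left: u64, right: u64) -> u64 {
    let mut buf = vec![0x01u8];
    buf.extend_from_slice(&left.to_le_bytes());
    buf.extend_from_slice(&right.to_le_bytes());
    toy_hash(&buf)
}

/// Builds every level bottom-up: levels[0] holds leaf hashes, the last level is [root].
fn build_levels(chunks: &[&[u8]]) -> Vec<Vec<u64>> {
    assert!(!chunks.is_empty(), "need at least one chunk");
    let mut levels = vec![chunks.iter().map(|c| leaf_hash(c)).collect::<Vec<u64>>()];
    while levels.last().unwrap().len() > 1 {
        let prev = levels.last().unwrap();
        let next: Vec<u64> = prev
            .chunks(2)
            // An unpaired last node is promoted to the next level unchanged.
            .map(|pair| if pair.len() == 2 { node_hash(pair[0], pair[1]) } else { pair[0] })
            .collect();
        levels.push(next);
    }
    levels
}

/// Collects (sibling hash, sibling is on the left) pairs from leaf `index` to the root.
fn proof(levels: &[Vec<u64>], mut index: usize) -> Vec<(u64, bool)> {
    let mut path = Vec::new();
    for level in &levels[..levels.len() - 1] {
        let sibling = index ^ 1;
        if sibling < level.len() {
            path.push((level[sibling], sibling < index));
        }
        index /= 2;
    }
    path
}

/// Recomputes the root from one chunk plus its proof; no other chunks are needed.
fn verify(chunk: &[u8], path: &[(u64, bool)], root: u64) -> bool {
    let mut acc = leaf_hash(chunk);
    for &(sibling, sibling_is_left) in path {
        acc = if sibling_is_left { node_hash(sibling, acc) } else { node_hash(acc, sibling) };
    }
    acc == root
}
```

A manifest would only need to store the root (or a few upper levels) for a downloader to check each chunk as it arrives, in any order.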
I think the metadata manifest will be primarily produced and consumed by computer programs. For example, a program might build an index over a bunch of manifests, and then a human could search it for individual files. But that doesn't preclude also generating a .nfo file or
Many features are gated on the basic design of the Intermodal manifest. So let's get started on it right away.
Desiderata
Allow integrity checking. Given a manifest, an accompanying release can be checked for integrity using the manifest. This will require including secure hashes of accompanying files in the manifest.
Hashing the manifest should give a secure hash that uniquely identifies the contents of the release.
Multi-level manifest. A lower-level manifest should commit to the contents of a release. A higher-level manifest should commit to both the lower-level manifest and any files containing signatures over the lower-level manifest. It would be nice to have only one manifest, but since a file can't sign itself or contain its own hash, it seems necessary to have at least a two-level manifest, so that we can produce a hash that uniquely identifies a collection of files, as well as signatures and other commitments to that collection of files.
I'm thinking about calling the lower-level manifest the "content manifest" and the higher-level one the "bundle manifest". I'm definitely open to naming suggestions, though. Other ideas are "file manifest" and "root manifest".
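The two levels could fit together roughly as follows. This is a sketch under several assumptions: `toy_hash` is an insecure 64-bit placeholder for a real 256-bit hash, the length-prefixed `encode` function is a made-up canonical encoding rather than CBOR or MessagePack, and `ContentManifest`, `BundleManifest`, and the other names are hypothetical. It shows an integrity check against the content manifest, and that adding a signature changes the bundle's hash while the content hash stays the same.

```rust
use std::collections::BTreeMap;

/// Placeholder 64-bit FNV-1a hash. NOT secure: a stand-in for a 256-bit
/// hash such as BLAKE3, used only so this sketch needs nothing beyond std.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Content manifest: path -> digest. BTreeMap keeps paths sorted.
type ContentManifest = BTreeMap<String, u64>;

/// Hypothetical canonical encoding: for each entry in sorted path order,
/// the path length (u64 LE), the path bytes, then the digest (u64 LE).
fn encode(manifest: &ContentManifest) -> Vec<u8> {
    let mut out = Vec::new();
    for (path, digest) in manifest {
        out.extend_from_slice(&(path.len() as u64).to_le_bytes());
        out.extend_from_slice(path.as_bytes());
        out.extend_from_slice(&digest.to_le_bytes());
    }
    out
}

/// Lower level: commits to every file in the release.
fn content_manifest(files: &BTreeMap<String, Vec<u8>>) -> ContentManifest {
    files.iter().map(|(p, data)| (p.clone(), toy_hash(data))).collect()
}

/// Integrity check: same set of paths, and every file's digest matches.
fn verify_release(manifest: &ContentManifest, files: &BTreeMap<String, Vec<u8>>) -> bool {
    manifest.len() == files.len()
        && manifest.iter().all(|(p, d)| files.get(p).map(|data| toy_hash(data)) == Some(*d))
}

/// Upper level: commits to the content manifest's hash (which the content
/// manifest cannot contain itself) and to signature files over that hash.
struct BundleManifest {
    content_id: u64,
    signatures: ContentManifest,
}

fn bundle_id(bundle: &BundleManifest) -> u64 {
    let mut bytes = bundle.content_id.to_le_bytes().to_vec();
    bytes.extend(encode(&bundle.signatures));
    toy_hash(&bytes)
}
```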
Why not use BitTorrent metainfo?
BitTorrent v1 uses SHA1, which is insecure.
BitTorrent v2 uses a custom tree hash that is vulnerable to attack if the length of the content is not included.
Bencoding is not a particularly popular encoding format.
Why not use the web packaging format?
The web bundle format is a single-file format, so it would be impossible to use natively with BitTorrent, which is an important transport.
Out of Scope
To keep things simple, it would be a good idea to limit the scope of the initial manifest design as much as possible. These are things we should account for in the design without worrying about the details yet:
Metadata. Structured metadata can be included in a file that the content manifest commits to.
Signatures, timestamps, and related functionality. A two-level manifest leaves open the ability to include files that are signatures over the hash of the content manifest, which are committed to by the bundle manifest.
In Scope
Manifest format. I'm thinking either CBOR or MessagePack. They are both lightweight, binary, schemaless formats with an object model similar to JSON's. The web packaging format uses CBOR, so that's what I'm leaning towards. Keybase's saltpack, however, uses MessagePack, so that's a contender too.
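For a sense of how close the two formats are, here is a hand-rolled Rust encoder for a small JSON-like subset (unsigned integers, byte strings, text, arrays, and maps) in both. It is a sketch, not either format's full specification: floats, negative integers, tags, and extension types are omitted, lengths above u32::MAX are not handled, and the `Value` type and function names are hypothetical. With it, {"a": 1} comes out as a1 61 61 01 in CBOR and 81 a1 61 01 in MessagePack, and a 32-byte digest costs two bytes of overhead in each.

```rust
/// A small JSON-like value model with recursive arrays and maps.
enum Value {
    Uint(u64),
    Bytes(Vec<u8>),
    Text(String),
    Array(Vec<Value>),
    Map(Vec<(Value, Value)>), // entries kept in insertion order
}

/// CBOR head (RFC 8949): major type in the top 3 bits, then the shortest argument form.
fn cbor_head(out: &mut Vec<u8>, major: u8, n: u64) {
    let m = major << 5;
    if n < 24 {
        out.push(m | n as u8);
    } else if n <= u8::MAX as u64 {
        out.extend_from_slice(&[m | 24, n as u8]);
    } else if n <= u16::MAX as u64 {
        out.push(m | 25);
        out.extend_from_slice(&(n as u16).to_be_bytes());
    } else if n <= u32::MAX as u64 {
        out.push(m | 26);
        out.extend_from_slice(&(n as u32).to_be_bytes());
    } else {
        out.push(m | 27);
        out.extend_from_slice(&n.to_be_bytes());
    }
}

fn to_cbor(v: &Value, out: &mut Vec<u8>) {
    match v {
        Value::Uint(n) => cbor_head(out, 0, *n),
        Value::Bytes(b) => { cbor_head(out, 2, b.len() as u64); out.extend_from_slice(b); }
        Value::Text(s) => { cbor_head(out, 3, s.len() as u64); out.extend_from_slice(s.as_bytes()); }
        Value::Array(items) => { cbor_head(out, 4, items.len() as u64); for i in items { to_cbor(i, out); } }
        Value::Map(pairs) => {
            cbor_head(out, 5, pairs.len() as u64);
            for (k, v) in pairs { to_cbor(k, out); to_cbor(v, out); }
        }
    }
}

/// MessagePack uint: positive fixint below 128, else uint8/16/32/64 (0xcc..0xcf).
fn mp_uint(out: &mut Vec<u8>, n: u64) {
    if n < 128 {
        out.push(n as u8);
    } else if n <= u8::MAX as u64 {
        out.extend_from_slice(&[0xcc, n as u8]);
    } else if n <= u16::MAX as u64 {
        out.push(0xcd);
        out.extend_from_slice(&(n as u16).to_be_bytes());
    } else if n <= u32::MAX as u64 {
        out.push(0xce);
        out.extend_from_slice(&(n as u32).to_be_bytes());
    } else {
        out.push(0xcf);
        out.extend_from_slice(&n.to_be_bytes());
    }
}

/// MessagePack length prefix: optional "fix" form (prefix, limit), optional 8-bit
/// marker, then 16- and 32-bit markers with big-endian lengths.
fn mp_len(out: &mut Vec<u8>, len: usize, fix: Option<(u8, usize)>, m8: Option<u8>, m16: u8, m32: u8) {
    match (fix, m8) {
        (Some((prefix, limit)), _) if len < limit => out.push(prefix | len as u8),
        (_, Some(marker)) if len <= u8::MAX as usize => out.extend_from_slice(&[marker, len as u8]),
        _ if len <= u16::MAX as usize => { out.push(m16); out.extend_from_slice(&(len as u16).to_be_bytes()); }
        _ => { out.push(m32); out.extend_from_slice(&(len as u32).to_be_bytes()); }
    }
}

fn to_msgpack(v: &Value, out: &mut Vec<u8>) {
    match v {
        Value::Uint(n) => mp_uint(out, *n),
        Value::Bytes(b) => { mp_len(out, b.len(), None, Some(0xc4), 0xc5, 0xc6); out.extend_from_slice(b); }
        Value::Text(s) => { mp_len(out, s.len(), Some((0xa0, 32)), Some(0xd9), 0xda, 0xdb); out.extend_from_slice(s.as_bytes()); }
        Value::Array(items) => { mp_len(out, items.len(), Some((0x90, 16)), None, 0xdc, 0xdd); for i in items { to_msgpack(i, out); } }
        Value::Map(pairs) => {
            mp_len(out, pairs.len(), Some((0x80, 16)), None, 0xde, 0xdf);
            for (k, v) in pairs { to_msgpack(k, out); to_msgpack(v, out); }
        }
    }
}
```

Both encoders always pick the shortest length form, which matters if the manifest's bytes are hashed, since the same value must always encode the same way. CBOR spells out such rules, including map key ordering, as deterministically encoded CBOR in RFC 8949; this sketch keeps map entries in insertion order, so callers would have to sort keys themselves.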
The hash function. Since manifests will have to include secure hashes, we need to pick a hash function. I'm leaning towards BLAKE3, although someone could probably talk me out of it. BLAKE3 is extremely fast, supports random access and streaming verification, and has a strong Rust implementation, all of which are nice features. On the downside, it is very new and uses a reduced-round construction (7 rounds, compared to 10 in BLAKE2s). However, it has been argued that the reduced round count does not make it vulnerable, now or in the future.
Whether the manifest should be flat or a tree. A nested manifest will be more compact when there are long directory names with many entries, but it is more complex. A nested manifest also doesn't explicitly encode path separators, which I think is a bonus.
BitTorrent v2 uses a tree, so that's what I'm leaning towards.
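Converting between the two shapes is straightforward. The Rust sketch below turns flat entries, each a list of path components plus a digest, into a nested tree in which directory names are map keys, so no separator is ever stored. `Node` and `build_tree` are hypothetical names, the u64 digest is a placeholder for a real hash, and rejecting a name used as both a file and a directory is one possible policy among several.

```rust
use std::collections::BTreeMap;

/// A nested manifest node: directories map component names to children.
#[derive(Debug, PartialEq)]
enum Node {
    File(u64), // placeholder digest
    Dir(BTreeMap<String, Node>),
}

/// Builds a tree from flat (path components, digest) entries. Returns None for an
/// empty path, a duplicate file, or a name used as both a file and a directory.
fn build_tree(entries: &[(Vec<&str>, u64)]) -> Option<Node> {
    let mut root: BTreeMap<String, Node> = BTreeMap::new();
    for (components, digest) in entries {
        let (file_name, dirs) = components.split_last()?;
        // Walk down (creating directories as needed) to the file's parent.
        let mut dir = &mut root;
        for &name in dirs {
            let child = dir.entry(name.to_string()).or_insert_with(|| Node::Dir(BTreeMap::new()));
            dir = match child {
                Node::Dir(children) => children,
                Node::File(_) => return None,
            };
        }
        if dir.insert(file_name.to_string(), Node::File(*digest)).is_some() {
            return None;
        }
    }
    Some(Node::Dir(root))
}
```

A flat path such as "album/disc1/01.flac" maps to components with `path.split('/').collect()`, which is the only place a separator appears.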
Where to put the manifest in a release. Since it seems likely that we'll eventually want multiple files, I'm thinking that putting everything into a subdirectory is a good idea, either imdl/ or intermodal/. For example, if we go with CBOR, the structure could be:
Postscript
A very weird but nonetheless interesting choice of format would be FIDL, Fuchsia's IPC system:
Another, less wacky choice would be FlatBuffers, which also supports zero-copy, zero-parse deserialization.
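To illustrate what zero-copy means here, this Rust sketch reads a hypothetical fixed-layout record format (a 32-byte digest, a little-endian u16 path length, then the path bytes) and returns views that borrow from the input buffer rather than copying the digest and path data. It is not the FlatBuffers wire format, which uses offsets and vtables, and `EntryRef` and `read_entries` are made-up names.

```rust
/// A borrowed view of one record; both fields point into the caller's buffer.
struct EntryRef<'a> {
    digest: &'a [u8], // always 32 bytes in this hypothetical layout
    path: &'a str,
}

/// Reads [32-byte digest][u16 LE path length][path bytes] records back to back.
/// Returns None on truncated input or a non-UTF-8 path.
fn read_entries<'a>(mut buf: &'a [u8]) -> Option<Vec<EntryRef<'a>>> {
    let mut entries = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 34 {
            return None; // truncated record header
        }
        let (digest, rest) = buf.split_at(32);
        let len = u16::from_le_bytes([rest[0], rest[1]]) as usize;
        let rest = &rest[2..];
        if rest.len() < len {
            return None; // truncated path
        }
        let (path_bytes, tail) = rest.split_at(len);
        // UTF-8 validation reads the bytes but does not copy them.
        let path = std::str::from_utf8(path_bytes).ok()?;
        entries.push(EntryRef { digest, path });
        buf = tail;
    }
    Some(entries)
}
```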
Misc
A link attribute that would indirect through a hash.