Proposal for next internet draft at time of charter review #7
base: main
index.xml
Algorithm Registry</eref> for more context, and also the <eref
target="https://github.com/multiformats/unsigned-varint">unsigned varint
specification</eref> for an explanation of how these UTF-8 bytes are generally
expressed and handled by current implementations.
We describe varint in the I-Ds; we should probably point there for the normative reference. We might have to publish a spec on varint, given that I couldn't find a stable reference for it.
https://www.ietf.org/archive/id/draft-multiformats-multihash-06.html#name-unsigned-variable-integer
https://github.com/multiformats/unsigned-varint
^ This is the only spec; I think it has to be the first order of business of the WG because it is, indeed, the format multicodec entries are encoded in (not UTF-8). My thinking, in terms of order of operations, is:
1. finalize unsigned_varint spec
2. multicodec registry group RFC (recreates the multicodec table as a registry group in IANA, whether with only final entries or empty), which defines entries as unsigned_varints
3. rewrite multibase to be one registry within that group, cleaning up all this confusing "first-draft" language about converting Unicode into UTF-8 to avoid collisions with the rest of the multicodecs, and addressing the NUL use case a little better
~~4. then do multihash :D~~
Wow, pretty much all of that has been reversed in the last week! Upon further research, unsigned varint is just the unsigned form of LEB128 from the DWARF debugging standard (1993), and it might not need to be specified anyway, since it's really an implementation detail. The core multicodec table maintainers have actually been trying to move to canonicalizing the entire table as raw binary instead of varint anyway, so maybe the IETF version should just be raw binary from the beginning and leave varint off the table.
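For reference, here is a minimal Python sketch of unsigned LEB128 as described above: 7 payload bits per byte, lowest group first, with the high bit set on every byte except the last. The function names are hypothetical. The multiformats unsigned-varint spec adds further restrictions (such as minimal encoding and a maximum length) that this sketch does not enforce.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (hypothetical helper)."""
    if value < 0:
        raise ValueError("unsigned varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)


def decode_uvarint(data: bytes) -> tuple[int, int]:
    """Decode a leading unsigned LEB128 value; return (value, bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift  # low 7 bits, placed at the current shift
        if not byte & 0x80:              # continuation bit clear: this was the last byte
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint: continuation bit set on the last byte")


print(encode_uvarint(0x12).hex())      # sha2-256 code: 12 (one byte)
print(encode_uvarint(0xED).hex())      # ed25519-pub code: ed01 (two bytes)
print(decode_uvarint(b"\xed\x01\xff"))  # (237, 2): trailing bytes are left unread
```

Codes below 0x80 look the same as a varint and as a single raw byte. From 0x80 upward the encodings diverge. For example, 0xed becomes `ed 01` as a varint but would be the single byte `ed` as a raw binary value. That divergence is why it matters whether the table is canonicalized as varint or as raw binary.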
Similarly, multibase is better off in its own registry; the approach of collision-proofing it against the multicodecs won't work.
I've got a number of concerns with this PR that we should discuss before merging.
The "varint is just a subset of UTF-8" thing feels like a design change (it might be true, but I've never thought of it that way; I need to understand that design in greater depth).
It's not clear where the 8-bit null thing at the beginning should be used.
The transition to speak about Multibase as UTF-?? code points needs more work. If we're going to make that change, we need to talk about the variety of different code points and the corner cases that this opens up. We should be absolutely sure we're not going to break implementations out there before making that change. We would also need to discuss UTF-16 vs. UTF-8 and surrogate pairs.
It would be easier to discuss these things if the PR focused on one topic at a time.
All that said, these changes feel doable, but they raise more questions about the spec and the design of Multiformats, and about whether implementations have actually implemented things the way these changes indicate.
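To make the "varint as a subset of UTF-8" and surrogate-pair concerns concrete, the Python sketch below compares the byte forms of one integer as an unsigned varint, as UTF-8, and as UTF-16 (big-endian). The helper names are hypothetical. The sample code points are `z` (the base58btc multibase prefix), U+00ED, and U+1F680, the rocket emoji used as the base256emoji prefix. The sketch only compares byte sequences. It takes no position on which representation the spec should adopt.

```python
def encode_uvarint(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def byte_forms(code_point: int) -> dict:
    """Hex-encode one integer as a varint, as UTF-8, and as UTF-16BE (hypothetical helper)."""
    ch = chr(code_point)
    return {
        "varint": encode_uvarint(code_point).hex(" "),
        "utf-8": ch.encode("utf-8").hex(" "),
        "utf-16-be": ch.encode("utf-16-be").hex(" "),
    }


for cp in (0x7A, 0xED, 0x1F680):
    print(hex(cp), byte_forms(cp))

# Byte sequences produced:
#   0x7a     varint 7a        utf-8 7a           utf-16-be 00 7a
#   0xed     varint ed 01     utf-8 c3 ad        utf-16-be 00 ed
#   0x1f680  varint 80 ed 07  utf-8 f0 9f 9a 80  utf-16-be d8 3d de 80 (surrogate pair)
```

Varint and UTF-8 agree only below 0x80. A varint always ends in a byte with the high bit clear, while every multi-byte UTF-8 sequence ends in a continuation byte with the high bit set. Above U+FFFF, UTF-16 also needs two code units, so a prefix reader that assumes "one byte" or "one UTF-16 unit" per prefix would mis-split base256emoji strings.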
I think this is best explained as an "opt-out" from multibase, since CIDs and other artefacts usually get multibased LAST, just before leaving a binary-only context to be ready for the outside world of transports and text-based interfaces. In exceptional cases, where CIDs and other protobuf formats get exported to another context but want to "stay in binary" (such as export to CBOR-land), a NUL prefix corresponds to a "raw" tag: a side door out of multibasing. Note, for example, how DAG-CBOR achieves CBOR interop by prepending NUL to a binary CID instead of a multibase prefix. I'll get more confirmation that this is actually the consensus view on how the libraries are multibasing (or not) before getting all this merged, and will update here when I'm 100% certain :D
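The DAG-CBOR case mentioned above can be illustrated with a short Python sketch. My understanding of the DAG-CBOR spec is that it carries a CID as CBOR tag 42 wrapping a byte string whose first byte is 0x00 (the NUL "stay in binary" prefix), followed by the binary CID. The CBOR header encoding is hand-rolled here only to keep the sketch dependency-free. The helper names are hypothetical, and a real implementation would use a CBOR library.

```python
import hashlib


def cbor_byte_string_header(length: int) -> bytes:
    """CBOR major type 2 (byte string) header for `length` bytes (hypothetical helper)."""
    major = 2 << 5  # 0x40
    if length < 24:
        return bytes([major | length])  # length fits in the initial byte
    if length < 0x100:
        return bytes([major | 24, length])
    if length < 0x10000:
        return bytes([major | 25]) + length.to_bytes(2, "big")
    if length < 0x100000000:
        return bytes([major | 26]) + length.to_bytes(4, "big")
    return bytes([major | 27]) + length.to_bytes(8, "big")


def dag_cbor_cid(cid_bytes: bytes) -> bytes:
    """Wrap a binary CID as CBOR tag 42 with a leading NUL byte (hypothetical helper)."""
    payload = b"\x00" + cid_bytes        # NUL instead of a multibase prefix
    tag_42 = bytes([(6 << 5) | 24, 42])  # major type 6 (tag), 1-byte argument: d8 2a
    return tag_42 + cbor_byte_string_header(len(payload)) + payload


# CIDv1 for raw bytes hashed with sha2-256: version 0x01, codec raw 0x55,
# multihash code sha2-256 0x12, digest length 0x20, then the 32-byte digest.
digest = hashlib.sha256(b"hello").digest()
cid = bytes([0x01, 0x55, 0x12, 0x20]) + digest
encoded = dag_cbor_cid(cid)
print(encoded[:5].hex())  # d82a582500: tag 42, 37-byte string header, NUL
```

The NUL byte sits where a textual multibase prefix would otherwise go. That placement is what lets the CID travel inside CBOR as raw bytes rather than as a multibase string.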
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Language changes for clarity, and a couple of questions that need answers
If an encoding seems plausible but does not yet fulfill all requirements, it can
be registered with a `draft` status. In exceptional cases, consensus of the
Stewards and Experts can excuse one of the above requirements.
This seems both redundant and contradictory.
- What list of requirements need not be entirely fulfilled to allow registration as `draft`? The only list I see is the three numbered requirements "above", which "MUST" be fulfilled for `draft` registration.
- For the "exceptional cases," which requirements are candidates for being "excuse[d]", and is that "excuse" relevant for `draft` or for `final` status?
Unfortunately, core stakeholders are on leave, so review of my upstream PR in time for IETF 117 in San Francisco was not possible, but here is a draft PR incorporating the feedback so far. A rendered preview, published off a fork, can be seen here.
Highlights:
- Includes/closes PR #5 (Replace two "multihash" bits with "multibase")
- Addresses/closes #6 (Follow the guidelines for creating an IANA registry)
- Addresses & closes #4 (Not all identifiers fit in a single byte) with new language on Unicode being definitive versus binary
- Addresses #3 (Review by Peng Shuping)