Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for next internet draft at time of charter review #7

Open
wants to merge 52 commits into
base: main
Choose a base branch
from

Conversation

bumblefudge
Copy link

@bumblefudge bumblefudge commented Jul 21, 2023

Unfortunately core stakeholders are on leave so review in time for IETF 117 in SF was not possible on my upstream PR, but here is a draft PR incorporating feedback so far. Rendered preview can be seen here, published off a fork.

Highlights:

gen-algorithms-registry Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated
Comment on lines 169 to 172
Algorithm Registry</eref> for more context, and also the <eref
target="https://github.com/multiformats/unsigned-varint">unsigned varint
specification</eref> for an explanation of how these UTF-8 bytes are generally
expressed and handled by current implementations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We describe varint in the I-Ds, we should probably point there for the normative reference. We might have to publish a spec on varint given that I couldn't find a stable reference for it.

https://www.ietf.org/archive/id/draft-multiformats-multihash-06.html#name-unsigned-variable-integer

Copy link
Author

@bumblefudge bumblefudge Jul 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/multiformats/unsigned-varint
^ This is the only spec-- I think it has to be first order of business of the WG because it is, indeed, what multicodec entries are in (not UTF-8). my thinking, in terms of order of operations, is:
1. finalize unsigned_varint spec
2. multibase registry group RFC (recreates the multicodec as a registry group in IANA, whether with only final entries or empty) which defines entries as unsigned_varints
3. rewrite multibase to be one registry within that group, cleaning up all this confusing "first-draft" language about converting Unicode into UTF-8 to avoid collisions with the rest of multicodecs and address the NUL use-case a little better
~~4. then do multihash :D ~~

Wow pretty much all of that has been reversed in the last week! Upon further research it's just the unsigned form of the LEB128 standard from 1993 (Dwarfstd), and might not need to be specified anyways since it's really an implementation detail. The core multicodec table maintainers actually have been trying to move to canonicalizing the entire table as raw binary instead of varint anyways... so maybe the IETF version should just be raw binary from the beginning and leave the varint off the table.

Similarly, multibase is better off in its own registry, the stuff about collision-proofing it against the multicodecs won't work.

index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
Copy link
Contributor

@msporny msporny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a number of concerns with this PR that we should discuss before merging.

The varint is just a subset of UTF-8 thing feels like a design change (it might be true, but I've never thought of it that way -- need to understand that design at greater depth).

It's not clear where the 8-bit null thing at the beginning should be used.

The transition to speak about Multibase as UTF-?? code points needs more work -- if we're going to make that change, we need to talk about the variety of different code points and corner cases that opens up. We should be absolutely sure we're not going to break implementations out there before doing that change. We would also need to discuss UTF-16 vs. UTF-8 and surrogate pairs.

It would be easier to discuss these things if the PR focused on one topic at a time.

All that said, these changes feel do-able, but they raise more questions about the spec and the design of Multiformats... and if implementations have actually implemented things in the way that these changes indicate.

index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
@bumblefudge
Copy link
Author

bumblefudge commented Jul 30, 2023

It's not clear where the 8-bit null thing at the beginning should be used.

I think this is best explained as an "opt-out" from multibase since CIDs and other artefacts usually get multibased LAST before leaving a binary-only context to be ready for the outside world of transports and text-based interfaces. In exceptional cases, where CIDs and other protobuf formats get exported to other context but want to "stay in binary" (such as export to CBOR-land), a nul-prefix corresponds to a "raw" tag, a side-door out of multibasing. Note, for example, how DAG-CBOR achieves CBOR interop by prepending NUL to a binary CID instead of a multibase prefix:
https://ipld.io/specs/codecs/dag-cbor/spec/#links

I'll get more confirmations that this is actually the consensus view on how the libraries are actually multibasing (or not) before getting all this merged and update here when I'm 100% certain :D

index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
bumblefudge and others added 7 commits August 7, 2023 10:12
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
index.xml Outdated Show resolved Hide resolved
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
Copy link

@TallTed TallTed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Language changes for clarity, and a couple questions that need answers

index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
Comment on lines +318 to +320
If an encoding seems plausible but does not yet fulfill all requirements, it can
be registered with a `draft` status. In exceptional cases, consensus of the
Stewards and Experts can excuse one of the above requirements.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems both redundant and contradictory.

  • What list of requirements need not be entirely fulfilled to allow registration as draft? The only list I see is the three numbered requirements "above", which "MUST" be fulfilled for draft registration.
  • For the "exceptional cases," which requirements are candidates for being "excuse[d]", and is that "excuse" relevant for draft or for final status?
Suggested change
If an encoding seems plausible but does not yet fulfill all requirements, it can
be registered with a `draft` status. In exceptional cases, consensus of the
Stewards and Experts can excuse one of the above requirements.
If an encoding seems plausible but does not yet fulfill all requirements, it can
be registered with a `draft` status. In exceptional cases, consensus of the
Stewards and Experts can excuse one of the above requirements.

index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
index.xml Outdated Show resolved Hide resolved
bumblefudge and others added 9 commits August 31, 2023 14:20
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Follow the guidelines for creating an IANA registry Not all identifiers fit in a single byte
3 participants