Skip to content
This repository has been archived by the owner on Jul 9, 2024. It is now read-only.

Draft 03 PR #58

Merged
merged 11 commits into from
Mar 1, 2022
Merged

Draft 03 PR #58

merged 11 commits into from
Mar 1, 2022

Conversation

bradleypeabody
Copy link
Contributor

@kyzer-davis Here is what I'm thinking in terms of a newer and simpler format:

  • The preamble combines some of the explanation in shorter bullet points
  • Each format is described as concisely as possible, omitting things like the duplicative explanation from RFC4122, or detailed explanations about various tradeoffs that are described later.
  • Each of the various topics are now covered in a UUID Concerns section and this addresses things like Monotonicity, Opacity, Variable Length, Text Format, etc. - each as their own topic. The idea here is to make the explanation of each UUID version very simple and easy to understand, and move each of the longer arguments about tradeoffs etc into their own thing so they can be covered separately, and read by the reader when they are actually interested in a given topic. Otherwise with the way it was before, it was kind of like you were forced to read through a long treatise about all kinds of factors before being able to just get a clear-cut explanation of what each of the fields mean and what belongs in them.

There are still various TODOs in there, which I plan on filling in later in the coming week. And a lot of the text under UUID Concerns is still pretty terse and note-like, but I think that can be fixed up as well without too much trouble. I also left the previous text down at the bottom as comments for easy reference in case I moved stuff out that we want to bring back.

But yeah, let me know what you think in terms of the overall approach with this.

@broofa
Copy link
Contributor

broofa commented Feb 14, 2022

Is there a reason for creating these as new documents rather than revising the existing ones? Seems like it'd be easier to review if we could see side-by-side changes in this PR.

@bradleypeabody
Copy link
Contributor Author

@broofa fixed

@broofa
Copy link
Contributor

broofa commented Feb 14, 2022

A UUID with all zero bytes is never valid

Isn't this the NIL UUID?

ver_var field

While I appreciate the simplicity of this approach, I thought we'd resolved to stick with the RFC4122 locations / definitions for the version and variant fields.

@bradleypeabody
Copy link
Contributor Author

@broofa

Isn't this the NIL UUID?

Good point, yes that's the same thing. It's interesting that RFC4122 doesn't indicate the semantic meaning of a Nil UUID or when one would use it. Seeing that there is no specified use for it, I think it makes sense to recommend a check for Nil UUID as a valid substitute for "is this a valid UUID", in the interest of discouraging unnecessary UUID introspection. Let me know if you think otherwise.

I thought we'd resolved to stick with the RFC4122 locations / definitions for the version and variant fields.

We discussed this here but I don't feel like there is a convincing-enough argument against combining these into one field. I'll drop another comment on that ticket so we can carry on the discussion further there if needed.

Other than that, what do you think of the layout overall? Does it seem simpler and more approachable?

@kyzer-davis
Copy link
Contributor

I am all for being streamlined in getting the need-to-know up front then discussing some of the finer details at the end.

I knocked out a few things today:

  • Added HTML version
  • Updated changelog calling out variant change
  • Did a pass on "Combined Variant and Version Fields" and changed the section name to just "Variant and Version Fields"
  • Filled out some of the diagram blocks
  • Removed bullet on all zero UUID as per @broofa's comment about NIL UUID
  • Misc. Typo fixes

@kyzer-davis
Copy link
Contributor

kyzer-davis commented Feb 16, 2022

Work on 2-16-2022 (ignore the 2021 typo in the commit...):

  • More Changelog updates to summarize at-a-glance changes from Draft-02 vs Draft-03
  • Added var back to UUIDv6 ASCII layout as this is an important set of bits
  • Verified <figure><artwork></artwork></figure> tags are valid for the code snipets as per IETF RFC7749
  • Added code snippet placeholder for v7
  • Added Test Vector ASCII placehold table to v6 and v7
  • Modified v7 ASCII layout to have rand_a and rand_b (before and after var_ver) to make it easier to discuss the relevant section of bits.
  • Modified v8 with the same logic of custom_a and custom_b for the pre and post var_ver bits
  • Removed <section> tags from "Implementation Tradeoffs" section so that it shows up under "UUID Concerns" and avoids an empty section.
  • Hit most of the "TODO link section below" references plus added a few backward and forward references with <xref target="sectionLink"/>
  • Gave another pass to "UUID Concerns" fleshing out the text to be less informal.
  • Added more TODO's for tracking
  • Added some further Acknowledgements

@broofa
Copy link
Contributor

broofa commented Feb 17, 2022

recommend a check for Nil UUID as a valid substitute for "is this a valid UUID"

Let's not mess with the verbiage around the Nil UUID.

  1. It is defined in the RFC. Ergo, it's valid. (e.g. uuidjs permits it in validation, and also provides a NIL constant.)
  2. If it's expected to be invalid, there's no need to address it as a special case. 0b00 is not a valid variant. That this is in the spec suggests there is significance to it beyond being simply a special invalid value.
  3. Even if it's arguably invalid, specifying semantics for it now will almost certainly cause existing uses fall afoul of whatever definition we put forward.

... for example, it serves as a good placeholder for "null" or "undefined" in places where type constraints otherwise require a valid UUID (e.g. dbs or typed languages).

Copy link
Contributor

@broofa broofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of comments / questions. Didn't quite get to the end, but I have to get to work.

draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
@broofa
Copy link
Contributor

broofa commented Feb 17, 2022

BTW, what happened to the verbiage around the sub-second timestamp field being encoded in fractional-binary notation?

Wait... we're going with fixed-length unix milliseconds and doing away with all the node/counter/subsecond time stuff for version 7 now? I'm not opposed to this (in fact, I rather like the new format... a lot), but this sweeps a lot of previous discussion aside. I feel like I missed something.

@kyzer-davis
Copy link
Contributor

kyzer-davis commented Feb 18, 2022

Summary of work for 2-17-2022

  • Ran the document through spell check so hopefully @broofa doesn't beat me up anymore about spelling issues :)
  • Noticed we reference Octet 9 for variant bits, this is actually octet 8 as per verbiage where this document and RFC4122 start definitions with octet 0. Fixed this in the the entire document.
  • Added further clarifications on Variant 111+0 and 111+1
  • Added Format section and put 3x UUID layouts in that to reduce empty sections
  • Added IETF terminology section moved/updated Requirements Language and added new Abbreviations section
  • Moved C code samples to first appendix with two sub-sections as per RFC 7749.
  • Moved Test vector samples to second appendix with two sub-sections as per RFC 7749.
  • Knocked out most of the items @broofa recommended, closing the recommended changes.
  • Added more forward/reverse references where required.
  • Deleted the old XML RFC comment, Draft 02 can be found in the repo.
  • Moved some TODOs to comments to promote readability by editors while we iron out the TODO text.
  • Changed all "#-bit" references to "# bit" as per Draft 02 update about the same topic.
  • Linked v7: Restrict use of the term "node" to version 6 #49, v7: Change "clock sequence" to "counter" #51 , Quantum-mechanical TRNG or CSPRNG is a must #55 which should be sufficiently covered currently.

Edit: still a ton left to do. I will resume work on Draft 03 Monday.

@bradleypeabody
Copy link
Contributor Author

@broofa

it serves as a good placeholder for "null" or "undefined" in places where type constraints otherwise require a valid UUID (e.g. dbs or typed languages)

I definitely agree with this. I'll update the language to remove mentions of validation and essentially say the above. The point of the earlier text is that I think UUID validation is generally unnecessary (because everywhere else on the internet when someone generates an ID, the thing storing it just says "thanks" not "this ID is in an invalid format" - I think it's the complexity of RFC4122 that causes people to feel that validation is necessary when in fact I think it's just unneeded complexity that serves practically no purpose). But yeah, just saying "If you need to express the concept of 'there is no UUID here', e.g. database NULL, you can use the Nil UUID" - that totally works.

Regarding variable length and text formats, let's move that to the separate issues, I'll tag you and Fabio there and we can hash those points out.

@bradleypeabody
Copy link
Contributor Author

@broofa

Wait... we're going with fixed-length unix milliseconds and doing away with all the node/counter/subsecond time stuff for version 7 now? I'm not opposed to this (in fact, I rather like the new format... a lot), but this sweeps a lot of previous discussion aside. I feel like I missed something.

Yeah this is one of the reasons I wanted to get this new draft being looked at.

Based on our discussion about typical clock resolutions, and my experience just trying to explain the concept to people, I changed my position on this idea of variable time length encoding to be not worth the effort. Such a scheme is still possible in UUIDv8 if it's really needed. But basically most of the people I talked to about it sort of just scratched their heads and went "hm, uh I guess so, I sort of think I get it". Neither my explanation, nor the need for such a system seemed to proove useful after the shopping the idea around. So I'm okay to just get it over it and go with a milliseconds timestamp. It will be a lot easier for folks to implement, and I think serves the purpose quite well.

@LiosK
Copy link

LiosK commented Feb 19, 2022

I kinda miss and would appreciate the smart subsec fraction idea, but the choice of timestamp encoding might be a trivial issue because, for a given size of binary timestamp field, the same information can be placed regardless of encoding techniques. R.I.P.

@broofa
Copy link
Contributor

broofa commented Feb 20, 2022

I'm okay to just get it over it and go with a milliseconds timestamp

At the risk of horsing a dead beat...

Is there a reason unix_ts_ms is 48 bits instead of 42 or 44? 44 bits would suffice for dates to year 2527. I only mention this because the current verbiage ("big endian") means the first 6 high-order bits won't be used for nearly a century. The first 4 for 500 years.

Seems like a waste. E.g. a 44 bit unix_ts_ms field layout...

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           unix_ts_ms                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      unix_ts_ms       |                  rand_a               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    var_ver    |            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

@fabiolimace
Copy link

fabiolimace commented Feb 20, 2022

A 44 bits millisecond field gives 500 years of lifespan. It's more than enough in my opinion. At the end of this lifespan, we probably won't be using UUIDs anymore, unless something goes wrong along the way (Madmax, Skynet, etc).

The remaining 20 bits in rand_a can be used for:

  • a counter up to 2^20 (1048576) (monotonicity); or
  • submillisecond up to nanosecond (precision); or
  • just random bits (entropy).

So I agree that maybe 48 bits is too much.

P.S. a 20-bit counter can gives us 1 billion monotonic UUIDs per second instead of just 65k. UUIDv1 gives us 10 million.

@kyzer-davis
Copy link
Contributor

kyzer-davis commented Feb 22, 2022

@broofa

Is there a reason unix_ts_ms is 48 bits instead of 42 or 44?

This is just where the logical previous ver bits would start. 48 was a good value when we had to splice in the 4 bit version but I agree, if we are going with the new variant+version position then we can do whatever we want with the first 64 bits.

I am totally fine to drop this down to 44 bits and give some bits back to rand_a though. I will address this in the next commit.

Edit: and ULID uses 48 bits for timestamp.

@kyzer-davis kyzer-davis changed the title Reworked and simplified Draft 03 PR Feb 23, 2022
Copy link
Contributor

@broofa broofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, and one big ask (remove var_ver concept until we've definitively concluded it's the right approach.)

draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
Comment on lines +381 to +382
For UUID versions 7 and 8, the variant field has been incremented and
the version field has been moved to the same octet as the variant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Guess I'll make a nuisance of myself by asking that we revert this var_ver field concept until #26 has been resolved.

As I said, I feel it's incumbent upon @bradleypeabody to make a better case for why a new variant is needed, so I'd prefer it if we didn't just default to this in the event that conversation peters out.

draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
draft-peabody-dispatch-new-uuid-format-03.txt Outdated Show resolved Hide resolved
@bradleypeabody
Copy link
Contributor Author

On some of these recent changes, I think we're loosing some important aspects that work together:

  • 48 bit timestamp - admittedly this is was chosen because it's a clean byte-boundary and is easier to use. If we change it to 44 bits, it just that much more bit swizzling required. This goes along with the var-ver change, moving the version bits out of the way.
  • This works in tandem with the variable length approach. If you need more bytes for entropy, just add them. Some of the questions brought up around this are valid and I can go through and answer those, and possibly we can limit the scope of when the doc suggests to use longer IDs, but I think there is still a strong case for including them. There has been a huge amount of prior discussion about entropy and "are there are enough bits" for various cases and degrees of collision resistance.
  • The crockford base32 text format was in there because not having a compact text form defeats of the most important use cases here which is making an ID which is useful for databases. Having an overly long text form for IDs just for legacy reasons is not a good tradeoff for this case of database implementing PKs as UUIDs. If the problem is the poor specification and unclarity of the alphabet, I think we can specify a subset that is compatible with a typical crockford base32 implementation but also strict enough to make the minimum implementation simple.

@fabiolimace
Copy link

fabiolimace commented Feb 23, 2022

The only objection I have against moving the version number closer to the variant is that it reserves 2 more bits.

Perhaps it is more beneficial, in terms of entropy gain, to remove the version number altogether and just create a versionless 'E' variant. But it would exclude what we now call UUIDv8.

Alternatively, we could create an 'E' variant plus 1 bit (or 2 bits) to separate the key spaces for UUIDv7 from UUIv8. But it would require other names for them.

EDIT: the metaphor was removed from this comment for not being objective.

@kyzer-davis
Copy link
Contributor

kyzer-davis commented Feb 24, 2022

There is a single TODO!

@bradleypeabody I need a C code snippet for UUIDv7 in the appendix!

Work from today:

  • Reverted UUIDv7 to 48 bits. UUIDv7 is @bradleypeabody's and I defer changing that value to him.
  • Reverted UUIDv8 back to "anything goes" vs "time-specific" as I missed an email from Brad stating this was the goal in Draft 03.
  • Re-ordered "UUID Best Practices" sub-sections (Fleshed out text in every one of these sections.) @broofa I saw you reviewing this today, it was not complete but should be in a better place after this commit.
  • Added v7: Add example layout that employs exotic precision timestamp #57 as an example code snippet for UUIDv8 in the appendix.
  • Fixed var_ver typo in UUIDv8 test vector. Thanks @fabiolimace

Edit:
#53 should now be covered in "Monotonicity, Counters" section.
Edit 2:
#36 should also be covered by "Distributed UUID Generation" and "Uniqueness Guarantees" sections

@kyzer-davis kyzer-davis linked an issue Feb 24, 2022 that may be closed by this pull request
@kyzer-davis kyzer-davis linked an issue Feb 24, 2022 that may be closed by this pull request
@bradleypeabody
Copy link
Contributor Author

@kyzer-davis Great!

UUIDv7 generation in C:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

// ...

// csprng data source
FILE *rndf;
rndf = fopen("/dev/urandom", "r");
if (rndf == 0) {
	printf("fopen /dev/urandom error\n");
	return 1;
}

// ...

// generate one UUIDv7
uint8_t u[16];
struct timespec ts;
int ret;

ret = clock_gettime(CLOCK_REALTIME, &ts);
if (ret != 0) {
	printf("clock_gettime error: %d\n", ret);
	return 1;
}

uint64_t tms;

tms = ((uint64_t)ts.tv_sec) * 1000;
tms += ((uint64_t)ts.tv_nsec) / 1000000;

printf("tms: %lld\n", tms);

memset(u, 0, 16);

fread(&u[6], 10, 1, rndf); // fill everything after the timestamp with random bytes

*((uint64_t*)(u)) |= htonll(tms << 16); // shift time into first 48 bits and OR into place

u[8] = 0xE7; // set var-ver field

@kyzer-davis
Copy link
Contributor

kyzer-davis commented Feb 25, 2022

Today's changelog:

At this point I do not have any additional items to change and all of the TO DOs are gone.
I have been combing through the issue tracker and do not see any other additional items from the threads to hit in this PR. Please bring them to my attention if I have missed something important.

I will return Monday to check this thread and work on any required items called out to me.

Cheers,

Edit:
I hope the IETF folks don't ask us bring back the RFC4122 style "basic algorithm" sections.
I am a fan of omitting them entirely as we have in Draft 03 but if somebody feels otherwise please let me know.

@broofa
Copy link
Contributor

broofa commented Feb 25, 2022

At this point I do not have any additional items to change and all of the TO DOs are gone... Please bring them to my attention if I have missed something important.

Where are we at with #26? I'm concerned that our rationale for changing the variant won't hold up to scrutiny once this spec is published.

BTW, have we reached out to any of the original RFC authors for feedback on all of this? (mmealling and richsalz appear to be on github). I'm curious what they intended for "future uses" of the 0b111 variant.

@bradleypeabody
Copy link
Contributor Author

bradleypeabody commented Feb 25, 2022

BTW, have we reached out to any of the original RFC authors for feedback on all of this?

This is a good point. I think richsalz was involved in some of the early discussion on the IETF mailing list, IIRC, but that discussion was years ago now and the idea has of course changed a lot since.

Getting the 03 draft to the IETF is a good way to get feedback. Most of the early IETF mailing list discussion culminated in (to parapharse) "just make a draft, so we can see what you're talking about in detail".

@kyzer-davis
Copy link
Contributor

Thanks to everybody for helping review the current Draft 03!

I may be the one editing the XML file but the groups feedback is beyond invaluable!

Today's changelog:

@kyzer-davis kyzer-davis merged commit d424fed into master Mar 1, 2022
@kyzer-davis kyzer-davis deleted the refactor-feb-2022 branch March 1, 2022 16:34
@ben221199
Copy link

ben221199 commented Mar 1, 2022

#26 (comment)

@kyzer-davis kyzer-davis mentioned this pull request Mar 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.