This repository has been archived by the owner on Jul 9, 2024. It is now read-only.

Discussion: Redefine variant bit (111) definition #26

Open
kyzer-davis opened this issue Aug 13, 2021 · 56 comments
Labels
Discussion (Further information is requested), Out of Scope (Topics not in scope for the RFC/Draft)

Comments

@kyzer-davis
Contributor

kyzer-davis commented Aug 13, 2021

Continuation of separate #24 thread

Question:
Should we redefine UUID variant bits 111 (E/F) which are currently "Reserved for future definition." as per the original RFC 4122 Section 4.1.1 source?
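For readers less familiar with the variant field, here is a minimal sketch of how the four variant ranges from RFC 4122 Section 4.1.1 can be read from the top bits of octet 8. The function name `variant_name` and the returned labels are made up for this illustration.

```python
import uuid


def variant_name(octet8: int) -> str:
    """Classify a UUID by the most significant bits of octet 8."""
    if octet8 & 0x80 == 0x00:   # 0xx
        return "NCS backward compatibility"
    if octet8 & 0xC0 == 0x80:   # 10x -> first hex digit 8/9/A/B
        return "RFC 4122"
    if octet8 & 0xE0 == 0xC0:   # 110 -> first hex digit C/D
        return "Microsoft backward compatibility"
    return "reserved for future definition"  # 111 -> first hex digit E/F


# A freshly generated v4 UUID always lands in the 10x range.
print(variant_name(uuid.uuid4().bytes[8]))
```

Any octet starting with the hex digit E or F falls through to the last branch, which is the space this issue proposes to define.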


Proposal:

  • In Draft 02 set definition of any UUID Version (UUIDv1/2/3/4/5/6/7/8) + Variant E (111) as a method for signaling an alternative bit layout to any previously defined UUID.
  • With that precedent set UUIDv6 could be converted to UUIDv1 (0001) + Variant E/F (111) as a method to signal the alternative encoding for UUIDv1 labeled as UUIDv1ε.

Possible text changes depending on feedback from this issue and #24

As such the Draft 01 goes from the current three definitions:

| Name | Version | Variant | Description |
|------|---------|---------|-------------|
| UUIDv6 | 0110 | 10x (8/9/A/B) | UUIDv1 with Re-ordered Gregorian timestamp, explicit start sequence counter, no MAC address |
| UUIDv7 | 0111 | 10x (8/9/A/B) | 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using floating point math and fractions (38 bits allocated to subsecond precision). |
| UUIDv8 | 1000 | 10x (8/9/A/B) | Relaxed implementation, any timestamp goes, future proof the specification, 122 bits to do as you desire with general guidelines of timestamp, sequence, random in that order |

To the possible four definitions in Draft 02:

| Name | Version | Variant | Description |
|------|---------|---------|-------------|
| UUIDv1ε | 0001 | 111 (E/F) | UUIDv1 with Re-ordered Gregorian timestamp, explicit start sequence counter, no MAC address. (What is UUIDv6 in draft 01) |
| UUIDv6 | 0110 | 10x (8/9/A/B) | 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using floating point math and binary fractions (38 bits allocated to subsecond precision). (What is UUIDv7 in draft 01) |
| UUIDv6ε | 0110 | 111 (E/F) | UUIDv6 with 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using integers to represent total number of subseconds (30 bits allocated to subsecond precision). (Did not exist in draft 01) |
| UUIDv7 | 0111 | 10x (8/9/A/B) | Relaxed implementation, any timestamp goes, future proof the specification, 122 bits to do as you desire with general guidelines of timestamp, sequence, random in that order. (What was UUIDv8 in draft 01) |
| UUIDv8 | 1000 | 10x (8/9/A/B) | Goes away in draft 02 as it is no longer required. |

@nerg4l

nerg4l commented Aug 13, 2021

Could you clarify for me what "variable subsecond encoding up to nanoseconds using floating point math and fractions" and "variable subsecond encoding up to nanoseconds using integers to represent total number of subseconds" mean, and what the difference between them is?

@kyzer-davis
Contributor Author

@nerg4l, the topic being discussed in #24 is whether we keep the current UUIDv7 and just rename it to UUIDv6 (since UUIDv6 becomes UUIDv1ε). Then we can also add an alternative encoding, UUIDv6ε, that uses the 30-bit integer subsecond field without any floating point math or binary fraction encoding. We get the best of both worlds. UUIDv7 becomes what was UUIDv8, and I drop UUIDv8 from the draft.

I edited the table to add more clarity.
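As a rough illustration of the two subsecond encodings being compared, the sketch below assumes the field widths quoted in the tables (38 bits for the binary fraction, 30 bits for the integer count) and assumes the fraction field stores nanoseconds divided by 10^9 as a fixed-point value. The function names are hypothetical, and integer arithmetic stands in for the floating point math the draft mentions, to keep rounding exact.

```python
NANOS_PER_SECOND = 1_000_000_000
FRACTION_BITS = 38  # binary-fraction width quoted in the table
INTEGER_BITS = 30   # integer-count width quoted in the table


def encode_fraction(nanos: int) -> int:
    """Store the subsecond part as a fraction of one second, scaled to 38 bits."""
    return (nanos << FRACTION_BITS) // NANOS_PER_SECOND


def decode_fraction(field: int) -> int:
    """Recover nanoseconds from the fraction field, rounding to nearest."""
    half = 1 << (FRACTION_BITS - 1)
    return (field * NANOS_PER_SECOND + half) >> FRACTION_BITS


def encode_integer(nanos: int) -> int:
    """Store the nanosecond count directly; 10^9 is below 2^30, so it fits."""
    if not 0 <= nanos < NANOS_PER_SECOND:
        raise ValueError("nanos must be within one second")
    return nanos


half_second = 500_000_000
print(hex(encode_fraction(half_second)))  # top bit of the 38-bit field set
print(hex(encode_integer(half_second)))   # the plain count
```

The fraction form needs scaling on both encode and decode, while the integer form is a direct copy, which is the simplification the 30-bit alternative is after.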

@bradleypeabody
Contributor

@kyzer-davis I was thinking, for simplicity, we would only define a meaning for the variants + versions that we actually want. Meaning basically that UUIDv6 I think stays as it is with the old variant, and UUIDv7 and 8 use the new 111 variant. Otherwise I think we have too much variation without any real benefit.

@nerg4l

nerg4l commented Aug 13, 2021

Thanks for the clarification.

I don't think it makes sense to create UUIDv6 and UUIDv6ε. Instead of pleasing everyone we should have a clear decision on which one to have. This would simplify implementations by having one less UUID to implement and would help keeping the RFC less complex. Also UUIDv6ε would probably only apply to nanosecond precision.

@nerg4l

nerg4l commented Aug 15, 2021

I looked more into this and found two things which should be taken into consideration.

I'm not sure if it is relevant when extending the RFC, but ITU also has a UUID definition in X.667, which states the following:

11.2 All UUIDs conforming to this Recommendation | International Standard shall have variant bits with bit 7 of
octet 7 set to 1 and bit 6 of octet 7 set to 0. Bit 5 of octet 7 is the most significant bit of the Clock Sequence and shall be
set in accordance with 12.4.

NOTE – Bit 5 is listed here as a variant bit because its value distinguishes historical formats. Strictly speaking, it is not part of the variant value for this Recommendation | International Standard, which uses only two bits for the variant.

I also checked what a variant#0 (NCS) UUID looks like. It seems it does not have version bits. https://opensource.apple.com/source/CF/CF-299.35/Base.subproj/uuid.c.auto.html

 * Internal structure of variant #0 UUIDs
 *
 * The first 6 octets are the number of 4 usec units of time that have
 * passed since 1/1/80 0000 GMT.  The next 2 octets are reserved for
 * future use.  The next octet is an address family.  The next 7 octets
 * are a host ID in the form allowed by the specified address family.
 *
 * Note that while the family field (octet 8) was originally conceived
 * of as being able to hold values in the range [0..255], only [0..13]
 * were ever used.  Thus, the 2 MSB of this field are always 0 and are
 * used to distinguish old and current UUID forms.
 *
 * +--------------------------------------------------------------+
 * |                    high 32 bits of time                      |  0-3  .time_high
 * +-------------------------------+-------------------------------
 * |     low 16 bits of time       |  4-5               .time_low
 * +-------+-----------------------+
 * |         reserved              |  6-7               .reserved
 * +---------------+---------------+
 * |    family     |   8                                .family
 * +---------------+----------...-----+
 * |            node ID               |  9-16           .node
 * +--------------------------...-----+
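To make the layout in that comment concrete, here is a small sketch that splits 16 octets into the fields it describes. The big-endian byte order is an assumption read off the "high 32 bits of time" row, and `parse_ncs` and its result keys are names invented for this example.

```python
from datetime import datetime, timedelta, timezone

# Epoch given in the comment: 1/1/80 0000 GMT.
NCS_EPOCH = datetime(1980, 1, 1, tzinfo=timezone.utc)


def parse_ncs(raw: bytes) -> dict:
    """Split a variant #0 UUID into the fields named in the comment."""
    if len(raw) != 16:
        raise ValueError("a UUID is 16 octets")
    ticks = int.from_bytes(raw[0:6], "big")  # units of 4 microseconds (byte order assumed)
    return {
        "time": NCS_EPOCH + timedelta(microseconds=4 * ticks),
        "reserved": raw[6:8],
        "family": raw[8],                   # only values 0..13 were ever used
        "node": raw[9:16],                  # 7 octets of host ID
        "is_variant_0": raw[8] & 0x80 == 0  # top bit of octet 8 is clear
    }


sample = (250_000).to_bytes(6, "big") + bytes(2) + bytes([2]) + bytes(7)
print(parse_ncs(sample)["time"])
```

Note that nothing in this layout corresponds to a version nibble; octets 6 and 7 are simply reserved.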

Unfortunately, I could not find any specification for Microsoft's variant#2.

A lot of people refer to UUIDs defined by RFC 4122 as variant#1 version x UUIDs. RFC4122 also states the following about variant and version:

[...] The UUID format is 16 octets; some bits of the eight octet variant field specified below determine finer structure. [...]

[...] As such, it [variant] could more accurately be called a type field; we retain the original term for compatibility. [...]

[...] The version is more accurately a sub-type; again, we retain the term for compatibility. [...]

Therefore, I assume using variant#3 would allow us to redefine the structure entirely. Moving or removing version from the definition, for example. Probably, current implementations of RFC4122 should be ignored because keeping BC looks impossible.

@edo1

edo1 commented Aug 15, 2021

I assume using variant#3 would allow us to redefine the structure entirely. Moving or removing version from the definition, for example

It is the last unused variant, so there is no room for error.

@fabiolimace

fabiolimace commented Aug 15, 2021

I also think the version bits (subtype) are specific to the RFC4122 variant (type), which has many subtypes that must be separated from each other. Variant 3 (111) doesn't even have a structure yet.

This file appears to be the basis for Apple's implementation: https://github.com/BeyondTrust/pbis-open/blob/master/dcerpc/uuid/uuid.c

@broofa
Contributor

broofa commented Aug 18, 2021

Using the variant field to signal different bit semantics within RFC 4122 versions is not appropriate. The variant field is the overarching field that dictates layout and semantics of all other bits in a UUID. RFC4122 is very deliberately scoped to just variant == 0b10x. Hell, version isn't even defined outside of that specific variant.

I believe a more correct approach would be to use a different version to distinguish between timestamp encodings, much the way v3 and v5 distinguish between namespace hash algorithms.

And, yes, this would effectively double the number of new UUID versions being proposed, which is not ideal. This is one reason I'd like to see this proposal culled back to just 1 new timestamp version, per #30.

@edo1

edo1 commented Aug 19, 2021

Using the variant field to signal different bit semantics within RFC 4122 versions is not appropriate.

What about using variant=0b111 in a new format (without version)? Expanding the variable part by three bits reduces the probability of collisions by almost an order of magnitude.
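The "almost an order of magnitude" figure can be checked with the usual birthday-bound approximation, p ≈ n(n-1) / 2^(b+1) for n values drawn from b random bits. The sketch below is only that approximation (valid while p is small); the function name and the sample size are arbitrary choices for illustration, and 125 is 122 plus the three bits mentioned above.

```python
def collision_probability(n: int, random_bits: int) -> float:
    """Birthday-bound approximation of the chance of any collision among n IDs."""
    return n * (n - 1) / 2 ** (random_bits + 1)


n = 10**12  # arbitrary number of generated IDs
for bits in (120, 122, 125):
    print(bits, collision_probability(n, bits))

# Three extra bits divide the probability by 2**3 = 8.
ratio = collision_probability(n, 122) / collision_probability(n, 125)
print(ratio)
```

By the same formula, each bit given up doubles the probability, so going from 122 to 120 free bits multiplies it by four.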

@broofa
Contributor

broofa commented Aug 19, 2021

What about using variant=0b111 in a new format (without version)? Expanding the variable part by three bits reduces the probability of collisions by almost an order of magnitude.

You mean the version part? I suppose you could do that. But imho, defining a new variant should be a Big Deal™. It should be motivated by the need for a whole new class of UUIDs, or by having exhausted the available version options, which we haven't done yet. E.g. if there was a need to move the version field to the end of the UUID (to improve db-locality?), or it needed to be 6 or 8-bits wide instead of 4.

I don't see such a need at this time.

@edo1

edo1 commented Aug 19, 2021

But imho, defining a new variant should be a Big Deal™

Agree.
There were no sortable UUIDs in the standard. Is this a Big Deal™?
Seriously though, I want the random part to be as large as possible.

@bradleypeabody
Contributor

I tried to lay it out here https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md as best I could. But the idea is if the variant field is set to 0b111, this would mean the version field fits in the bottom (least significant) bits of the 9th byte (so var and ver are in this one byte). This technically loses us only 1 bit, since variant was 3 bits and version was 4, but now we're using 8.

I agree that we should not try to make new variants of v1, v4, etc. or v6 for that matter (since its goal is to be easily adaptable from and as close to v1 as possible). But I think we can use it in v7 and v8 to simplify the bit layout so there's just one byte you have to worry about when determining version info for v7 and v8:

   0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | var |  ver    |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

I'm hoping this can help move things toward greater simplicity in the spec/proposal.

@broofa
Contributor

broofa commented Aug 20, 2021

I'm hoping this can help move things toward greater simplicity in the spec/proposal.

| Before | After |
|--------|-------|
| RFC4122 is variant 0b10x | RFC4122 is variant 0b10x or 0b111 |
| 2^122 UUIDs per version | 2^122 or 2^120 UUIDs per version |
| version in bits 48-51 | version in bits 48-51 or bits 67-70 |

This is not "greater simplicity". And the rift this creates in how versioning works is going to be an ongoing pain in the ass to articulate and rationalize about.

"Look for versions 1-5 here if variant is 0b10x, and versions 6-31 there if variant is 0b111. What's that...? What if variant is 0b111, but the version field < 6? Uh... well... that's not really a thing. I mean, it's technically possible, but we didn't want to confuse people by having different versions with the same name."

Now that I think about it, that last part about 0b111 versions 1-5, is actually pretty awful. That those particular variant-version combinations are possible just seems ripe for confusion and abuse.

@broofa
Contributor

broofa commented Aug 20, 2021

In Draft 02 set definition of any UUID Version (UUIDv1/2/3/4/5/6/7/8) + Variant E (111) as a method for signaling an alternative bit layout to any previously defined UUID.

This is clearly overkill. Version 1 is the only version where there is any value in rearranging bit layout.

@bradleypeabody
Contributor

bradleypeabody commented Aug 20, 2021

the rift this creates in how versioning works is going to be an ongoing pain in the ass to articulate and rationalize about.

The logic I'm proposing in https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md would be simply:

For UUID7 bytes[9] == 0xE7, for UUID8 bytes[9] == 0xE8.

And for UUID versions <= 6 do what RFC4122 indicates.

That's it.

It is a difference, but I don't think it's overly complicated and confusing.
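A minimal sketch of that two-path rule, to show how much branching it adds. It assumes the combined octet is the ninth byte (index 8 when counting from zero), and the name `uuid_version` is hypothetical; this is not taken from the draft.

```python
import uuid


def uuid_version(raw: bytes):
    """Return the version number, or None when neither rule applies."""
    if len(raw) != 16:
        raise ValueError("a UUID is 16 octets")
    if raw[8] in (0xE7, 0xE8):    # proposed 0b111 variant: version shares the octet
        return raw[8] & 0x0F
    if raw[8] & 0xC0 == 0x80:     # RFC 4122 variant 0b10x
        return raw[6] >> 4        # version nibble, bits 48-51
    return None                   # e.g. 0xE3: variant 0b111 with an unallocated version


print(uuid_version(uuid.uuid4().bytes))                   # RFC 4122 path
print(uuid_version(bytes(8) + b"\xE7" + bytes(7)))        # proposed path
print(uuid_version(bytes(8) + b"\xE3" + bytes(7)))        # neither
```

The last call is the case raised earlier in the thread: the bits look like a valid variant and a valid version number, yet the combination is not defined.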

Now that I think about it, that last part about 0b111 versions 1-5, is actually pretty awful.

I agree, we should just disallow other uses of the 0b111 variant, since I don't think there's any real benefit to allowing it.

@broofa
Contributor

broofa commented Aug 20, 2021

The logic I'm proposing...
... I don't think it's overly complicated and confusing

I understand what you're proposing. And I agree that, had this been the bit layout RFC4122 used from the beginning, it would be simpler than what we have now. But it's not. It's a new layout in addition to what the original RFC specifies, so the complexity it introduces is in addition to the complexity that's already there.

I agree, we should just disallow other uses of the 0b111 variant, since I don't think there's any real benefit to allowing it.

I think you're missing the point. It's not about what exactly we say around this issue (although we'll have to make some decisions there), it's that we have to say anything at all. This change causes a certain amount of cognitive dissonance that can be avoided altogether if we just stick with the current scheme.

@bradleypeabody
Contributor

bradleypeabody commented Aug 20, 2021

This change causes a certain amount of cognitive dissonance

RFC4122 causes its own cognitive dissonance. The fact that one of the backward compatibility issues that comes up is implementations which check the version bits without checking variant - that's a great example of just strange stuff that should, IMO, never have been there in the first place, and should be deprecated. Unnecessary complexity that people already get wrong because nobody wants to sit down and read the RFC4122 spec, because it's long and unnecessarily complicated. (No offense to the original authors, I'm just saying that today, with the factors at hand, we can do better.)

Some people will choose to either just implement UUIDv7+8, or leave the existing implementations of e.g. v1 and v4 (the most commonly implemented existing versions) as-is and just write new code for UUIDv7+8. This new code can be simpler and easier to understand. This is a benefit that needs to be measured and compared against the factor of making it different from prior versions.

RFC4122 has many problems. One of the goals here is to "fix" them by introducing a new, simpler design that people can just move forward with and leave the old stuff behind. Once we have new UUIDs, people won't be obligated to continue to have to deal with RFC4122 - if newer versions solve real-world problems, then great, new versions can be implemented and that's it, done deal. I'd much rather focus on making this new draft/spec as simple as possible (while not being unrealistic about backward compatibility issues), than forcing some old stuff from RFC4122 that we don't need.

I will find some examples of implementation differences and post here and hopefully this can help provide a convincing argument of the value of this. But the basic issue I have is I don't think it's simpler to leave things as they were in RFC4122. RFC4122 is a mess, we should move away from it. And if we can do so with only a few manageable backward compatibility concerns, I think it's a workable approach and better in the long run. Again I'll post some code soon to help demonstrate this factor of code complexity.

@bradleypeabody
Contributor

bradleypeabody commented Aug 21, 2021

If we merge variant and version fields into one byte and combine it with using a common time format, we get things like this:

https://play.golang.org/p/yWjgCNy_GQq

	var v [16]byte
	binary.BigEndian.PutUint64(v[:8], uint64(time.Now().UnixNano()))
	v[8] = 0xE7
	rand.Read(v[9:])

That is a correct and useful UUIDv7 implementation (per these notes, not the draft). It lacks a guarantee of monotonicity for values produced within the same clock tick (which IMO should be a recommendation not a requirement), but this can be added with just a few more lines of code.
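For comparison, here is a Python sketch of the same layout with one possible way to add the same-tick monotonicity mentioned above: if the clock has not advanced, bump the previous timestamp by one. This is an assumption about what "a few more lines" could look like, not something from the draft, and the module-level state makes it unsuitable for concurrent use as written.

```python
import os
import time

_last_ns = 0  # last timestamp handed out (hypothetical module-level state)


def uuid7_sketch() -> bytes:
    """64-bit nanosecond timestamp, the 0xE7 octet, then 7 random octets."""
    global _last_ns
    now = time.time_ns()
    if now <= _last_ns:       # same tick, or the clock stepped backwards
        now = _last_ns + 1
    _last_ns = now
    return now.to_bytes(8, "big") + b"\xE7" + os.urandom(7)


ids = [uuid7_sketch() for _ in range(5)]
print([i.hex() for i in ids])
```

Because the timestamp occupies the leading octets and strictly increases, the generated values sort in creation order when compared as raw bytes.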

Try finding an earlier UUID implementation that is anywhere near that simple to write and understand.

Simplicity in implementation and maintenance is a very real, tangible factor. It needs to be weighed carefully against the cost of changing things. And if we don't change old UUID version <= 5 values, I really don't see the problem.

Is there some specific real-world problem that this would cause that I'm missing? What specifically (which database program, library or software problem) would happen/break/be difficult/annoying/etc if we were to move forward with this? Maybe if I have an actual example of what you're worried about I could better think with it.

@fabiolimace

fabiolimace commented Aug 21, 2021

I'm still not sure if using half of the future variant is a good idea. I prefer to be conservative in this case. It may be easier to approve the changes.

But I like the simplicity it makes possible. Much of the discussion here arises from the need to work around version bits.

If you really plan to use half of the '111' variant, why would you want to use a version number? Version numbers belong to the '10x' variant. The '111' variant is an uninhabited land. You have the opportunity to create an entirely new layout. There is no need to be stuck with the '10x' variant design, which depends on the version number to differentiate between UUID subtypes.

I think it's better to just define the E variant (enhanced, extended?) and forget about the E7 and E8 versions. Or you can use ONE bit of the E variant as a flag to differentiate between E-UUIDs that have timestamps and those that don't.

I am concerned about reducing the UUID size from 122 to 120 bits. Losing 2 bits can result in a significant increase in collision probability. If you don't use the version number the amount of free bits for entropy increases.

@broofa
Contributor

broofa commented Aug 21, 2021

Is there some specific real-world problem that this would cause that I'm missing?

These sorts of questions are harder to answer:

  • "Is [some UUID] valid?"
  • "What version is [some UUID]?"
  • "How do I identify UUIDs in text?"
  • "I have a valid variant (0b111) and a valid version (3), so why is my UUID invalid?"

... and this sort of code becomes more complex:
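As a hypothetical illustration of the kind of code in question (this is not the code originally linked in the comment), consider matching UUIDs in text. The first pattern assumes only RFC 4122 versions 1-5; the second also accepts version 6 under variant 10x plus the proposed E7/E8 octet, and needs an alternation because the version moves to a different group.

```python
import re

HEX = "[0-9a-f]"

# RFC 4122 only: version nibble leads group 3, variant nibble leads group 4.
RFC4122_RE = re.compile(
    rf"\b{HEX}{{8}}-{HEX}{{4}}-[1-5]{HEX}{{3}}-[89ab]{HEX}{{3}}-{HEX}{{12}}\b",
    re.IGNORECASE,
)

# RFC 4122 (through v6) or the proposed layout with E7/E8 leading group 4.
BOTH_RE = re.compile(
    rf"\b{HEX}{{8}}-{HEX}{{4}}-"
    rf"(?:[1-6]{HEX}{{3}}-[89ab]{HEX}{{3}}|{HEX}{{4}}-e[78]{HEX}{{2}})"
    rf"-{HEX}{{12}}\b",
    re.IGNORECASE,
)

text = (
    "old: 123e4567-e89b-42d3-a456-426614174000 "
    "new: 017f22e2-79b0-7cc3-e798-c0ff3312a1b2"
)
print(RFC4122_RE.findall(text))
print(BOTH_RE.findall(text))
```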

@bradleypeabody
Contributor

Fair points. You're correct, it does add some more logic to these situations.

However, variable length will break the "find a UUID in text" anyway. So will Crockford Base32 encoding.

And extracting the version is an extra line or two of code to fix that code. (The version.js there btw is another example of broken code - it should be checking the variant bits.)

So I think it's a matter of comparing what happens when the points above are broken or made more difficult, vs the fact that all newer implementations (and some of which will only need to support e.g. UUIDv7 and v8) can be simpler.

UUIDs are supposed to be as opaque as possible. I would also wager that much of the code that is trying to extract version numbers and perform validation is probably doing something not terribly relevant to what most applications need anyway. Why are people checking the version? (can't you just use the opaque value) Why are they checking if a UUID is valid? (are you sure you can't just compare to all zeros to determine if there's a UUID here or not?)

@bradleypeabody
Contributor

To follow up from earlier discussion and from #58, my current stance on this is that the simplicity combining the variant and version fields introduces is worth the downsides.

So far the downsides that have been brought up are, along with my rebuttal:

  1. Added complexity/different from RFC4122

I understand the concern. The procedure for examining the version is explained in the new draft with two sentences:

extracting the version number can be done by examining the variant
field at bits 64 and 65 for the values 1 and 0 respectively, and then
extract the version from bits 48 through 51. UUID versions 7 and 8
can be identified by checking octet 9 for the values 0xE7 or 0xE8
respectively.

Yes, it is different. My opinion is that this does not present too much complexity. The first sentence is just reiterating what RFC4122 says much more verbosely, and the only thing being added is "UUID versions 7 and 8 can be identified by checking octet 9 for the values 0xE7 or 0xE8 respectively."

  2. It reserves two extra bits.

Concerns over the loss of bits being problematic are application specific, and the introduction of variable length UUIDs IMO addresses this concern. A fixed 128-bit value is much more problematic when it comes to concerns about collision probability or unguessability. So I think having those two bits reserved for future use to make one whole byte be devoted to the version is an acceptable tradeoff in the interest of simplicity, considering you can add plenty more bytes to your UUID to further reduce collision probability if your application really needs it. No need to worry about 2 bits when you can add many more if you like.

If this ends up making it into an RFC, I suspect many new implementations will just implement UUID version 7 and/or 8 and not bother with the rest. IMO, making these implementations simpler should be a priority.

@LiosK

LiosK commented Feb 16, 2022

I don't see a real reason to change the variant bits to develop a time-ordered UUID format. Implementing a UUIDv7 generator is an easy job that can be done by just 100 lines of code in many languages, even with the old, weird version/variant layout. The reorganized layout might reduce some lines of code, but I don't think that's worth sacrificing the future extendability of the UUID standard. It's possible in the future that another new UUID format really really needs to move the version bits, and then if no variant is left, the UUID standard will die.

That said, if the last variant should be consumed now, I think the new format should use a different name than version 7. Variant 10x Version 7 may be defined in the future, and such definition should be named as "UUIDv7" to keep consistency with UUIDv1-5. Therefore, Variant 111 Version 7 should be named differently, or Variant 111 series should be started from version 1 with a different naming convention.

@kyzer-davis
Contributor Author

@LiosK, with the placement of the variant+version in the same octet we actually extend variant 111 to be used by a future implementation if they desire. I detail this a bit more in the Draft 03 file found in PR #58 if you want to take a look at the proposed text.

Long story short:
We set the 3 variant bits to 111 and dictate the next following bit is always a 0. Thus 1110 = E. This is followed by the four bit version in our new variant; but any future spec may specify that if they want to use 111 the next bit should be set to 1 making 1111 (F), and ultimately a new variant is born for whomever to do what they want. I wanted to ensure we did allow for future extensibility of the UUID spec even though there have been no new additions in ~16 years.
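A short sketch of that nibble split, assuming the combined octet is laid out exactly as described (1110 in the high nibble, the 4-bit version in the low nibble). The helper names are invented for this example.

```python
def make_octet8(version: int) -> int:
    """Combine the fixed 1110 prefix with a 4-bit version number."""
    if not 0 <= version <= 0xF:
        raise ValueError("version must fit in four bits")
    return 0xE0 | version


def split_octet8(octet: int):
    """Return (prefix, version); version is None outside the 1110 space."""
    high, low = octet >> 4, octet & 0x0F
    if high == 0xE:
        return ("1110", low)
    if high == 0xF:
        return ("1111", None)  # left untouched for a future specification
    return (None, None)        # some other variant entirely


print(hex(make_octet8(7)), hex(make_octet8(8)))
print(split_octet8(0xE7), split_octet8(0xF3))
```

Everything with a leading F stays outside this scheme, which is the extensibility escape hatch described above.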

As for setting 1110 and starting with version 7 instead of starting over the version counting:
This was the conversation between myself and Brad on the topic back in August of 2021:

Kyzer: There is no reason we need to start at version 7 since our bit space is all to ourselves now with this variant. Basically variant 111 + version 1 and version 2 don’t conflict with RFC4122s version 1 and 2.

Brad: I agree with this in principle, but it creates a new problem of explaining to people what the numbering system is. Just calling it "version 7" and saying "in version 7, byte 8 is set to 0xE7" is really simple to understand and follow. I'm open to a proposal of a different numbering system for this 0b111 variant, but I'm not sold enough on the benefits to originate it myself.

I could go either way. After all, UUIDv1ε vs UUIDv1 was my original thought on how to distinguish Variant 1110/E + Version 1 vs RFC 4122 variant 10xx/89AB + Version 1.

@kyzer-davis
Contributor Author

kyzer-davis commented Mar 1, 2022

@ben221199

only version 7 and 8 could be used inside this variant

There are still 4 bits for version, allowing 0 through F (0 through 15). We are only allocating 7 and 8 in this spec; as such, 0 through 6 and 9 through 15 are available for variant 111+0.

Make a variant that describes a new versioning. So, the versions in variant 3 are not the same as the versions in variant 1. (But at least you start counting from v1 again)

As for deciding to go straight to v7 and v8: it was just a logical approach in the document to keep things flowing. See my comment above about distinguishing the two variants:

Copied:

[...] UUIDv1ε vs UUIDv1 was my thought originally on how to distinguish Variant 1110/E + Version 1 vs RFC 4122 variant 10xx/89AB + Version 1


My Suggested edits that will make this a bit more clear:

  • Add the Lower case epsilon (ε) nomenclature to the Draft 03 Variant and Version Fields if it helps drive the point home.
    • Anywhere I reference UUIDv7/UUIDv8/Version 7/Version 8 change to UUIDv7ε/UUIDv8ε/Version 7ε/Version 8ε
  • I can also split Table 2 UUID versions defined by this specification into RFC 4122 Updated version reservations and Draft 03 Version reservations.
    • Table A: Define RFC 4122 variant 10xx/89AB + Version 6 then fill out a table up to v15 with "Reserved for future definition"
    • Table B: Define Variant 1110/E + Version 7ε and 8ε filling out the rest as "Reserved for future definition"

@broofa
Contributor

broofa commented Mar 2, 2022

Add the Lower case epsilon (ε) nomenclature to the Draft 03 Variant and Version Fields if it helps drive the point home. Anywhere I reference UUIDv7/UUIDv8/Version 7/Version 8 change to UUIDv7ε/UUIDv8ε/Version 7ε/Version 8ε

This epsilon-ification of version numbers is just weird. I get that it's convenient for us authors/reviewers to use, but for casual readers it's going to be confusing. Exposing this lingo to users is just going to lead to conversations like this:

"Oh, I didn't know there was more than one version? So what are they?"

"Well, there's versions 1-5, and then there's versions 6ε-8ε"

"I'm sorry... epsi-wat?!?"

And while we're on the subject, I think it's worth pointing out how much larger the audience of casual readers of this specification is vs. people actually writing UUID implementations. (Witness the uuid JS module: 3 maintainers vs. 11M+ dependent projects.) That little bit of confusion is felt by 1,000s of people for every 1 person who bothers to write a UUID implementation.

I will admit that, as an implementor, this new variant is growing on me. However I still don't feel the scope of this work - at least in its current form - warrants it. I'll try not to rehash the "bit swizzling" side of things, though. I think we all know what is / is not involved there.

Regarding @bradleypeabody's argument that "RFC4122 is a mess, and we should move away from it", I have two concerns. The first is that if that's what we're actually doing, we're kind of throwing the baby out with the bathwater. The only portion of 4122 that needs revamping is version 1. Everything else still stands. Versions 4 and 5, in particular, will continue to be relevant.

Which brings me to my other concern: Unless we explicitly state that we're obsoleting 4122 (e.g. the way RFC9051 obsoletes RFC3501), I don't see how we can claim to be "moving away from it". We are, instead, just extending it, as evidenced by our choice of version #'s. Heck, we say as much in the opening paragraph (emphasis mine):

This document is a proposal to update [RFC4122]

Basically if we're adopting a new variant, we need to do a better job removing the need for people to concern themselves with the old one. Specifically...

  • Define 0b111 versions 0-5 as reserved for legacy use (as defined in RFC4122)
  • Specify that versions 2 and 3 are deprecated. (I assume? Version 3 certainly is. Version 2 isn't spec'ed in 4122 even, so... who knows?)
  • Provide complete definitions for versions 4 and 5, such that readers don't need to reference 4122.
  • Insert the requisite XML incantation for obsoleting RFC4122

That 3rd point is the awkward one, and the one that I guess has me struggling to believe we need a new variant. If all we're doing is copy/pasting the text from 4122 - which is probably all that we should be doing if we go this route given we haven't established any need to change either of those versions - .... can we really say we're "moving away from it"?

@ben221199

If you want a UUIDv7 and UUIDv8 on the variant 0b1110 (Variant#3), in my opinion you should also define UUIDv1 to UUIDv6 for this variant AND should also define UUIDv7 and UUIDv8 for Variant#2. In this case you use the same "versioning" for both variants. Else, just don't introduce a new variant.

@kyzer-davis
Contributor Author

kyzer-davis commented Mar 2, 2022

I will admit that, as an implementor, this new variant is growing on me.

This is my current stance. In writing prototypes, testing and even writing the document this is much, much easier. Meanwhile, for Draft 03 Brad and I are going to at least keep this present so the IETF can give some weight on the topic. If their feedback is negative then I can roll it back in Draft 04. As such, I need to ensure that Version and Variant section of Draft 03 is as clear as possible. I think my suggested edits in my last comment will go a long way to improving the text.


@broofa, I agree, we cannot obsolete RFC4122. Else you get into the scenario you described where we need to define every one of RFC4122's UUIDs and where they stand now. It is a VERY big undertaking and IMO keeping this as an update to RFC4122 continues to be the best strategy.


Version parity in both variants: @fabiolimace, @ben221199, @broofa
I have noodled on this over the past few months, and my compiled thoughts are:

  • For v7/v8:
    • Upsides:
      • No more confusion about what v7/v8 is for.
      • This extends the "do what you want" version to include RFC4122 variant since I know many people already do this and then slap v4 on it.
    • Downsides: We eat up 2 versions in 10xx variant which is now half-used leaving 7 more versions in that bit space.
      • Counterpoint: We were already going to eat up 2 versions in that variant so maybe this is no big deal.
    • Editor Actions: If we do this I need to update v7/v8 sections (and appendix) with ASCII layouts and examples for both.
  • For v1 through v5:
    • Upsides: No future confusion about what version 1-5 is with either variant.
    • Downsides:
      1. 10xx variant is 122 bits while 1110 variant is 120 bits.
      2. We just reserved half of the new variant. Leaving them open for future specs is an effective method of utilizing only what we need and nothing more.
    • Editor Actions: If I reserve them in the new variant I MUST put text about how to translate all of them to 120 bits. Hence my strategy of leaving it for v7/v8 only and reserving only what we need.
  • Version 0:
    • In writing my comment yesterday this was the first time I realized there was an undefined version 0. Personally I am okay with reserving version 0 in both variants as free form "do what you want with all remaining non-version, non-variant bits" over calling it version 8.
      • Either way we use a version; this is just semantics.
    • Editor Actions:
      • Modify all UUIDv8/Version 8/v8 text to be UUIDv0/Version 0/v0.
      • Add ASCII/Appendix for both variant layouts.
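
The 122-bit versus 120-bit gap listed under the v1 through v5 downsides follows from simple bit accounting. A minimal sketch, assuming a 4-bit version field in both layouts and that the new variant spends a full nibble (0b1110) on the variant field; `payloadBits` is a hypothetical helper name, not something from the draft:

```javascript
const UUID_BITS = 128;
const VERSION_BITS = 4;

// Bits left for timestamp/counter/random data once the version and
// variant fields are set aside.
function payloadBits(variantFieldBits) {
  return UUID_BITS - VERSION_BITS - variantFieldBits;
}

console.log(payloadBits(2)); // RFC 4122 variant 0b10x  -> 122
console.log(payloadBits(4)); // proposed variant 0b1110 -> 120
```

Any v1 through v5 layout carried over to the new variant would therefore have to give up two bits somewhere.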

Epsilon usage:

  • This was only brought up as a way to distinguish the variant E#; epsilon (the ε character) seems to be a tidy way to provide an at-a-glance reference in text. If I get the text in version/variant to convey the point properly we should be able to avoid the confusion about what that is used for.

Edit: Epsilon would not play ball with the IETF converter tool, so a capital "E" has been substituted instead.


Notes: Editor Actions are only for me to keep track of what I would need to modify in the draft if consensus was achieved.
Reminder: Draft 03 HTML pre-IETF draft can be found here: https://uuid6.github.io/uuid6-ietf-draft/

@ben221199

I remember reading about version 0 somewhere, but I don't know where. I recall that it was reserved and only meant for invalid UUIDs. I also found this: https://github.com/r-lyeh-archived/sole

I see that there is a difference in available bits (120 vs 122) in the new variant. In that case the whole idea of a new variant seems unnecessary. If you still want to introduce a new variant, use the variant for just one format. Don't do any subversioning.

@broofa
Contributor

broofa commented Mar 2, 2022

Version 0

Yeah, kind of odd that this isn't addressed in the RFC and hasn't had much discussion here. The only argument I can see against using it would be to avoid confusion with the Nil UUID. Not a very strong argument, however.

If we switch version 8 to be version 0, I would suggest we have it (version 0) use the 0b10x variant and not the new one. Otherwise explaining which versions correspond to which variants just gets that much more confusing. ("1-5 are 0b10x, versions 0 and 7 are 0b111").

But I think doing that weakens the case for 0b111. We'd be defining a new variant but only using it for a single version (version 7).

@kyzer-davis
Contributor Author

@ben221199 interesting you found at least one implementation already using version 0 for custom implementations.

  • Most places I checked reference Version 0 as Nil UUID just like @broofa pointed out but technically that isn't correct.
  • I poked through the RFC4122 Erratas to see if there was anything of value on this topic but there was not.
  • I also searched through the IETF dispatch mailer and only found a single reference from my colleague, but I think version 0 was used as a placeholder.

For the moment I will scratch v8 to v0 from the agenda, but when I submit draft 03 I will shoot an email to the dispatch mailer to see if anybody has other usages of v0 that I couldn't find.

@fabiolimace

fabiolimace commented Mar 2, 2022

If the timestamp is 48 bits, I think it is no longer necessary to move the version number to another variant.

|000000000000|7|000|N|000000000000000|

|------------|M|---|N|---------------|
     time      random      random     
               counter                
               submsec                
M: 7
N: [89ab]

The code block below shows a simple function to generate UUIDv7 in JavaScript. It uses Math.random() for simplicity. It's not efficient, of course, but it gets the job done.

function hex(number, len) {
    return number.toString(16).padStart(len, '0');
}

function random(bits) {
    const max = Math.pow(2, bits);
    return Math.floor(Math.random() * max);
}

function uuid7() {
    
    let uuid = "";
    
    // get hexadecimal timestamp
    let ms = (new Date()).getTime();
    let timestamp = hex(ms, 12);
    
    // concat timestamp and random
    uuid += timestamp.substring(0, 8);
    uuid += "-";
    uuid += timestamp.substring(8, 12);
    uuid += "-";
    uuid += hex(random(16), 4);
    uuid += "-";
    uuid += hex(random(16), 4);
    uuid += "-";
    uuid += hex(random(48), 12);
    
    // put version and variant
    uuid = uuid.split('');
    uuid[14] = '7';
    uuid[19] = ['8', '9', 'a', 'b'][random(2)];
    uuid = uuid.join('');

    return uuid;
}

function main() {
    let total = 10;
    for (let i = 0; i < total; i++) {
        console.log(uuid7());
    }
}

main();

Output:

017f4c74-bd17-74cb-9959-ae3b35f5f270
017f4c74-bd1b-7971-a9cf-144713872aa1
017f4c74-bd1b-7f3e-b7e2-b57df2796602
017f4c74-bd1b-7179-b79c-dc3a706a2144
017f4c74-bd1b-7303-a248-0cf396e4559e
017f4c74-bd1c-7ea8-a5ef-c3ad91c1d413
017f4c74-bd1c-79dd-ab45-05469dbba543
017f4c74-bd1c-744d-bd2a-d1ffecb6d853
017f4c74-bd1c-716e-b577-62bf993de8fd
017f4c74-bd1c-7e09-8ce8-6d9e3efe57b1

EDIT: updated the function random() to receive a number of bits as argument.
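
For readers who want to check the output above against the layout, here is a minimal decoding sketch. It assumes the 48-bit millisecond timestamp layout shown in this comment; `parseUuid7` is a hypothetical helper, not part of any draft or library:

```javascript
// Split a UUIDv7 string (48-bit ms timestamp layout) into its labeled fields.
function parseUuid7(uuid) {
  const hexDigits = uuid.replace(/-/g, "");
  if (!/^[0-9a-f]{32}$/i.test(hexDigits)) {
    throw new Error("not a 32-digit hexadecimal UUID string");
  }
  return {
    // First 12 hex digits: Unix timestamp in milliseconds (fits in a double).
    timestampMs: parseInt(hexDigits.substring(0, 12), 16),
    // 13th hex digit: version (M in the diagram).
    version: parseInt(hexDigits[12], 16),
    // Top two bits of the 17th hex digit (N in the diagram): 0b10 for [89ab].
    variant: parseInt(hexDigits[16], 16) >> 2,
  };
}

const parsed = parseUuid7("017f4c74-bd17-74cb-9959-ae3b35f5f270");
console.log(parsed.version, parsed.variant.toString(2));
console.log(new Date(parsed.timestampMs).toISOString());
```

Run against the first line of the output, the decoded timestamp should land on the date this comment was posted.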

@kyzer-davis
Contributor Author

kyzer-davis commented Mar 3, 2022

@fabiolimace, @ben221199, @broofa @LiosK
I gave the Draft 03 Version and Variant section another coat of paint in the latest PR #75.
I think it is in a much better spot now but let me know what you think, specifically on the new text, over on #66

@broofa
Contributor

broofa commented Mar 3, 2022

GoogleCode-golfing @fabiolimace's example using modern JS and BigInt, and including implementations for both forms of variant.

Source on CodePen.

function bigrand(bits, shift = 0n) {
  return BigInt(Math.floor(Math.random() * 2 ** bits)) << shift;
}

function toUUIDString(bignum) {
    const digits = bignum.toString(16).padStart(32, "0");
    return `${
      digits.substring(0, 8)
      }-${digits.substring(8, 12)
      }-${digits.substring(12, 16)
      }-${digits.substring(16, 20)
      }-${digits.substring(20, 32)
    }`;
}

// RFC variant
function uuid7() {
  return toUUIDString(
    (BigInt(Date.now()) << 80n) | // timestamp  
    (0x07n << 76n) | // version
    bigrand(12, 64n) |
    (0x8n << 60n) | // variant
    bigrand(14, 48n) |
    bigrand(48)
  );
}

// New variant
function uuid7e() {
  return toUUIDString(
    (BigInt(Date.now()) << 80n) | // timestamp
    (0x07en << 72n) | // version|variant
    bigrand(36, 36n) |
    bigrand(36)
  );
}

console.log(uuid7());
console.log(uuid7e());

@kyzer-davis
Contributor Author

Update:
I am going to shoot for the moon and write up v7/v8 + 8/9/A/B var sections so folks have something to compare against when I submit to IETF. Plus this will make it easier to roll back to old var layout in draft 04 if need be.


Formats

  • UUID Version 6
  • UUID Version 7
  • UUID Version 7E
  • UUID Version 8
  • UUID Version 8E
  • Max UUID

@bradleypeabody
Contributor

@kyzer-davis are you planning on putting these in the spec or as an appendix? I worry that regardless of which way this ends up going the spec should clearly propose one way and not both. So maybe write up the UUID 8 and 7 non-E versions as appendices and mention that these are alternates that could be used if the var+ver field idea is shot down. Does that effectively address both of our concerns?

@ben221199

I think I should make a complete format-list here:

  • UUID:
    • Variant#0 (0xx) - The legacy UUID by Apollo Computer
      • *
        • 32 bits (time_high), 16 bits (time_low), 16 bits (reserved), 8 bits (family), 56 bits (node)
      • socket_$unspec (0x0)
      • socket_$unix (0x1)
      • socket_$internet (0x2)
      • socket_$implink (0x3)
      • socket_$pup (0x4)
      • socket_$chaos (0x5)
      • socket_$ns (0x6)
      • socket_$nbs (0x7)
      • socket_$ecma (0x8)
      • socket_$datakit (0x9)
      • socket_$ccitt (0xA)
      • socket_$sna (0xB)
      • socket_$unspec2 (0xC)
      • socket_$dds (0xD)
    • Variant#1 (10x)
    • Variant#2 (110)
      • Used in Microsoft DCOM as Interface ID; could not find any format description so far
    • Variant#3 (111)
      • Unused

The family field is 8 bits, but because only values 0 (0b0000) to 13 (0b1101) are used, the first 4 bits can be used to indicate the variant. When the family field starts with a 0-bit, it means everything is still legacy UUID, so that means that values 0 (0b00000000) to 127 (0b01111111) can be used as family, where 0 to 13 are already allocated. When the family field starts with a 1-bit, it is definitely not legacy UUID.
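
The leading-bit test described above can be sketched as follows. This assumes the variant (or legacy family) byte is octet 8 of the UUID, as in the field widths listed for Variant#0, and reuses the Variant#0 to Variant#3 numbering from the list; `variantOf` is a hypothetical helper name:

```javascript
// Classify a UUID string by the leading bits of octet 8.
function variantOf(uuid) {
  const hexDigits = uuid.replace(/-/g, "");
  if (!/^[0-9a-f]{32}$/i.test(hexDigits)) {
    throw new Error("not a 32-digit hexadecimal UUID string");
  }
  // Octet 8 is hex digits 16 and 17 (the family byte in the legacy layout).
  const octet8 = parseInt(hexDigits.substring(16, 18), 16);
  if ((octet8 & 0x80) === 0x00) return 0; // 0xx: legacy, family 0-127
  if ((octet8 & 0xc0) === 0x80) return 1; // 10x: versioned UUIDs
  if ((octet8 & 0xe0) === 0xc0) return 2; // 110
  return 3;                               // 111: reserved
}

console.log(variantOf("017f4c74-bd17-74cb-9959-ae3b35f5f270")); // 1
console.log(variantOf("017f4c74-bd17-7ecb-e959-ae3b35f5f270")); // 3
```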


Notice that every variant has its OWN subtyping. Variant#0 uses families. Variant#1 uses "versions". Think carefully before deciding to open up a new variant. If you open up a new variant, you have 4 options:

  • Use subtyping of another variant.
    • BAD IDEA, because Variant#1 has 1 more bit available than Variant#3. Many "versions" are not compatible with Variant#3.
  • Introduce a new subtyping. Variant#0 has families, Variant#1 has versions, so Variant#3 could have "structures" for example.
    • BETTER IDEA, because you can create new "structures" and also start counting from 0 or 1 (like v1 in Variant#1).
  • Don't use subtyping at all. Your variant will only have one format, it seems that Variant#2 is like that.
    • HMMMM, you introduce a new variant, but use it for only 1 type. Seems a waste.
  • Don't open up a new variant. Just stay with the normal versioning in Variant#1.
    • BEST IDEA, because why do you want to open up a new variant? Variant#1 isn't even halfway full.

@ben221199

So what things are now available:

  • Variant#0:
    • Family 14-127; for example for adding IPv6 (socket_$internet6)
  • Variant#1:
    • Version 0
    • Version 6, 7, 8 (but will be used after publication of this spec)
    • Version 9-15
  • Variant#2:
    • Unknown
  • Variant#3:
    • This whole variant is reserved for later use. So everything could be done with it.

@kyzer-davis
Contributor Author

Group,

I made some large changes in #85 to introduce both v7/v8 and v7E/v8E as I stated in #26 (comment)

Please give it a review and let's keep discussing that text here.

@sergeyprokhorenko

I would suggest filling the version and variant segments with random bits by default. If some information system needs to know the version and variant, let it fill in the corresponding values. Although I don't see how this could be useful, apart from the necessary mimicry of the old-fashioned UUID standards.

@kyzer-davis kyzer-davis added the Out of Scope Topics not in scope for the RFC/Draft label Mar 16, 2022
@kyzer-davis
Contributor Author

kyzer-davis commented Mar 16, 2022

Announcement

I had a great discussion with @bradleypeabody and this topic has officially been marked out of scope for Draft 03 (and any future draft.)
The XML text is retained and over the next few weeks I will author a separate Draft 00 which includes this topic specifically.

Edit: To further clarify, Draft 03 will cover UUIDv6 through v8 + Max UUID. The new Draft 00 will cover E Variant, Alternate Encoding and UUID Long. These are two drafts that cover different topics, so implementations may choose what they want to support, e.g., an implementation supports RFC8675309 for v7 but not RFC123456789 for alt encodings.

@ben221199
Copy link

First, I want to say that I like Omni UUID more than Max UUID. Second, I want to say that introducing the Omni UUID will cause introducing a new variant.

@kyzer-davis
Contributor Author

@ben221199, I believe we discussed that a bit in #62.
Let's cross post there to keep the convo rolling as it pertains to that thread more than this one (although I do see why it was brought up here.)

@danielmarschall

I don't think there should be a new variant.

UUIDs are not only specified by RFC but also standardized in ITU-T Recommendation ITU-T X.667 | ISO/IEC 9834-8.

Rec. ITU-T X.667, clause 11.3 specifies variant 0b111 as reserved for future use.

Therefore, if someone decides that variant 0b111 is used, then IETF, ISO/IEC, and ITU-T should do that step together, otherwise there is a risk that two standardization organizations define it independently and then there would be chaos.

Also, I didn't quite understand, why do you suggest a new variant, although there are enough versions left?

@ben221199

So what things are now available:

  • Variant#0:

    • Family 14-127; for example for adding IPv6 (socket_$internet6)

    [...]

Variant #0 (family 14-127) cannot be used, because the amount of available timestamp bits has been exhausted on 5 September 2015. In re socket_$internet6, since you cannot put 128bit-IPv6 into the 56bit-Node-ID of the NCS variant #0 UUIDs, did you think of a hash or something?
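
The exhaustion date can be reproduced with a short calculation. This sketch assumes the legacy NCS timestamp is a 48-bit count of 4-microsecond ticks since 1980-01-01 UTC; the tick size and epoch are not stated in this thread and are labeled here as assumptions:

```javascript
// Assumed legacy NCS parameters (not stated in this thread).
const NCS_EPOCH_MS = Date.UTC(1980, 0, 1); // 1980-01-01T00:00:00Z
const TICK_US = 4;                         // one tick = 4 microseconds
const TIMESTAMP_BITS = 48;                 // time_high (32) + time_low (16)

// Total span covered by the counter, converted from microseconds to ms.
const rangeMs = (2 ** TIMESTAMP_BITS * TICK_US) / 1000;
const exhausted = new Date(NCS_EPOCH_MS + rangeMs);

console.log(exhausted.toISOString().slice(0, 10)); // 2015-09-05
```

Under those assumptions the counter wraps after roughly 35.7 years, which agrees with the date given above.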

(By the way, @ben221199 , a complete list of the 13 families is super rare, so thank you for that format-list. Do you know where I can find information about the structure of the Node-ID for each of these families? It would be very useful for my opensource UUID decoder tool)

@ben221199

I don't think there should be a new variant.

UUIDs are not only specified by RFC but also standardized in ITU-T Recommendation ITU-T X.667 | ISO/IEC 9834-8.

Rec. ITU-T X.667, clause 11.3 specifies variant 0b111 as reserved for future use.

Therefore, if someone decides that variant 0b111 is used, then IETF, ISO/IEC, and ITU-T should do that step together, otherwise there is a risk that two standardization organizations define it independently and then there would be chaos.

Also, I didn't quite understand, why do you suggest a new variant, although there are enough versions left?

Agreed.

Variant #0 (family 14-127) cannot be used, because the amount of available timestamp bits has been exhausted on 5 September 2015. In re socket_$internet6, since you cannot put 128bit-IPv6 into the 56bit-Node-ID of the NCS variant #0 UUIDs, did you think of a hash or something?

Both IPv6 and UUID are 128 bits long. At the time of writing I was possibly thinking of using the IPv6 bits as the UUID and only changing the family field to indicate that it is an IPv6 UUID, so only 8 bits of data will be lost. But to be clear, it was only meant as an example of how to add more UUID types. Maybe an IPv4 UUID (socket_$internet) would be a better example, because it fits in 4 bytes.

(By the way, @ben221199 , a complete list of the 13 families is super rare, so thank you for that format-list. Do you know where I can find information about the structure of the Node-ID for each of these families? It would be very useful for my opensource UUID decoder tool)

At the moment I don't know. I have only seen Ethernet MAC addresses and DDS used and only know the MAC format.
