Discussion: Redefine variant bit (111) definition #26
Comments
Could you clarify what "variable subsecond encoding up to nanoseconds using floating point math and fractions" and "variable subsecond encoding up to nanoseconds using integers to represent total number of subseconds" mean, and what the difference between them is? |
@nerg4l, this is the topic being discussed in #24: what if we kept the current UUIDv7 and just changed it to UUIDv6 (since UUIDv6 becomes UUIDv1ε)? Then we could also add an alternative encoding that uses the 30-bit variant without any floating point math and binary fraction encoding as UUIDv6ε. We get the best of both worlds. UUIDv7 becomes what was UUIDv8 and I drop UUIDv8 from the draft. I edited the table to add more clarity. |
@kyzer-davis I was thinking, for simplicity, we would only define a meaning for the variants + versions that we actually want. Meaning basically that UUIDv6 I think stays as it is with the old variant, and UUIDv7 and 8 use the new 111 variant. Otherwise I think we have too much variation without any real benefit. |
Thanks for the clarification. I don't think it makes sense to create |
I looked more into this and found two things which should be taken into consideration. I'm not sure if it is relevant in case of extending the RFC but ITU also has a UUID definition in X.667. Which states the following:
I also checked what a variant#0 (NCS) UUID looked like. It seems it does not have version bits. https://opensource.apple.com/source/CF/CF-299.35/Base.subproj/uuid.c.auto.html
Unfortunately, I could not find any specification for Microsoft's variant#2. A lot of people refer to UUIDs defined by RFC 4122 as variant#1 version x UUIDs. RFC4122 also states the following about variant and version:
Therefore, I assume using variant#3 would allow us to redefine the structure entirely, for example by moving or removing the version from the definition. Current implementations of RFC4122 should probably be ignored, because keeping BC looks impossible. |
It is the last unused variant, so there is no room for error. |
I also think the version bits (subtype) are specific to the RFC4122 variant (type), which has many subtypes that must be separated from each other. Variant 3 (111) doesn't even have a structure yet. This file appears to be the basis for Apple's implementation: https://github.com/BeyondTrust/pbis-open/blob/master/dcerpc/uuid/uuid.c |
Using the I believe a more correct approach would be to use a different version to distinguish between timestamp encodings, much the way v3 and v5 distinguish between namespace hash algorithms. And, yes, this would effectively double the number of new UUID versions being proposed, which is not ideal. This is one reason I'd like to see this proposal culled back to just 1 new timestamp version, per #30. |
What about using |
You mean the I don't see such a need at this time. |
Agree. |
I tried to lay it out here https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md as best I could. But the idea is if the variant field is set to 0b111, this would mean the version field fits in the bottom (least significant) bits of the 9th byte (so var and ver are in this one byte). This technically loses us only 1 bit, since variant was 3 bits and version was 4, but now we are using 8. I agree that we should not try to make new variants of v1, v4, etc. or v6 for that matter (since its goal is to be easily adaptable from and as close to v1 as possible). But I think we can use it in v7 and v8 to simplify the bit layout so there's just one byte you have to worry about when determining version info for v7 and v8:
I'm hoping this can help move things toward greater simplicity in the spec/proposal. |
This is not "greater simplicity". And the rift this creates in how versioning works is going to be an ongoing pain in the ass to articulate and rationalize about.
Now that I think about it, that last part about 0b111 versions 1-5, is actually pretty awful. That those particular variant-version combinations are possible just seems ripe for confusion and abuse. |
This is clearly overkill. Version 1 is the only version where there is any value in rearranging bit layout. |
The logic I'm proposing in https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md would be simply: For UUID7 And for UUID versions <= 6 do what RFC4122 indicates. That's it. It is a difference, but I don't think it's overly complicated and confusing.
I agree, we should just disallow other uses of the 0b111 variant, since I don't think there's any real benefit to allowing it. |
I understand what you're proposing. And I agree that, had this been the bit layout RFC4122 used from the beginning, it would be simpler than what we have now. But it's not. It's a new layout in addition to what the original RFC specifies, so the complexity it introduces is in addition to the complexity that's already there.
I think you're missing the point. It's not about what exactly we say around this issue (although we'll have to make some decisions there), it's that we have to say anything at all. This change causes a certain amount of cognitive dissonance that can be avoided altogether if we just stick with the current scheme. |
RFC4122 causes its own cognitive dissonance. The fact that one of the backward compatibility issues that comes up is implementations which check the version bits without checking variant - that's a great example of just strange stuff that should, IMO, never have been there in the first place, and should be deprecated. Unnecessary complexity that people already get wrong because nobody wants to sit down and read the RFC4122 spec, because it's long and unnecessarily complicated. (No offense to the original authors, I'm just saying that today, with the factors at hand, we can do better.)
Some people will choose to either just implement UUIDv7+8 or just leave the existing implementations of e.g. v1 and v4 (the most commonly implemented existing versions) as-is and write new code for UUIDv7+8. This new code can be simpler and easier to understand. This is a benefit that needs to be measured and compared against the cost of making it different from prior versions.
RFC4122 has many problems. One of the goals here is to "fix" them by introducing a new, simpler design that people can just move forward with and leave the old stuff behind. Once we have new UUIDs, people won't be obligated to keep dealing with RFC4122 - if newer versions solve real-world problems, then great, new versions can be implemented and that's it, done deal. I'd much rather focus on making this new draft/spec as simple as possible (while not being unrealistic about backward compatibility issues) than on forcing in some old stuff from RFC4122 that we don't need.
I will find some examples of implementation differences and post them here, and hopefully this can help provide a convincing argument for the value of this. But the basic issue I have is I don't think it's simpler to leave things as they were in RFC4122. RFC4122 is a mess, we should move away from it. 
And if we can do so with only a few manageable backward compatibility concerns, I think it's a workable approach and better in the long run. Again I'll post some code soon to help demonstrate this factor of code complexity. |
If we merge variant and version fields into one byte and combine it with using a common time format, we get things like this: https://play.golang.org/p/yWjgCNy_GQq
var v [16]byte
// Bytes 0-7: Unix time in nanoseconds, big-endian.
binary.BigEndian.PutUint64(v[:8], uint64(time.Now().UnixNano()))
// Byte 8: variant 0xE and version 7 combined in one octet.
v[8] = 0xE7
// Bytes 9-15: random data.
rand.Read(v[9:])
That is a correct and useful UUIDv7 implementation (per these notes, not the draft). It lacks a guarantee of monotonicity for values produced within the same clock tick (which IMO should be a recommendation not a requirement), but this can be added with just a few more lines of code. Try finding an earlier UUID implementation that is anywhere near that simple to write and understand. Simplicity in implementation and maintenance is a very real, tangible factor. It needs to be weighed carefully against the cost of changing things. And if we don't change old UUID version <= 5 values, I really don't see the problem. Is there some specific real-world problem that this would cause that I'm missing? What specifically (which database program, library or software problem) would happen/break/be difficult/annoying/etc if we were to move forward with this? Maybe if I have an actual example of what you're worried about I could better reason about it. |
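For what it's worth, here is one way the "few more lines" for monotonicity could look, sketched in JavaScript to match the other examples in this thread. It assumes the layout from the comment above (64-bit nanosecond timestamp, then 0xE7, then 56 random bits), not the draft. `makeGenerator`, `randomTail` and the injectable `now` clock are names invented for this illustration, and Math.random() is used only to keep the sketch short.

```javascript
// Sketch only: layout follows the comment above (not the draft):
// 64-bit ns timestamp | 0xE7 | 56 random bits.
const TAIL_MASK = (1n << 56n) - 1n;

// Hypothetical helper: 56 random bits. Math.random() is not
// cryptographically secure; a real generator should use a CSPRNG.
function randomTail() {
  const hi = BigInt(Math.floor(Math.random() * 2 ** 28));
  const lo = BigInt(Math.floor(Math.random() * 2 ** 28));
  return (hi << 28n) | lo;
}

// `now` returns nanoseconds since the Unix epoch as a BigInt.
// Date.now() only has millisecond resolution, so many IDs share a tick.
function makeGenerator(now = () => BigInt(Date.now()) * 1000000n) {
  let lastTs = -1n;
  let lastTail = 0n;
  return function next() {
    const ts = now();
    if (ts > lastTs) {
      // New clock tick: take the new timestamp and fresh random bits.
      lastTs = ts;
      lastTail = randomTail();
    } else {
      // Same tick (or clock moved backwards): bump the tail instead.
      lastTail += 1n;
      if (lastTail > TAIL_MASK) {
        // Tail overflowed: borrow one tick from the timestamp.
        lastTs += 1n;
        lastTail = 0n;
      }
    }
    const value = (lastTs << 64n) | (0xe7n << 56n) | lastTail;
    return value.toString(16).padStart(32, "0");
  };
}

const next = makeGenerator();
console.log(next());
console.log(next());
```

Because every value is a fixed-width lowercase hex string, comparing the strings gives the same order as comparing the underlying 128-bit numbers.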
I'm still not sure if using half of the future variant is a good idea. I prefer to be conservative in this case. It may be easier to approve the changes. But I like the simplicity it makes possible. Much of the discussion here arises from the need to work around version bits. If you really plan to use half of the '111' variant, why would you want to use a version number? Version numbers belong to the '10x' variant. The '111' variant is an uninhabited land. You have the opportunity to create an entirely new layout. There is no need to be stuck with the '10x' variant design, which depends on the version number to differentiate between UUID subtypes. I think it's better to just define the E variant (enhanced, extended?) and forget about the E7 and E8 versions. Or you can use ONE bit of the E variant as a flag to differentiate between E-UUIDs that have timestamps and those that don't. I am concerned about reducing the UUID size from 122 to 120 bits. Losing 2 bits can result in a significant increase in collision probability. If you don't use the version number the amount of free bits for entropy increases. |
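To put a rough number on the 122 vs 120 bit concern, here is a small sketch using the usual birthday approximation p ≈ n(n-1)/2 / 2^bits. It assumes every non-fixed bit is uniformly random, which is not the case for timestamp-based layouts, so treat it as an upper-level illustration rather than a statement about UUIDv7. `collisionProbability` is a name made up for this example.

```javascript
// Birthday approximation for the chance of at least one collision among
// n values drawn uniformly from 2^bits possibilities.
// Only meaningful while the result is much smaller than 1.
function collisionProbability(n, bits) {
  return (n * (n - 1)) / 2 / 2 ** bits;
}

const n = 1e12; // one trillion IDs, an arbitrary example size
const p122 = collisionProbability(n, 122);
const p120 = collisionProbability(n, 120);

console.log("122 bits:", p122);
console.log("120 bits:", p120);
console.log("ratio:", p120 / p122);
```

Under this approximation, giving up two bits multiplies the collision probability by four for any fixed number of IDs.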
These sorts of questions are harder to answer:
... and this sort of code becomes more complex: |
Fair points. You're correct, it does add some more logic to these situations. However, variable length will break the "find a UUID in text" anyway. So will Crockford Base32 encoding. And extracting the version is an extra line or two of code to fix that code. (The version.js there btw is another example of broken code - it should be checking the variant bits.) So I think it's a matter of comparing what happens when the points above are broken or made more difficult, vs the fact that all newer implementations (some of which will only need to support e.g. UUIDv7 and v8) can be simpler. UUIDs are supposed to be as opaque as possible. I would also wager that much of the code that is trying to extract version numbers and perform validation is probably doing something not terribly relevant to what most applications need anyway. Why are people checking the version? (Can't you just use the opaque value?) Why are they checking if a UUID is valid? (Are you sure you can't just compare to all zeros to determine if there's a UUID here or not?) |
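As an illustration of "checking the variant bits" before trusting the version, here is a minimal sketch that works on the canonical string form. The variant classes are the ones listed in RFC 4122 section 4.1.1; `variantOf` and `rfc4122Version` are hypothetical names, not taken from any library mentioned in this thread.

```javascript
// Classify the variant from the top bits of the 9th octet (hex digits 16-17).
function variantOf(uuid) {
  const hex = uuid.replace(/-/g, "");
  const octet = parseInt(hex.slice(16, 18), 16);
  if ((octet & 0x80) === 0x00) return "NCS (0xx)";
  if ((octet & 0xc0) === 0x80) return "RFC 4122 (10x)";
  if ((octet & 0xe0) === 0xc0) return "Microsoft (110)";
  return "reserved (111)";
}

// Only report a version when the variant says the version field exists.
function rfc4122Version(uuid) {
  if (variantOf(uuid) !== "RFC 4122 (10x)") return null;
  return parseInt(uuid.replace(/-/g, "")[12], 16);
}

// The DNS namespace UUID from RFC 4122 is a version 1 UUID.
console.log(rfc4122Version("6ba7b810-9dad-11d1-80b4-00c04fd430c8"));
// The Nil UUID falls in the NCS variant range, so no version is reported.
console.log(rfc4122Version("00000000-0000-0000-0000-000000000000"));
```

Returning null for anything outside the 10x variant is a design choice for this sketch; a caller could equally throw or return the variant name.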
To follow up from earlier discussion and from #58, my current stance on this is that the simplicity that combining the variant and version fields introduces is worth the downsides. So far, the downsides that have been brought up, along with my rebuttals, are:
I understand the concern. The procedure for examining the version is explained in the new draft with two sentences:
Yes, it is different. My opinion is that this does not present too much complexity. The first sentence is just reiterating what RFC4122 says much more verbosely, and the only thing being added is "UUID versions 7 and 8 can be identified by checking octet 9 for the values 0xE7 or 0xE8 respectively."
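A rough sketch of that check in JavaScript, under a few assumptions: the UUID is a 16-byte array, "octet 9" means the ninth byte (zero-based index 8), and for the existing versions the version number sits in the high nibble of the seventh byte as in RFC 4122. `detectVersion` is a name made up for this illustration and does not come from the draft.

```javascript
// Returns the UUID version, or null if none can be determined.
function detectVersion(bytes) {
  const octet9 = bytes[8]; // ninth octet, zero-based index 8

  // Proposed rule: combined variant+version byte for the new formats.
  if (octet9 === 0xe7) return 7;
  if (octet9 === 0xe8) return 8;

  // Existing rule: variant 10x, version in the high nibble of octet 7.
  if ((octet9 & 0xc0) === 0x80) return bytes[6] >> 4;

  // Any other variant: this sketch does not define a version.
  return null;
}

const proposed = new Uint8Array(16);
proposed[8] = 0xe7;
console.log(detectVersion(proposed));

const legacy = new Uint8Array(16);
legacy[6] = 0x4f; // version 4 in the high nibble
legacy[8] = 0xbf; // variant 10x
console.log(detectVersion(legacy));
```

Other values of octet 9 in the 0b111 range fall through to null here, which matches the suggestion above to disallow other uses of that variant.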
Concerns over the loss of bits being problematic are application specific, and the introduction of variable length UUIDs IMO addresses this concern. A fixed 128-bit value is much more problematic when it comes to concerns about collision probability or unguessability. So I think having those two bits reserved for future use to make one whole byte be devoted to the version is an acceptable tradeoff in the interest of simplicity, considering you can add plenty more bytes to your UUID to further reduce collision probability if your application really needs it. No need to worry about 2 bits when you can add many more if you like. If this ends up making it into an RFC, I suspect many new implementations will just implement UUID version 7 and/or 8 and not bother with the rest. IMO, making these implementations simpler should be a priority. |
I don't see a real reason to change the variant bits to develop a time-ordered UUID format. Implementing a UUIDv7 generator is an easy job that can be done by just 100 lines of code in many languages, even with the old, weird version/variant layout. The reorganized layout might reduce some lines of code, but I don't think that's worth sacrificing the future extendability of the UUID standard. It's possible in the future that another new UUID format really really needs to move the version bits, and then if no variant is left, the UUID standard will die. That said, if the last variant should be consumed now, I think the new format should use a different name than version 7. Variant 10x Version 7 may be defined in the future, and such definition should be named as "UUIDv7" to keep consistency with UUIDv1-5. Therefore, Variant 111 Version 7 should be named differently, or Variant 111 series should be started from version 1 with a different naming convention. |
@LiosK, with the placement of the Long story short: As for setting
I could go either way after all |
There are still 4 bits for the version, allowing 0 through F (0 through 15). We are only allocating 7 and 8 in this spec, so 0 through 6 and 9 through 15 remain available for variant 111+0.
As for deciding to go straight to v7 and v8: it was just a logical approach in the document to keep things flowing. See my comment above about distinguishing the two variants: Copied:
My Suggested edits that will make this a bit more clear:
|
This epsilon-ification of version numbers is just weird. I get that it's convenient for us authors/reviewers to use, but for casual readers it's going to be confusing. Exposing this lingo to users is just going to lead to conversations like this:
And while we're on the subject, I think it's worth pointing out how much larger the audience of casual readers of this specification is vs. people actually writing UUID implementations. (Witness the I will admit that, as an implementor, this new variant is growing on me. However I still don't feel the scope of this work - at least in its current form - warrants it. I'll try not to rehash the "bit swizzling" side of things, though. I think we all know what is / is not involved there. Regarding @bradleypeabody's argument that "RFC4122 is a mess, and we should move away from it", I have two concerns. The first is that if that's what we're actually doing, we're kind of throwing the baby out with the bathwater. The only portion of 4122 that needs revamping is version 1. Everything else still stands. Versions 4 and 5, in particular, will continue to be relevant. Which brings me to my other concern: Unless we explicitly state that we're obsoleting 4122 (e.g. the way RFC9051 obsoletes RFC3501), I don't see how we can claim to be "moving away from it". We are, instead, just extending it, as evidenced by our choice of version #'s. Heck, we say as much in the opening paragraph (emphasis mine):
Basically if we're adopting a new variant, we need to do a better job removing the need for people to concern themselves with the old one. Specifically...
That 3rd point is the awkward one, and the one that I guess has me struggling to believe we need a new variant. If all we're doing is copy/pasting the text from 4122 - which is probably all that we should be doing if we go this route given we haven't established any need to change either of those versions - .... can we really say we're "moving away from it"? |
If you want a UUIDv7 and UUIDv8 on the variant |
This is my current stance. In writing prototypes, testing and even writing the document this is much, much easier. Meanwhile, for Draft 03 Brad and I are going to at least keep this present so the IETF can give some weight on the topic. If their feedback is negative then I can roll it back in Draft 04. As such, I need to ensure that Version and Variant section of Draft 03 is as clear as possible. I think my suggested edits in my last comment will go a long way to improving the text. @broofa, I agree, we cannot obsolete RFC4122. Else you get into the scenario you described where we need to define every one of RFC4122's UUIDs and where they stand now. It is a VERY big undertaking and IMO keeping this as an update to RFC4122 continues to be the best strategy. Version parity in both variants: @fabiolimace, @ben221199, @broofa
Epsilon usage:
Edit: Epsilon would not play ball with the IETF converter tool, so a capital "E" is used instead. Notes: Editor Actions are only for me to keep track of what I would need to modify in the draft if consensus was achieved. |
I remember that I read about version 0 somewhere, but I don't know where. I remember something about that it was reserved and only was meant for invalid UUIDs. I also found this: https://github.com/r-lyeh-archived/sole I see that there is a difference in available bits (120 vs 122) in the new variant. In that case the whole idea of a new variant seems unnecessary. If you still want to introduce a new variant, use the variant for just one format. Don't do any subversioning. |
Yeah, kind of odd this isn't addressed in the RFC, nor had much discussion here. The only argument against using it that I can see would be to avoid confusion with the Nil UUID. Not a very strong argument, however. If we switch version 8 to be version 0, I would suggest we have it (version 0) use the 0b10x variant and not the new one. Otherwise explaining which versions correspond to which variants just gets that much more confusing. ("1-5 are 0b10x, versions 0 and 7 are 0b111"). But I think doing that weakens the case for 0b111. We'd be defining a new variant but only using it for a single version (version 7). |
@ben221199 interesting you found at least one implementation already using version 0 for custom implementations.
For the moment I will scratch v8 to v0 from the agenda, but when I submit draft 03 I will shoot an email to the dispatch mailer to see if anybody has other usages of v0 that I couldn't find. |
If the timestamp is 48 bits, I think it is no longer necessary to move the version number to another variant.
The code block below shows a simple function to generate UUIDv7 in JavaScript. It uses
function hex(number, len) {
return number.toString(16).padStart(len, '0');
}
function random(bits) {
const max = Math.pow(2, bits);
return Math.floor(Math.random() * max);
}
function uuid7() {
let uuid = "";
// get hexadecimal timestamp
let ms = (new Date()).getTime();
let timestamp = hex(ms, 12);
// concat timestamp and random
uuid += timestamp.substring(0, 8);
uuid += "-";
uuid += timestamp.substring(8, 12);
uuid += "-";
uuid += hex(random(16), 4);
uuid += "-";
uuid += hex(random(16), 4);
uuid += "-";
uuid += hex(random(48), 12);
// put version and variant
uuid = uuid.split('');
uuid[14] = '7';
uuid[19] = ['8', '9', 'a', 'b'][random(2)];
uuid = uuid.join('');
return uuid;
}
function main() {
let total = 10;
for (let i = 0; i < total; i++) {
console.log(uuid7());
}
}
main();
Output:
EDIT: updated the function |
@fabiolimace, @ben221199, @broofa @LiosK |
function bigrand(bits, shift = 0n) {
return BigInt(Math.floor(Math.random() * 2 ** bits)) << shift;
}
function toUUIDString(bignum) {
const digits = bignum.toString(16).padStart(32, "0");
return `${
digits.substring(0, 8)
}-${digits.substring(8, 12)
}-${digits.substring(12, 16)
}-${digits.substring(16, 20)
}-${digits.substring(20, 32)
}`;
}
// RFC variant
function uuid7() {
return toUUIDString(
(BigInt(Date.now()) << 80n) | // timestamp
(0x07n << 76n) | // version
bigrand(12, 64n) |
(0x8n << 60n) | // variant
bigrand(14, 48n) |
bigrand(48)
);
}
// New variant
function uuid7e() {
return toUUIDString(
(BigInt(Date.now()) << 80n) | // timestamp
(0x07en << 72n) | // version|variant
bigrand(36, 36n) |
bigrand(36)
);
}
console.log(uuid7());
console.log(uuid7e()); |
Update: Formats
|
@kyzer-davis are you planning on putting these in the spec or as an appendix? I worry that regardless of which way this ends up going the spec should clearly propose one way and not both. So maybe write up the UUID 8 and 7 non-E versions as appendices and mention that these are alternates that could be used if the var+ver field idea is shot down. Does that effectively address both of our concerns? |
I think I should make a complete format-list here:
The Notice that every variant has its OWN subtyping. Variant#0 uses families. Variant#1 uses "versions". Think carefully before deciding to open up a new variant. If you open up a new variant, you have 4 options:
|
So what things are now available:
|
Group, I made some large changes in #85 to introduce both v7/v8 and v7E/v8E as I stated in #26 (comment) Please give it a review and let's keep discussing that text here. |
I would suggest filling the version and variant segments with random by default. If in some information system you need to know the version and variant, let them fill in the corresponding value. Although I don't see how this could be useful, besides the necessary mimicry to the old-fashioned UUID standards. |
Announcement
I had a great discussion with @bradleypeabody and this topic has officially been marked Edit: To further clarify, Draft 03 will cover UUIDv6 through v8 + Max UUID. The new Draft 00 will cover E Variant, Alternate Encoding and UUID Long. These are two drafts that cover different topics, so implementations may choose what they want to support, i.e., an implementation might support RFC8675309 for v7 but not RFC123456789 for alt encodings. |
First, I want to say that I like |
@ben221199, I believe we discussed that a bit in #62. |
I don't think there should be a new variant. UUIDs are not only specified by RFC but also standardized in ITU-T Recommendation ITU-T X.667 | ISO/IEC 9834-8. Rec. ITU-T X.667, clause 11.3 specifies variant 0b111 as reserved for future use. Therefore, if someone decides that variant 0b111 is used, then IETF, ISO/IEC, and ITU-T should do that step together, otherwise there is a risk that two standardization organizations define it independently and then there would be chaos. Also, I didn't quite understand, why do you suggest a new variant, although there are enough versions left?
Variant #0 (family 14-127) cannot be used, because the available timestamp bits were exhausted on 5 September 2015. In re (By the way, @ben221199, a complete list of the 13 families is super rare, so thank you for that format-list. Do you know where I can find information about the structure of the Node-ID for each of these families? It would be very useful for my open-source UUID decoder tool.)
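For anyone wanting to check the exhaustion date, here is a quick arithmetic sketch. It assumes the commonly cited NCS layout, which is not spelled out in this thread: a 48-bit timestamp counting 4-microsecond units since 1 January 1980 UTC. The constant names are made up for this example.

```javascript
// Assumed NCS parameters (see the lead-in above; not taken from this thread).
const NCS_EPOCH_MS = Date.UTC(1980, 0, 1); // 1 January 1980 UTC
const TICK_MICROSECONDS = 4;
const TIMESTAMP_BITS = 48;

// Total time span the counter can represent, in milliseconds.
const rangeMs = (2 ** TIMESTAMP_BITS * TICK_MICROSECONDS) / 1000;

// Moment at which the counter would wrap.
const exhausted = new Date(NCS_EPOCH_MS + rangeMs);

console.log("range in years:", rangeMs / (365.25 * 24 * 3600 * 1000));
console.log("exhausted on:", exhausted.toISOString());
```

Under those assumptions the range is a little under 36 years, and the wrap-around lands on 5 September 2015, which agrees with the date given above.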
Agreed.
Both IPv6 and UUID are 128 bits long. At the time of writing I was possibly thinking of having the IPv6 bits as UUID and only change the
At the moment I don't know. I have only seen Ethernet MAC addresses and DDS used and only know the MAC format. |
Question:
Should we redefine UUID variant bits 111 (E/F) which are currently "Reserved for future definition." as per the original RFC 4122 Section 4.1.1 source?
Proposal:
Possible text changes depending on feedback from this issue and #24
As such the Draft 01 goes from the current three definitions:
To the possible four definitions in Draft 02: