Simplify UUIDv8 Hash-based Example #147
Maybe off-topic, but... is it too late to replace the compound adjective “name-based” with “hash-based” in v3, v5 (and v9), or at least to treat them as interchangeable terms in the document? I see both when I inspect some implementations.
Hm… I think I used both terms, but unintentionally. I think it is okay if UUIDv9 is named like UUIDv3 and UUIDv5 (i.e. name-based).
I still have no opinion on the creation of UUIDv9. But we should be careful with expressions like "fully custom UUIDv8" and "time-based UUIDv8", as if they were side by side in the hierarchy. There should be only one UUIDv8, which is "Fully Custom UUID" (if I'm correct). The time-based example is just one instance of a possible implementation of UUIDv8. I think the confusion remains because UUIDv8 was originally a time-based UUID with custom timestamp precision. There are a lot of outdated web pages that still say that UUIDv8 is, in essence, a time-based version. However, all UUIDv8 requirements have been eliminated except the variant and version bits, making it a fully custom/free-form/proprietary format. Please see the original discussion about UUIDv8: uuid6/uuid6-ietf-draft#31.
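Since only the version and variant bits survive as requirements, a "fully custom" UUIDv8 reduces to stamping those 6 bits onto an otherwise implementation-defined 128-bit payload. A minimal Go sketch (Go only because the thread already uses it); `setVersion8` and `format` are illustrative helper names, not from any library:

```go
package main

import "fmt"

// setVersion8 stamps the only fields UUIDv8 fixes: the 4-bit version
// (high nibble of octet 6) and the 2-bit variant (top bits of octet 8).
// The remaining 122 bits are left exactly as the implementation supplied them.
func setVersion8(b [16]byte) [16]byte {
	b[6] = (b[6] & 0x0f) | 0x80 // version 8 = 0b1000
	b[8] = (b[8] & 0x3f) | 0x80 // variant 0b10
	return b
}

// format renders the 16 octets in the usual 8-4-4-4-12 hex form.
func format(b [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	var payload [16]byte
	for i := range payload {
		payload[i] = 0xff // stand-in for any vendor-defined layout
	}
	fmt.Println(format(setVersion8(payload))) // ffffffff-ffff-8fff-bfff-ffffffffffff
}
```

Whatever fills the other 122 bits (a timestamp, a hash, a counter) is the implementer's choice; the format itself does not distinguish between them.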
Uhm. It's really about six months too late for this suggestion.
That is probably a tolerable change at this point.
Actually, I do feel terrible for making a proposal at this point. But maybe there is a slight chance that it is possible. I really wish I had known about the UUID-Rev group 6+ months ago; then I would have contributed much more :-) In theory (if the proposal is accepted in substance), the editing steps are rather small:
If my proposal could be formally and substantially accepted, then I would like to help as well as I can by reading the whole draft from top to bottom to verify all references.
I don't agree on this one yet. |
Then all the formal reviewers who have already done their reviews would have to redo their work. That's what we've been doing for the past six months. I am still not in any way convinced I understand why this matters.
If UUIDv8 is FULLY custom, we shouldn't give implementation examples in my opinion. So, I think that the time-based version has to go.
No, I think examples are always good.
I agree with that. I expressed my concerns earlier (#127): I think it is weird that the only use of the non-example section "hash spaces" is in an example section, and now even an IANA registry is planned (#144). Why should IANA register hash spaces when they are only used in an "example" of our draft? So I agree with @ben221199 that there should be a UUIDv9 if we want the hash space concept to be recognized by IANA. An exception would be if the idea of hash spaces (i.e. a UUID identifying an algorithm that is not already defined by an OID) might be useful in future standards.
I think one of the main reasons is the question "Why should IANA (or anyone else) care about hash spaces when they are only used in an "example" of our draft?" Also, don't you think that modern hash algorithms are important enough to get their own version? It is a strong signal for people to stop using UUIDv3 and UUIDv5.
I have no idea what you are talking about. UUIDv9 behaves the same as UUIDv3 and UUIDv5, just with a different algorithm and a hash space.
I don't understand. Which list are you referring to? But in any case, I want to say that it is very important to me that the custom name-based UUIDs (i.e. non-MD5/SHA-1) stay, especially since both MD5 and SHA-1 are insecure. Please do not remove the name-based example or the hash-space appendix. They are an amazing idea, even if they use the wrong UUID version in my opinion :-)
Examples are, but to me it now seems more like a version definition than an example. I think we should be clearer about that. I don't understand the UUIDv8 test vectors either. On everything else, I think that @danielmarschall and I are mostly on the same page.
I'd point out one thing that seems to have been missed in the recent discussion around the new name-based scheme: the uniqueness of hashspace-based UUIDs depends entirely on the hash function used. The current draft allows arbitrary hash functions chosen by implementers, so uniqueness depends on an application-specific choice of hash function. This application-specific nature makes the hashspace approach a very good example of v8. In my opinion, it isn't really useful or meaningful to allocate a separate version number without eliminating this application-specific nature, and to achieve that, we have to do at least the following work:
Perhaps it is rather late to complete these. In addition, the above bullet points pose a subtle question relating to #143 and #144: which hash functions should we officially list in the specification? The current hashspace approach accepts any hash function (because it's v8), but it doesn't make sense to list and promote unsafe hash functions in the specification. I don't know how IANA works, but if it doesn't have a mechanism to reject nonsensical hash functions, IANA will not be a good choice. The OID-based hashspace IDs would add noise, because they allow any hash function to get an official-looking hashspace ID without any selection.
I am not sure I understood what you mean. Yes, if different hash methods are used, the resulting UUIDv9 is different. However, you need to think of the hash algorithm as an input of the hash function.
As I mentioned above, if we stay at UUIDv8, then we do not know "which" UUIDv8 was chosen. Was it a fully custom, a custom time format, or a custom hash format? We don't know. For UUIDv9, if the hash is unambiguous, then the UUIDv9 is unambiguous.
Whether a hash algorithm is "unsafe" depends on the time. We know a few hash functions that are unsafe today, but we don't know whether SHA-2 or SHA-3 might become unsafe tomorrow if someone finds a flaw. So safe/unsafe should be out of scope for the RFC. However, the hash function should be required to produce at least 122 bits of output (or it needs to be zero-padded). I think the selection of hash algorithms in the current draft is good. It contains the NIST algorithms, which are VERY well known and are currently (2023) safe to use. If other hash algorithms emerge in the future and/or SHA-2 and SHA-3 become very insecure, there can be a revision of the RFC with a different Appendix B. But this is not mandatory, because the mechanism of hash spaces and/or the IANA registry of hashes allows the developer to simply choose a different algorithm.
Yes, and we don't care at all. This hash UUID was not the intent of this RFC, and now it is delaying the approval of the UUIDv7 that everyone is waiting for. The uniqueness of the hash UUID is questionable not only because of problems with finding satisfactory hash functions, but also because the argument passed to the hash function is not unique. Worse, variability of that argument leads to variability in the hash UUID, and a volatile key is unsuitable for databases and other information systems. Hash UUIDs are also unordered, so they are no better than UUIDv4. We should move the hash UUID into the UUIDv8 category as soon as possible and stop delaying final RFC approval.
Who is "we"?
And why did it end up in the latest draft then?
Then you also need to strike UUIDv4.
Then you don't use them for databases. I would like a standardized mechanism for non-SHA-1 hash UUIDs, for sure. I use UUIDs as device identifiers in the operating system I'm working on, and standard device names get fixed IDs assigned using namespace UUIDs. Being able to use something other than SHA-1 or MD5 would be nice.
The current hashspace method accepts an arbitrary hash function, so I can define one as follows, give it a hashspace ID, and generate UUIDv8 name-based UUIDs.

```go
import "time"

func MiracleHash(message []byte) []byte {
	digest := make([]byte, 16)   // prepare zero-filled byte sequence
	time.Sleep(42 * time.Second) // wait for a miracle to compute digest
	return digest
}
```

This …

However, outside the v8 space, in my opinion, the spec must reject unsafe (in terms of uniqueness) hash functions. UUID isn't an almighty identifier framework; it is just a universally unique identifier standard. All the versions provided in the document (except for v8, which is clearly marked as implementation-specific) must provide a reasonable guarantee of universal uniqueness, because that is exactly what general readers of the standard are looking for. This is my personal opinion, but I believe this way of thinking maximizes the utility of the UUID standard. The hashspace approach in the current draft should be good enough as informative guidance, but there are several questions we have to answer to make it a normative definition that gives a reasonable uniqueness guarantee. For example:
These questions were put on hold when the current hashspace approach was first introduced, because it was expected to be just an informative example. Thinking twice, we might conclude that this approach is not the best option for dealing with the above questions. Perhaps we need to reconsider the other name-based schemes we have explored before. Anyway, developing a normative name-based scheme isn't that easy, and perhaps we have run out of time. By the way, I don't really mind the delay caused. I am thankful to @danielmarschall for raising this discussion.
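The uniqueness concern behind MiracleHash can be demonstrated directly. A minimal Go sketch, assuming a simplified name-only layout with no namespace or hash space ID; `v8FromDigestFunc`, `constantDigest`, and `sha256Digest` are hypothetical helpers. Both results satisfy the v8 bit layout, but with a digest that ignores its input every name maps to the same UUID:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// v8FromDigestFunc builds a name-based UUIDv8 from any digest function.
// Nothing in the v8 format itself stops a caller from passing a useless one.
func v8FromDigestFunc(digest func([]byte) []byte, name string) [16]byte {
	var u [16]byte
	copy(u[:], digest([]byte(name)))
	u[6] = (u[6] & 0x0f) | 0x80 // version 8
	u[8] = (u[8] & 0x3f) | 0x80 // variant 0b10
	return u
}

// constantDigest mimics MiracleHash (minus the sleep): it ignores its input.
func constantDigest(message []byte) []byte { return make([]byte, 16) }

// sha256Digest is an ordinary input-dependent digest for comparison.
func sha256Digest(message []byte) []byte {
	sum := sha256.Sum256(message)
	return sum[:]
}

func main() {
	for _, name := range []string{"alice", "bob"} {
		fmt.Printf("%x  %x\n", v8FromDigestFunc(constantDigest, name), v8FromDigestFunc(sha256Digest, name))
	}
}
```

Nothing in the version and variant bits records which digest function was used, so a consumer of the resulting value cannot tell the two cases apart.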
If you say UUIDv8 is fully custom, it is not good to give an example that is the only place where hashspaces are used and then register those hashspaces at IANA. Then the only reason for hashspaces to exist is an example, and then you are actually no longer talking about an example but about a new format. UUIDv8 is fully custom. You are able to give implementation examples, but be aware that giving examples can suggest that they are real formats that need to be implemented, which should not be the case. So caution is advised. My suggested options for the time-based UUIDv8 example:
My suggested options for the name-based UUIDv8 example:
The goals of this RFC are well stated in Section 2.1, "Update Motivation".
I'm fine with UUIDv6 and UUIDv7 as far as I know. So if we fix or drop UUIDv8, the RFC could be published in my opinion. However, I think many also want UUIDv8 in the same RFC, but not with the chaotic definition it has now.
@LiosK In re your
I agree that the UUIDv8 examples should be removed completely as they can be taken as a guide to action.
Removing UUIDv8 examples (without introducing UUIDv9) would mean that we strike the complete SHA2/SHA3 functionality and we are left with MD5 and SHA1. That would be horrible.
You could simply list deprecated and accepted technologies for UUIDv8 without examples. You could also specify a list of UUIDv8 categories (time-based, hash-based, and so on).
https://mailarchive.ietf.org/arch/msg/uuidrev/dhxgO66xkpNBrOtSy0AY8nV9bAE/ @sergeyprokhorenko @danielmarschall @ben221199 @LiosK @chorman0773 If this matters, then please participate. |
@danielmarschall, please clarify your proposal, taking into account the discussion that has taken place. It would be terrible if we had to redo the entire approval process of this long-awaited RFC because of, frankly, completely useless details regarding almost unused hash UUIDs. @mcr, I completely agree with @LiosK's point of view and give him my vote.
Okay. I will carefully read through the discussion and adjust the initial post of this GitHub issue. Maybe also re-phrasing some parts for better understanding.
Is it true that the approval process needs to be done again, or is this a change that just needs a re-review by the ADs (i.e. they only re-review the changes, not everything)?
It depends on the use case. I understand that UUIDv7 is very important for databases because of its ordering. But that doesn't mean that the other UUID versions are useless for everyone. I have worked on a lot of projects where hash-based UUIDv3 and UUIDv5 were required. Having SHA-2/3 or any-hash support would be an important improvement for the UUIDv3 and UUIDv5 use cases.
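For comparison, the existing UUIDv5 procedure that such a SHA-2/3 variant would mirror is short. A minimal Go sketch using only the standard library: SHA-1 over the namespace ID followed by the name, the first 16 of the 20 digest bytes kept, then version 5 and the RFC variant set. `uuidV5` is an illustrative name; `nameSpaceDNS` holds the DNS namespace ID from RFC 4122:

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// nameSpaceDNS is the DNS namespace ID 6ba7b810-9dad-11d1-80b4-00c04fd430c8.
var nameSpaceDNS = [16]byte{0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8}

// uuidV5 hashes namespace||name with SHA-1, keeps the first 16 of the 20
// digest bytes, then sets version 5 and the RFC variant.
func uuidV5(namespace [16]byte, name string) [16]byte {
	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(name))
	var u [16]byte
	copy(u[:], h.Sum(nil))
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // variant 0b10
	return u
}

func main() {
	fmt.Printf("%x\n", uuidV5(nameSpaceDNS, "www.example.com"))
}
```

Swapping `crypto/sha1` for another hash and changing the version nibble is mechanically simple; the debate in this thread is mostly about which version number such a value should carry and how the algorithm is identified.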
I suggest freezing the draft RFC and making no further changes to it, because it's too late and improvements could go on forever. There was plenty of time to make the discussed amendments in the previous stages of RFC development. Stakeholders will be able to propose changes to an already approved RFC.
That's an extreme position. The same logic holds for the hash-based v8 too and would ultimately remove all the v8 examples from the document. Examples are helpful for readers, as they concisely convey our intention in introducing v8, which can't be expressed in the succinct normative description. Examples won't confuse readers as long as they are clearly labeled as implementation examples.
In this case, the UUIDv8 examples (only!) must be accompanied by a disclaimer that the implementer uses them at his/her own risk, and that the examples themselves are not recommended by the standard, have not been properly tested or examined, and may lead to errors in the information system.
FYI, in line with #150 thinking, I was planning to move the v8 "test vectors" to a new appendix titled "illustrative examples". @bradleypeabody, @bradleypeabody
Timeline: Let me finish out some of these other early draft-12 tracker items for easier things. Then I will branch this off of the new draft 12 base. ETA Monday/next week as I let those other items in PR #152 bake.
If it is not meant to be implemented, can it then be used as a reference for interoperability? (Well, technically, it can, if it gets its own appendix/section number that someone can refer to)
(Personal opinion) I am fine with either SHA-X or xxHash; I don't have a preference. SHA-X is a bit more well-known and the truncation of extra bits could be illustrated in the example. xxHash on the other hand is very fast and might be good for UUIDs.
Since it would be great to have it as a reference for interoperability, it would be good if it could be defined for hash algorithms < 128 bits. The developer needs to decide if they want to use that algorithm, though.
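For hash algorithms shorter than 128 bits, one option is zero-padding. A minimal Go sketch, assuming FNV-1a 64 from Go's `hash/fnv` as a stand-in for xxHash (which is not in the standard library); the layout (digest in octets 0-7, zeros after) and the name `v8FromShortHash` are hypothetical. With this layout the version nibble lands inside the digest bytes, so only 60 hash bits survive, far fewer than the 122 bits mentioned earlier in the thread:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// v8FromShortHash zero-pads a 64-bit digest to 128 bits. The digest fills
// octets 0-7 and octets 8-15 stay zero before the version and variant bits are
// set, so octet 6 loses 4 digest bits to the version nibble (60 bits remain).
func v8FromShortHash(name string) [16]byte {
	h := fnv.New64a()
	h.Write([]byte(name))
	var u [16]byte
	copy(u[:8], h.Sum(nil))     // 8 digest bytes; the rest is zero padding
	u[6] = (u[6] & 0x0f) | 0x80 // version 8
	u[8] = (u[8] & 0x3f) | 0x80 // variant 0b10 (octet 8 was padding)
	return u
}

func main() {
	fmt.Printf("%x\n", v8FromShortHash("www.example.com"))
}
```

An interoperable definition would have to pin down exactly this kind of detail (where the padding goes, which bits the version overwrites), which is why short hashes need explicit treatment.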
Sounds good!
Agreed. Examples don't need to contain information for interoperability. @mcr, I cannot make the interim, unfortunately.
FTR, I do not consider a completely custom v8 useful for my identified use case, regardless of the examples provided for it.
No wonder. Huge efforts have been put into making UUIDv7 perfect. Therefore, there are no more useful ideas left for UUIDv8. The only purpose of UUIDv8 is not to limit the imagination of implementers. And it would be strange to advise them on this using examples.
@chorman0773 I don't follow. If you direct implementations to use UUIDv8 following the hash example provided, it should have approximately the same uniqueness probabilities as any other UUID (while also being hash-based). What aspect of this makes the resulting UUID not/less unique?
I have to dive into how these meetings work. Don't know if I have time to participate, because I also have other work to do.
FYI, I have changed the title of the issue so it reflects the current state of this tracker item.
The point of using UUIDs is that I don't need to provide any guidance to driver implementors beyond the UUID RFC except that they should not use the namespace
The draft is clear that UUIDv8's uniqueness MUST NOT be assumed, so in your case, if you do not provide guidance to driver implementers, then you must reject v8 values at registration. You are also free to provide driver implementers with detailed v8 guidance specifying the structure and hash functions, just as you would provide namespace guidance for v3/v5. That's exactly what "implementation-specific" means. I see your case might want (though not require) a standardized hash-based ID definition, but it seems considerably difficult to achieve consensus for such a scheme, and in my opinion it needs a separate RFC project.
Well, strictly speaking, unless the device ID is already in use or is one of the two sentinel values (full is explicitly reserved; nil means "don't care, kernel assigns the device ID"), the ID isn't going to be rejected, either for kernel-mode drivers or user-mode drivers. It's up to the individual drivers if they. The question is whether the kernel itself would use the IDs, not whether they'd be available for anything else to use.
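The registration rules described here (reserved all-ones value, nil meaning "kernel assigns", duplicates refused), plus the optional v8 rejection suggested in the preceding reply, fit in a few lines. A minimal Go sketch; `deviceID`, `registry`, and `register` are hypothetical names for illustration and do not describe the actual kernel, and assigning a random UUIDv4 for the nil case is an assumption:

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// deviceID and registry are hypothetical stand-ins for a kernel device table.
type deviceID [16]byte

type registry struct {
	inUse    map[deviceID]bool
	rejectV8 bool // optional policy: v8 uniqueness MUST NOT be assumed
}

// isMax reports whether every octet is 0xff (the reserved "full" value).
func isMax(id deviceID) bool {
	for _, b := range id {
		if b != 0xff {
			return false
		}
	}
	return true
}

// register refuses the reserved max value and duplicates, optionally refuses
// UUIDv8, and treats the nil UUID as a request for a kernel-assigned ID.
func (r *registry) register(id deviceID) (deviceID, error) {
	if isMax(id) {
		return id, errors.New("max UUID is reserved")
	}
	if id == (deviceID{}) {
		// Assumption: the kernel assigns a random UUIDv4.
		if _, err := rand.Read(id[:]); err != nil {
			return id, err
		}
		id[6] = (id[6] & 0x0f) | 0x40 // version 4
		id[8] = (id[8] & 0x3f) | 0x80 // variant 0b10
	} else if r.rejectV8 && id[6]>>4 == 8 {
		return id, errors.New("UUIDv8 uniqueness cannot be assumed")
	}
	if r.inUse[id] {
		return id, errors.New("device ID already in use")
	}
	r.inUse[id] = true
	return id, nil
}

func main() {
	r := &registry{inUse: map[deviceID]bool{}, rejectV8: true}
	v5 := deviceID{6: 0x5a, 8: 0x9b}
	_, err1 := r.register(v5)
	_, err2 := r.register(v5)
	_, err3 := r.register(deviceID{6: 0x8a, 8: 0x9b})
	auto, err4 := r.register(deviceID{})
	fmt.Println(err1, err2, err3, err4, auto[6]>>4)
}
```

Whether `rejectV8` should be on is exactly the policy question discussed above; with it off, v8 IDs are accepted like any other non-sentinel value.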
@chorman0773 Since, from what I gather, you control at least the recommendation of what you're suggesting drivers do when generating a UUID, it seems like just providing a specific suggestion of what kind of UUID you recommend would work.
Why? What prevents you from simply telling people who make drivers that the recommendation is to use UUIDv3, v5, or v8 following the hash example? If they do something else, there is a very small chance that the probability of collision increases.
Keep in mind that there's only so much uniqueness you're going to get in a 128-bit value. There are physically no guarantees with any of the UUID versions that absolutely prevent collisions; all you can do is reduce the probability. And if we assume that whatever other weird stuff people do with UUIDv8 is going to be roughly evenly distributed, it's easy to argue that your collision probability doesn't increase. Anyway, this UUIDv8 example looks like what we'll realistically be able to get into the RFC. So if you want to help shape that, great; if not, that's fine too. ALSO: Keep in mind that it takes various implementations and experience to end up with a new standard. If enough people get behind it, this v8 example could end up being v9 in a later update to the document, which is another reason to put effort into this v8 example: it's a starting point.
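To put a number on "only so much uniqueness", the usual birthday approximation for n values with b uniformly random bits is p ≈ 1 - exp(-n(n-1) / 2^(b+1)); UUIDv4 has b = 122. A minimal Go sketch (`collisionProbability` is an illustrative name); it assumes the bits really are uniformly distributed, which is the assumption that does not automatically hold for arbitrary v8 schemes:

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n IDs
// collide when each carries `bits` uniformly random bits:
// p ≈ 1 - exp(-n(n-1) / 2^(bits+1)), computed with Expm1 for small values.
func collisionProbability(n float64, bits int) float64 {
	return -math.Expm1(-n * (n - 1) / math.Exp2(float64(bits+1)))
}

func main() {
	for _, n := range []float64{1e9, 1e12, 1e15, 1e18} {
		fmt.Printf("n=%.0e  p=%.3g\n", n, collisionProbability(n, 122))
	}
}
```

At 10^15 IDs the estimate is on the order of 10^-7; the figure degrades quickly if a scheme effectively provides fewer than 122 independent bits.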
The `GUID.v8()` method is no longer supported due to recent sudden changes in the UUIDv8 discussions. It will be removed when the new RFC is finally published. See the latest discussions about UUIDv8:
* ietf-wg-uuidrev/rfc4122bis#143
* ietf-wg-uuidrev/rfc4122bis#144
* ietf-wg-uuidrev/rfc4122bis#147
I have greatly reduced the complexity of the UUIDv8 name/hash-based example in a4f5693. To summarize this commit:
All in all, this checks the boxes of:
@kyzer-davis Thank you very much for your work! I think it is very good to have this appendix with examples, and that the UUIDv8 are simplified by removing the hash-space. I have a few thoughts about https://github.com/ietf-wg-uuidrev/rfc4122bis/blob/hash-based-uuids/draft-ietf-uuidrev-rfc4122bis.md :
I am not sure... doesn't this conflict with the idea of a reference for interoperability (which was discussed earlier)? In other words: if someone decides to implement SHA-256 UUIDv8 according to RFC xxx Appendix xxx, are they doing a "bad job" because they implemented something that is "not meant to be implemented"?
I think it should mean hash algorithms and not hash protocols.
The horrible example with "one nanosecond in 48 bits" is still there... Can't we assign more bits, or do something else, so that it wraps around at most once every 35 years instead of every few hours?
For the time-based example, here are alternatives (I am keeping 1 nanosecond): A. B. C. D. My preference would be >=100 years, so 64 bits.
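The wrap-around period of a nanosecond counter is simply 2^bits ns. A minimal Go sketch (`wrapYears` is an illustrative name, years taken as 365.25 days) for the widths mentioned in this thread: 48 bits wrap after about 78 hours (roughly 3.3 days), 60 bits after about 36.5 years, and 64 bits after about 584.5 years:

```go
package main

import (
	"fmt"
	"math"
)

// wrapYears returns how long a counter of `bits` bits that advances once every
// `tick` seconds runs before wrapping around, in 365.25-day years.
func wrapYears(bits int, tick float64) float64 {
	const secondsPerYear = 365.25 * 24 * 3600
	return math.Exp2(float64(bits)) * tick / secondsPerYear
}

func main() {
	for _, bits := range []int{48, 56, 60, 64} {
		years := wrapYears(bits, 1e-9) // 1 ns resolution
		fmt.Printf("%d bits: %.4g years (%.4g hours)\n", bits, years, years*365.25*24)
	}
}
```

Coarsening the tick (e.g. to 10 ns) multiplies every figure by the same factor, so bit width and resolution trade off directly.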
In terms of hash-based items on #147 (comment)
Time comment: #147 (comment)
@kyzer-davis Thank you! Looks good to me.
(Continuation of a discussion in #144)
I want to propose that part of UUIDv8 be split into a new format, UUIDv9. I hope there is a slight chance that this is still possible at this stage.
Current situation: UUIDv8 has three functions:
Proposal:
Why do I handle custom time-based and custom name-based differently?
With Section C.8 (name-based example) and Appendix B (hash spaces, #143), we have a pretty clear definition of how such a name-based UUID is calculated. We know the OIDs of a lot of algorithms and have a mechanism for converting an OID into a hash space ID (#143). Even if we don't know the hash space ID, IANA will probably have it in their registry (#144). The calculation of the UUID is defined very well, and therefore, if the hash space ID is unambiguous, then the UUID is unambiguous.
In contrast to custom time-based UUIDs (which are very custom, because the vendor can define the length of the time part, clock sequence, random part, etc.), the new name-based version does not allow changing any contents/fields.
So, I do not think that these new name-based UUIDs should be considered "custom". They are not "custom" in my opinion. Therefore I think they should have their own format, UUIDv9.
Since custom time-based and fully custom UUIDs are very custom, I don't think it would be an issue if they both share the same version (UUIDv8). After all, people who create custom UUIDs (either by defining a custom time format or a fully custom UUID) know that their custom definition is not standardized and might cause collisions.
By the way (personal opinion): I really dislike the time-based example (in Section C.7), because the nanosecond resolution is rather extreme, causing the time to wrap around very quickly. I would have lowered the resolution so that the wrap-around takes ~100 years. On the other hand, the text "It should be noted that this example is just to illustrate one scenario for UUIDv8." makes clear that this is not a fixed definition.