Discussion: SHA256 UUID Generation #50
+1 |
The problem I have using v8 for this is some of the use cases that were discussed for SHA1 and MD5: the desire for two nodes with the same object to be hashed to arrive at the same UUID. While it's possible to do this with v8 by prior arrangement, there must be a reason to have v3 and v5 specify the hash algorithm they use. One reason I can think of is if there is a future transition away from SHA256 to something else and it's important during a transition to know what UUIDs are generated with SHA256 and which with the newer algorithm. |
I was wondering if we can/should pick some new vX, allocate 4 bits to the hash type, and put SHA256 there. |
@mcr, @jimfenton: I thought about this a bit over the weekend and here are my thoughts summarized nicely.
- Deprecate SHA1 and replace it with SHA256 as the "new" v5. Pro: Avoids new version, removes security considerations around SHA1.
- Cram SHA256 into v5. Pro: Avoids new version, removes security considerations around SHA1.
- Leverage SHA256 with v8. Pro: Leaves SHA1 UUIDs and implementations to operate how they have for the past 20 years.
- Allocate v9 for SHA256. Pro: Removes security considerations around SHA1 (can discourage v5 use and point v3 and v5 at v9). No ambiguity about usage. |
Kyzer Davis wrote:
### Allocate v9 for SHA256
Allocate v9 for all-future hashes, with a hash-subtype.
|
@mcr That's also the direction I was thinking, although I don't have any first-hand knowledge on how these UUID versions are used. |
@mcr @jimfenton I came here to say the same thing. Perhaps some set of bits can be allocated as a hash ID. We would need to create a registry of hash IDs, and depending on the number of bits allocated for the hash ID, we'd be limiting the number of future hashes that could be added to the registry. In practice, this might never become an issue. |
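To make the "hash ID bits" idea concrete, here is a minimal sketch of what such a layout could look like. Everything specific in it is an assumption for illustration only: the version number 9, the position of the 4-bit hash ID (placed directly after the variant bits), and the registry values in `HASH_IDS` were never agreed on in this thread.

```python
import hashlib
import uuid

# Hypothetical registry of 4-bit hash IDs; these values are not standardized.
HASH_IDS = {"sha256": 0x0, "sha512": 0x1, "sha3_256": 0x2}

# Hypothetical version number; "v9" was only floated as an idea in this thread.
HYPOTHETICAL_VERSION = 9


def hash_uuid_with_id(namespace: uuid.UUID, name: bytes, algorithm: str) -> uuid.UUID:
    """Hash namespace + name and embed a 4-bit hash ID in the result."""
    digest = hashlib.new(algorithm, namespace.bytes + name).digest()
    value = int.from_bytes(digest[:16], "big")  # leftmost 128 bits of the digest
    value = (value & ~(0xF << 76)) | (HYPOTHETICAL_VERSION << 76)  # version nibble
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # RFC 4122 variant bits (0b10)
    value = (value & ~(0xF << 58)) | (HASH_IDS[algorithm] << 58)  # assumed hash ID position
    return uuid.UUID(int=value)


def hash_id_of(u: uuid.UUID) -> int:
    """Read back the 4-bit hash ID from the assumed position."""
    return (u.int >> 58) & 0xF


example = hash_uuid_with_id(uuid.NAMESPACE_DNS, b"example.com.", "sha512")
print(example, "hash ID:", hash_id_of(example))
```

With 4 bits for the hash ID, such a registry could hold at most 16 algorithms, and only 118 bits of the digest would remain in the UUID.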
One possible approach I'd suggest: Predefine hash algorithm UUIDs and prepend one to the namespace ID and name

```python
import hashlib
import uuid

# predefined by RFC 4122
NAMESPACE_DNS = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

# predefined by new RFC (UUIDv4s I just made up for example)
ALGORITHM_SHA256 = uuid.UUID("3fb32780-953c-4464-9cfd-e85dbbe9843d")
ALGORITHM_SHA512 = uuid.UUID("e6800581-f333-484b-8778-601ff2b58da8")

# concatenate hash_algorithm_uuid + namespace_uuid + name; then hash
sha256 = hashlib.new("sha256")
sha256.update(ALGORITHM_SHA256.bytes)
sha256.update(NAMESPACE_DNS.bytes)
sha256.update(b"example.com.")
print("SHA-256:", sha256.hexdigest()[0:32], "(truncated)")

# ditto
sha512 = hashlib.new("sha512")
sha512.update(ALGORITHM_SHA512.bytes)
sha512.update(NAMESPACE_DNS.bytes)
sha512.update(b"example.com.")
print("SHA-512:", sha512.hexdigest()[0:32], "(truncated)")

# output:
# SHA-256: 564315c658dc181edb907cfa7d55605b (truncated)
# SHA-512: b096e1610da091aa73cdfe4d1132a5aa (truncated)
```

This approach ensures that the inputs to hash algorithms are unique per algorithm no matter what the namespace and name are, though such consideration could be meaningless for collision resistance because different hash algorithms are expected to produce very different sequences of bytes.

This approach doesn't allow us to identify the hash algorithm used by a given name-based UUID, but I'm not convinced that such reverse engineering is really necessary. It is anyway not possible to reproduce a name-based UUID without knowledge of the namespace ID and the original name, and with such knowledge, it is quite easy to determine the hash algorithm by trying all the few common algorithms. For the same reason, I am skeptical about allocating dedicated hash ID bits in the precious 128-bit space, especially when we have a different idea to guarantee uniqueness. |
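As a rough sketch of how the truncated digest above could be turned into an actual UUID, the version and variant bits can be overwritten after truncation. Using version 8 here is an assumption (the thread leans toward v8 but does not settle it), `name_based_uuid_v8` is a hypothetical helper name, and the algorithm UUID is the made-up example value from the comment above.

```python
import hashlib
import uuid

# Made-up example algorithm UUID from the comment above; not a standardized value.
ALGORITHM_SHA256 = uuid.UUID("3fb32780-953c-4464-9cfd-e85dbbe9843d")


def name_based_uuid_v8(algorithm_id: uuid.UUID, namespace: uuid.UUID,
                       name: bytes, hash_name: str) -> uuid.UUID:
    """Hash algorithm_id + namespace + name, truncate, and stamp version/variant."""
    h = hashlib.new(hash_name)
    h.update(algorithm_id.bytes)
    h.update(namespace.bytes)
    h.update(name)
    value = int.from_bytes(h.digest()[:16], "big")  # leftmost 128 bits
    value = (value & ~(0xF << 76)) | (0x8 << 76)    # version nibble = 8 (assumed)
    value = (value & ~(0x3 << 62)) | (0x2 << 62)    # RFC 4122 variant bits (0b10)
    return uuid.UUID(int=value)


print(name_based_uuid_v8(ALGORITHM_SHA256, uuid.NAMESPACE_DNS,
                         b"example.com.", "sha256"))
```

Two nodes that agree on the algorithm UUID, namespace, and name arrive at the same UUID, and 122 bits of the digest survive in the result.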
What @LiosK proposes works for me. I just think that we will get pushback if we don't provide a way to use newer hashes in a deterministic way. I'm unclear what version would be used for the above approach. |
Although I don't have a strong opinion here, I believe a new name-based scheme is not worth a new version and should stay in the v8 because I'd totally agree with the points quoted by Kyzer:
A standard makes sense only when it coordinates multiple implementations to interoperate with each other. It might logically look flawed to define deprecated algorithm-based v3 and v5 only, but, if there are few people wishing for an updated name-based scheme, it wouldn't be really helpful to introduce a new standard just to fix the flaw. Plus, v3 and v5 used to be the only mechanisms that could incorporate application-specific ID (name) schemes into the UUID space, but now we have v8 and can include whatever application-specific information in a UUID. I'd anticipate fewer use cases of v3 and v5 after the introduction of v8. I'm also concerned about the inactive discussion so far over name-based schemes. I'm not even sure if truncating hash digests is a safe, secure, and valid approach to produce a universally unique identifier. Anyway, we can perhaps add the discussion here to the best practice section and see if new name-based practices emerge. |
I've often wondered this about v5. Introducing even longer hashes probably stands a greater chance of seeing repeating characters at the beginning of the hash, especially with the truncation involved to fit within 128 bits, though I'm no expert on hashing algorithms. |
I don't object to v3/v5: they were in rfc4122, and were current at the time. |
Just caught up on the thread. Let me take a pass tomorrow at implementing some text around what @LiosK discussed so we can add some "best practice" logic to future hash based UUIDs in v8 bit space. Also, @ramsey, totally agree. SRTP went with an approach back in the day of truncating the SHA to 32 or 80 length for the early SRTP Crypto Suites and they updated that when they added new algos: https://www.rfc-editor.org/rfc/rfc7714#section-13.2 Perhaps when I get back to |
By the way, it's interesting that FIPS 180-4 (the SHA-1 and SHA-2 standard) explicitly permits taking the leftmost bits of a message digest:
I'm not sure if the same discussion applies to SHA-3 as well. Plus, FIPS 180-4 is currently under review for revision, and some public comments expressed concern about truncation. So, this section of FIPS 180-4 might not survive in FIPS 180-5, but probably we can find some useful discussion about digest truncation around FIPS 180-4 resources. Edit: Perhaps, we should also take a look at SP 800-90A to deep dive into name-based schemes. If we can derive 122-bit random (statistically independent and unbiased) data from a name in a deterministic manner, then we can construct a UUID from the random bits. SP 800-90A discusses deterministic random bit generators (DRBGs) and SHA-1/2 functions as building blocks of DRBGs. |
My preference: Cite that NIST SP 800-90A document as another resource for using random in applications properly in our later sections. On the FIPS180-4 comment: We could cite at least FIPS 180-4 as something that allows truncation at least in those versions. If they don't want to truncate guide towards v8 (which also covers us if they disallow truncating down the road.) |
Oh, I didn't mean the RFC should reference SP 800-90A. I was just like saying: if we were to develop a new name-based scheme seriously, we would have to prove that the new scheme guaranteed the universal uniqueness of the outputs, and the NIST doc would be helpful for that. At a glance, SP 800-90A seems to rely on truncated hash digests as a source of random (i.e., statistically independent and unbiased) sequences of bits. If so (I mean, if each SHA function produces leading 122 bits in a statistically independent and unbiased manner), the approach I suggested previously will produce universally unique IDs even if it truncates digests and mixes the results from multiple SHA functions in one 128-bit ID space. |
I pushed up 83d95da This contains:
Testing:
|
Looks awesome! Could sound obvious, but I think we should add some clarifying text to the Some Hash Space IDs section saying like when using SHAKE_128 or SHAKE_256, implementations must extract at least 128 bits for a digest, because these variable length algorithms may technically produce a digest shorter than 128 bits. |
Understood, I was not aware they are variable rate! Edit, updated as per b094dfc |
* ElementUtil (constructNameUUID): Use UUIDDigest
Question from @jimfenton on Interim:
Draft 02 Proposed Text:
Testing: The first 128 bits that we care about are always the same and do not change even if you request an output of more bits.
SHAKE-128: https://emn178.github.io/online-tools/shake_128.html
SHAKE-256: https://emn178.github.io/online-tools/shake_256.html
Source: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
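The same check can be run locally with Python's standard `hashlib`, as a small sketch. The input `b"example.com."` is just an arbitrary sample, and `shake128_for_uuid` is a hypothetical helper that enforces the "at least 128 bits" rule discussed above.

```python
import hashlib


def shake128_for_uuid(data: bytes, out_len: int = 16) -> bytes:
    """Return SHAKE-128 output, refusing lengths shorter than 128 bits."""
    if out_len < 16:
        raise ValueError("need at least 16 bytes (128 bits) for a UUID")
    return hashlib.shake_128(data).digest(out_len)


short = shake128_for_uuid(b"example.com.", 16)
longer = shake128_for_uuid(b"example.com.", 64)

# Requesting more output only appends bytes; the first 128 bits are unchanged.
print(longer[:16] == short)
```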
|
- Describe Nil/Max UUID in variant table #16
- Further Clarify that non-descript node IDs are the preferred method in distributed UUID Generation #49
- Appendix B, consistent naming #55
- Remove duplicate ABNF from IANA considerations #56
- Monotonic Error Checking missing newline #57
- More Security Considerations Randomness #26
- SHA256 UUID Generation #50
- Expand multiplexed fields within v1 and v6 bit definitions #43
- Clean up text in UUIDs that Do Not Identify the Host #61
- Revise UUID Generator States section #47
- Expand upon why unix epoch rollover is not a problem #44
- Delete Sample Code Appendix #62
Topic from Interim:
Restrictions:
Ideas
UUIDv9 (SHA256 Based UUID)
v5-Testing.md?plain=1#L132
Add Text and steer towards v8 as a use case