Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for UUID version 7 #19

Merged
merged 1 commit into from
Sep 19, 2023
Merged

Add support for UUID version 7 #19

merged 1 commit into from
Sep 19, 2023

Conversation

nevans
Copy link
Contributor

@nevans nevans commented Jun 29, 2023

Although the specification for UUIDv7 is still in draft, the UUIDv7 algorithm has been relatively stable as it progresses to completion.

Version 7 UUIDs can be very useful, because they are lexographically sortable, which can improve e.g: database index locality. See section 6.10 of the draft specification for further explanation:

https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/

The specification allows up to 12 bits of extra timestamp precision, to make UUID generation closer to monotonically increasing. This provides between 1ms and ~240ns of timestamp precision. At the cost of some code complexity and a small performance penalty, a kwarg may specify any arbitrary precision between 0 and 12 extra bits. Any stronger guarantees of monotonicity have considerably larger tradeoffs, so nothing more is implemented. This limitation is documented.

Ruby issue: https://bugs.ruby-lang.org/issues/19735

@nevans nevans mentioned this pull request Jun 29, 2023
@nevans
Copy link
Contributor Author

nevans commented Jun 29, 2023

FYI: see an earlier PR here: #15. I wasn't aware of that other PR when I first started on mine. The primary distinction from that PR is that this allows up to 12 extra bits of timestamp precision, at the cost of code complexity.

lib/random/formatter.rb Outdated Show resolved Hide resolved
# {Section 6.2}[https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-07.html#monotonicity_counters]
# of the specification.
#
def uuid_v7(extra_timestamp_bits: 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions for a better keyword parameter name?

@nevans nevans force-pushed the uuid_v7 branch 3 times, most recently from 9972161 to 63d18fd Compare July 3, 2023 14:03
rand.unpack("H4H12").join("-")
]

when (0..12) # the generic version is slower than the special cases above
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be clearer as when (1..11) ?

Copy link
Contributor Author

@nevans nevans Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. :) I can go either way:

  • On the one hand, because 0 and 12 are handled by the prior clauses, it makes sense to only show the numbers that will be handled by this clause.
  • On the other hand, since this is the generic version, it can handle 0..12, and it's nice to "document" that here.

Although I did consider this earlier, I think I left it as 0..12 mostly by accident: While I was testing and benchmarking the code, I would copy/paste the entire method and then comment out or delete one of the other clauses. So it was temporarily simpler to keep this as 0..12.

So, what do you think?

@e12e
Copy link

e12e commented Jul 24, 2023

Any updates on this or sibling for including UUIDv7 support?

Copy link
Member

@unak unak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

#
# See draft-ietf-uuidrev-rfc4122bis[https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/]
# for details of UUIDv7.
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nevans It would be good to note that fixing the random number seed (e.g., by Kernel#srand) does not make this method reproducible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll add a note for that. If I remember correctly, I had originally written a version that allowed a keyword argument for the timestamp. But when I changed the code to allow extra_timestamp_bits, it didn't seem like it was worth the complexity.

Copy link
Contributor Author

@nevans nevans Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the following:

Suggested change
#
#
# Note that this method cannot be made reproducable with Kernel#srand, which
# can only affect the random bits. The sorted bits will still be based on
# Process.clock_gettime.
#

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nevans Ah, there are some other ways to fix the random seed than Kernel#srand, such as: Random.new(0).uuid_v7.

How about this? This PR is already merged, so I will create another PR if you are OK.

Note that this method cannot be made reproducable because its output includes not only random bits but also timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mame Looks good. I like your text better than mine. It's simpler, clearer, and covers the Random.new(0) scenario too. :)

@hsbt
Copy link
Member

hsbt commented Sep 15, 2023

@nevans Thank you for submitting this.

Can you confirm #19 (comment)? I'll merge this after that.

@nevans
Copy link
Contributor Author

nevans commented Sep 16, 2023

@nevans Thank you for submitting this.

Can you confirm #19 (comment)? I'll merge this after that.

I pushed a new version with the change I suggested here: #19 (comment). Is that good?

Thanks, @mame and @hsbt. 🙂

lib/random/formatter.rb Outdated Show resolved Hide resolved
Although the specification for UUIDv7 is still in draft, the UUIDv7
algorithm has been relatively stable as it progresses to completion.

Version 7 UUIDs can be very useful, because they are lexographically
sortable, which can improve e.g: database index locality.  See section
6.10 of the draft specification for further explanation:

  https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/

The specification allows up to 12 bits of extra timestamp precision, to
make UUID generation closer to monotonically increasing.  This provides
between 1ms and ~240ns of timestamp precision.  At the cost of some code
complexity and a small performance penalty, a kwarg may specify any
arbitrary precision between 0 and 12 extra bits.  Any stronger
guarantees of monotonicity have considerably larger tradeoffs, so
nothing more is implemented.  This limitation is documented.

Ruby issue: https://bugs.ruby-lang.org/issues/19735
@nevans
Copy link
Contributor Author

nevans commented Sep 18, 2023

@hsbt I addressed @rhenium's comment. And I hadn't noticed that the build was failing for ruby 2.6 (the tests were using Time.at ts, in: "Z"), so I just pushed an update that fixes that.

@hsbt hsbt merged commit 61cb29e into ruby:master Sep 19, 2023
12 checks passed
@nevans nevans deleted the uuid_v7 branch September 19, 2023 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants