Add UUID v7 support #15

Closed

@khasinski khasinski commented Nov 2, 2022

UUIDv7 (currently an IETF draft, not yet a published RFC) is a new UUID version that allows values to be ordered by time thanks to a Unix timestamp component. This can be helpful when iterating over a large set of data (think, for example, of backfilling migrations with in_batches) while still maintaining some of the randomness of UUIDv4.

see https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/04/
There is an updated version of this document: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/
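As a rough illustration of the layout described above, here is a minimal sketch that follows the draft's bit layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The method name `uuid_v7_sketch` and its `now_ms` parameter are hypothetical and are not the code in this PR.

```ruby
require "securerandom"

# Hypothetical helper, not the PR's implementation.
# now_ms: Unix time in milliseconds (injectable so the ordering can be tested).
def uuid_v7_sketch(now_ms = Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond))
  rand_a = SecureRandom.random_number(1 << 12) # 12 random bits after the version
  rand_b = SecureRandom.random_number(1 << 62) # 62 random bits after the variant

  value = ((now_ms & 0xFFFF_FFFF_FFFF) << 80) | # 48-bit timestamp, most significant
          (0x7 << 76) |                         # version 7
          (rand_a << 64) |
          (0b10 << 62) |                        # RFC 4122 variant
          rand_b

  hex = format("%032x", value)
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

puts uuid_v7_sketch
```

Because the timestamp occupies the most significant bits, values generated in different milliseconds sort by creation time when compared as strings; values within the same millisecond are ordered randomly.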

@khasinski khasinski force-pushed the uuid-v7 branch 3 times, most recently from 8e36c6a to 72bc8e6 Compare November 2, 2022 20:48
@khasinski khasinski marked this pull request as ready for review November 2, 2022 21:04
@khasinski khasinski force-pushed the uuid-v7 branch 5 times, most recently from 21ffc3b to def6ad2 Compare November 3, 2022 07:48
UUIDv7, currently a draft RFC, is a new version that allows for time
ordering thanks to a Unix timestamp component.

@see https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/04/
# See RFC 4122 for details of UUID.
#
def uuid_v7
ts = [Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond)].pack('Q>').unpack('nNn').drop(1)

The RFC is fundamentally flawed >> and will not work at scale if monotonic total ordering is required <<. CLOCK_REALTIME skips forwards and backwards on many events, to name just a few: hibernation, NTP adjustments, daylight saving time, and leap seconds. And CLOCK_MONOTONIC_RAW is not suitable for use between systems.

If there will only ever be a single system generating UUIDs, then CLOCK_MONOTONIC_RAW, falling back on CLOCK_MONOTONIC, is appropriate. If multiple systems expect canonical total monotonic ordering, then deploy PTP and use TAI (CLOCK_TAI on Linux). CLOCK_REALTIME with a timezone of UTC can never be monotonic due to leap seconds: UTC(t) = TAI(t) - leap_seconds_for_year_and_month(t(m, y)) (data here). TAI is the primary reliable, global monotonic time standard and is essential to providing lock-free, unique, total ordering across multiple systems. The fallback method for global ordering is to have a single UUID master issuer (a possible SPoF risk).

TL;DR: In any case, this type of UUID won't be useful for anything important.
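For the single-system case mentioned in this comment, the clock fallback could look roughly like the sketch below. The helper names `single_host_clock_id` and `monotonic_ms` are hypothetical. Note that monotonic clocks count from an arbitrary starting point, so their readings are not Unix timestamps and could not be dropped into the UUIDv7 timestamp field as-is.

```ruby
# Hypothetical helpers illustrating the fallback described in the comment.
# CLOCK_MONOTONIC_RAW is only defined on some platforms (for example Linux),
# so check for the constant before using it.
def single_host_clock_id
  if Process.const_defined?(:CLOCK_MONOTONIC_RAW)
    Process::CLOCK_MONOTONIC_RAW
  else
    Process::CLOCK_MONOTONIC
  end
end

# Milliseconds since an arbitrary, system-specific starting point.
def monotonic_ms
  Process.clock_gettime(single_host_clock_id, :millisecond)
end

first = monotonic_ms
second = monotonic_ms
puts second >= first
```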

@khasinski khasinski Jan 30, 2023

Thank you for your comment! Is monotonic total ordering required though? From my perspective, there are a lot of use cases where some instability is acceptable and an approximately monotonic ordering still helps.

Consider for example a batching mechanism for backfills in a typical RoR application:

Model.in_batches do |batch| # Loads records by 1000 keeping the latest id
   batch.update_all(something: :something)
   # batch operation that would normally lock the table, but it's now locking only selected rows
end

In the example above, having a UUIDv4 as the primary key means that the records don't have a stable order. The occasional inconsistency of UUIDv7 is usually covered by the batch size.
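To make the batching argument concrete without a database, here is a plain-Ruby sketch of cursor-based batching over an in-memory list of ids. The `each_batch` helper and the shortened ids are hypothetical stand-ins, not ActiveRecord's implementation. A record inserted during the run is still visited because its time-prefixed id sorts after the cursor.

```ruby
# Hypothetical stand-in for cursor-based batching: repeatedly take the
# batch_size smallest ids that are greater than the last id processed.
def each_batch(table, batch_size)
  cursor = nil
  loop do
    batch = table.select { |id| cursor.nil? || id > cursor }.min(batch_size)
    break if batch.empty?

    yield batch
    cursor = batch.last
  end
end

# Shortened ids: a time-ordered prefix followed by a random-looking suffix.
table = ["0001-c3", "0002-9f", "0003-1a"]
seen = []

each_batch(table, 2) do |batch|
  seen << batch
  # Simulate a concurrent insert after the first batch; its prefix is newer.
  table << "0004-7e" if seen.size == 1
end

p seen
```

With fully random ids, the newly inserted id could sort before the cursor and be skipped by this loop.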

However, I'd be open to rewriting this to use TAI (perhaps as an option) if necessary.

@shreyasbharath

Can we merge this?

@pupeno
pupeno commented Apr 28, 2023

I'm a fan of using UUIDs as identifiers, but yeah, sometimes the loss of monotonically increasing ids is a pain. UUIDv7 would be helpful in many cases. I know technically you can have clock issues, but those clock issues tend to cause problems in the millisecond range, while most user-generated data in the apps that I build tends to be created seconds or minutes apart, so it's not a problem. Knowing which record was created first when they were created 2 days apart, just from the id, can be useful.

I think this can be a middle ground before going to a central monotonically increasing generation of ids, à la Twitter Snowflake.

@nevans
Contributor

nevans commented Jun 23, 2023

Sorry, I didn't realize that lib/random/formatter.rb belonged to this repository, and I made a very similar PR here: ruby/ruby#7953.

My implementation was originally almost identical to this. But after someone made a comment about monotonicity and I thought about it a little bit, I added an optional part of the draft RFC: a kwarg for 0..12 extra timestamp bits. This changes the timestamp precision from 1ms to as fine as ~244ns (1ms / 4096), at the cost of up to 12 bits of randomness and slightly more complex code.
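The idea of trading random bits for sub-millisecond timestamp bits could be sketched like this. It is an illustration under assumptions, not the code from either PR: the method name, the `extra_timestamp_bits` keyword, and the injectable `now_ns` parameter are all hypothetical.

```ruby
require "securerandom"

# Hypothetical sketch: fill the top extra_timestamp_bits of the 12-bit
# field after the version with the sub-millisecond fraction of the clock.
def uuid_v7_with_extra_bits(extra_timestamp_bits: 0,
                            now_ns: Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond))
  unless (0..12).cover?(extra_timestamp_bits)
    raise ArgumentError, "extra_timestamp_bits must be in 0..12"
  end

  ms, sub_ms_ns = now_ns.divmod(1_000_000)
  # Scale the sub-millisecond remainder to extra_timestamp_bits bits.
  fraction = (sub_ms_ns << extra_timestamp_bits) / 1_000_000
  random_bits = 12 - extra_timestamp_bits
  filler = random_bits.zero? ? 0 : SecureRandom.random_number(1 << random_bits)
  rand_a = (fraction << random_bits) | filler
  rand_b = SecureRandom.random_number(1 << 62)

  value = ((ms & 0xFFFF_FFFF_FFFF) << 80) | (0x7 << 76) | (rand_a << 64) |
          (0b10 << 62) | rand_b
  hex = format("%032x", value)
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

puts uuid_v7_with_extra_bits(extra_timestamp_bits: 12)
```

With all 12 bits used for the fraction, two ids generated within the same millisecond but at least about 244ns apart sort in creation order; with 0 bits the behavior matches the plain 1ms layout.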

I agree with @khasinski and @pupeno that perfect monotonicity isn't necessary for most use-cases, and in the places where it is necessary, you probably need to handle it in a centralized DB server anyway (and probably a special purpose database). Considering that my current DBs use v4 UUIDs with zero monotonicity, 1ms precision is certainly good enough for nearly anything I'd use it in. And for simple single-node monotonicity, you can always simply sort, like so: Array.new(1000) { SecureRandom.uuid_v7 }.sort.

IMO, the other techniques provided by the RFC for improving monotonicity are all far too complicated and come with far too many trade-offs. If a Ruby application truly needs monotonicity guarantees better than ~244ns of precision (single node, and also whatever clock skew your NTP-managed servers might have), then that application knows what tradeoffs make the most sense and can implement whatever global system state (counters, etc.) it needs.

@nevans
Contributor

nevans commented Jun 29, 2023

FWIW, I added my slightly different PR here: #19.
