Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Auto Incrementing IDs or UUIDs or Both? #205

Open
dmlb2000 opened this issue Mar 22, 2019 · 8 comments
Open

[Discussion] Auto Incrementing IDs or UUIDs or Both? #205

dmlb2000 opened this issue Mar 22, 2019 · 8 comments

Comments

@dmlb2000
Copy link
Member

I'd like to document discussion on auto-incrementing IDs vs. UUIDs or some hybrid approach to identifying unique instances of objects in a database.

Please think about pros and cons of each approach or suggestions made.

Depending on the outcome of this it will affect #200 and #201. If we drop auto incrementing IDs then one closes and the other gets fixed a specific way.

@dmlb2000
Copy link
Member Author

Using UUIDs as the primary key to an object has some pros and cons...

Pros

  1. They are universal so objects can be exported and shared between instances with potentially different backends
  2. Users can set them without messing up any auto-creation logic for new instances that don't have a UUID yet.

Cons

  1. UUIDs are ugly and can't be easily remembered, recognized or compared by quick visual inspection.
  2. Do not show them to users, they won't understand what it is...
  3. UUIDs are larger than integers on disk.

@dmlb2000
Copy link
Member Author

Using Auto-incrementing IDs as the primary key for objects.

Pros

  1. Fairly easy to look at and remember, recognize and compare visually (as long as they aren't too big).
  2. Indexes are smaller and perform better (proof here?)

Cons

  1. Objects can't be exported and shared between databases due to ID conflicts.
  2. Setting the ID to specific values (when users care) often messes up auto incrementing logic to find a free ID.

@markborkum
Copy link
Contributor

@dmlb2000 UUIDs are encoded with 128 bits. The collision rate for UUIDs is very, very low (c.f., https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions).

It may be feasible for API clients to generate their own UUIDs.

@dmlb2000
Copy link
Member Author

dmlb2000 commented Mar 22, 2019

The outstanding question I'd like to see addressed is there any reason why a hybrid approach would be used? Is the question an exclusive or?

@markborkum
Copy link
Contributor

@dmlb2000 Distributed identity across federated systems (such as databases) is a longstanding technical challenge. The low collision rate of UUIDs makes some aspects less painful.

For me, this is an xor. Auto-incremented, fixed-width integers are great... until you run out of them.

@dmlb2000
Copy link
Member Author

The process to transition from Auto-incrementing IDs to UUIDs would be to add the column. Hopefully, PeeWee will help by generating UUIDs otherwise we'll need to loop over all rows creating them. Lastly, drop the ID field.

For systems that put data into these new tables they were expecting numeric IDs but now are given UUIDs they may not have in their existing systems. We need to have a discussion about what's considered unique about each object and make sure that's documented.

@dmlb2000
Copy link
Member Author

dmlb2000 commented Apr 26, 2019

With recent releases of metadata the Relationships object now has example for how to implement UUID across the objects. We should discuss what objects to apply this to and what objects make more sense to keep as incrementing IDs.

My initial thought is to apply this to everything but NxM tables, Transactions and Files. The NxM tables wouldn't get a UUID to identify the relationship unless nessisary. The foreign key references would be converted.

This is pragmatic since converting those to are a heavier lift, the Ingest would need to be refactored for changing Transactions over.

Converting and worse reformatting would be required to convert Files and affected services would be the entire Pacifica stack.

@dmlb2000
Copy link
Member Author

dmlb2000 commented Jul 1, 2019

I'm wondering if we need to drop the ID field now or just keep it around but make it nullable. At looking deeper into other CMS systems they like to have both. The UUID is the primary key, required and auto-generated. The ID field is a number that is allowed to be NULL, not the primary key, and is unique.

This maybe more general approach and would be suitable for EMSL usage (hopefully).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants