Reconsider our default GUID generation strategy #30753
Comments
Keep in mind that one of the main benefits of client-side generation is that the key value is known immediately after the entity is tracked, rather than only after it has been inserted into the database. Some applications rely on this to track entities throughout their lifecycle, so changing the default would be a significant breaking change.
Let's discuss; I'd like to understand this better.
Here's one example where having the client-generated value known immediately after Add/Attach is in fact a disadvantage: #13575 (comment)
I wrote the report, and I just want to add that, IMO, having GUID keys be immediately available is a great feature. However, there are probably a lot of people using sequential GUIDs who may not even know that EF Core is generating IDs on the client side unless they're adding
The issue you're experiencing is that even though a type-1 UUID is based on a timestamp, SQL Server will first sort by the node ID. So when you have multiple clients each generating their own UUIDs client-side, they will have different node IDs. This causes SQL Server to essentially sort rows from one client together, and rows from another client together, when in reality they should be sorted together based on the 100 ns resolution timestamp; in the unlikely event that two users generated a UUID at the exact same 100 ns tick, the tie would then be broken in the usual way (with the collision counter, and then the node ID). Entity Framework currently swaps bytes around to make the values more sortable under the big-endian ordering that SQL Server uses, which is good. But you have to take it a step further: acknowledge that SQL Server sorts by the "NodeID" bytes first, and move the timestamp there. Here's a C# class we use to generate SQL Server "sortable" UUIDs:
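The following is not the C# class mentioned above; it is a hypothetical Python sketch of the point being made. It models SQL Server's uniqueidentifier ordering as a sort key over the .NET `Guid.ToByteArray()` layout (which matches Python's `UUID.bytes_le`). The exact comparison order is an assumption based on how we understand `System.Data.SqlTypes.SqlGuid` emulates SQL Server; the argument only depends on bytes 10 to 15, where a version-1 UUID stores its node ID, being compared first. `make_v1` and `make_sortable` are hypothetical helpers, and `make_sortable` keeps its non-timestamp bytes deterministic (no random bits, no version/variant bits) so the demo is reproducible.

```python
import uuid

# Comparison order for uniqueidentifier, as indexes into the .NET
# Guid.ToByteArray() layout (identical to Python's UUID.bytes_le).
# ASSUMPTION: mirrors the order SqlGuid uses to emulate SQL Server; the key
# point is that bytes 10-15 (the v1 node ID) are the most significant.
SQLSERVER_ORDER = (10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3)


def sqlserver_key(u: uuid.UUID) -> bytes:
    """Sort key whose byte-wise order approximates SQL Server's GUID ordering."""
    b = u.bytes_le
    return bytes(b[i] for i in SQLSERVER_ORDER)


def make_v1(timestamp: int, node: int, clock_seq: int = 0) -> uuid.UUID:
    """Version-1 UUID built from an explicit 60-bit timestamp and 48-bit node ID."""
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                 # time_low
        (timestamp >> 32) & 0xFFFF,             # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,  # time_hi plus version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,       # clock_seq_hi plus RFC variant
        clock_seq & 0xFF,                       # clock_seq_low
        node,                                   # 48-bit node ID
    ))


def make_sortable(timestamp: int, client: int) -> uuid.UUID:
    """Hypothetical "SQL Server sortable" GUID: 48-bit timestamp in bytes 10-15."""
    b = bytearray(16)
    b[0:4] = client.to_bytes(4, "big")                         # compared last
    b[10:16] = (timestamp & (2**48 - 1)).to_bytes(6, "big")    # compared first
    return uuid.UUID(bytes_le=bytes(b))


# Two hypothetical clients take turns generating values at ticks 0..5.
CLIENTS = (0xAA, 0xBB)
ticks = range(6)

v1 = {make_v1(t, CLIENTS[t % 2]): t for t in ticks}
sortable = {make_sortable(t, CLIENTS[t % 2]): t for t in ticks}

v1_order = [v1[u] for u in sorted(v1, key=sqlserver_key)]
sortable_order = [sortable[u] for u in sorted(sortable, key=sqlserver_key)]
print(v1_order)        # [0, 2, 4, 1, 3, 5]  grouped by client, not by time
print(sortable_order)  # [0, 1, 2, 3, 4, 5]  follows the timestamp
```

With two clients taking turns, the version-1 values come back grouped by node ID, while placing the timestamp in the bytes compared first restores generation order across clients.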
How does Postgres sort UUIDs? Would it also benefit from this work?
@onionhammer Not quite. PG can indeed benefit from switching to UUIDv7, and this is something I hope we'll get to for 9.0. But that's not the same as SQL Server's "sequential" GUID (see npgsql/efcore.pg#2909).
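To make the UUIDv7 point concrete, here is a minimal Python sketch of the RFC 9562 version-7 layout: a 48-bit Unix millisecond timestamp in the most significant bits, then the version nibble, 12 random bits, the variant bits, and 62 more random bits. It assumes PostgreSQL orders its uuid type by a plain byte-wise comparison of the 16 bytes, which is what sorting on `UUID.bytes` imitates. The `uuid7` function here is a hand-rolled illustration, not a library API, and it does not add the optional monotonic counter for values generated within the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Minimal UUIDv7 (RFC 9562): 48-bit Unix ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80          # unix_ts_ms
        | 0x7 << 76                                # version 7
        | rand_a << 64                             # rand_a
        | 0b10 << 62                               # RFC variant
        | rand_b                                   # rand_b
    )
    return uuid.UUID(int=value)


# Values from successive milliseconds keep their generation order under a
# byte-wise comparison, regardless of the random bits that follow.
ids = [uuid7(1_700_000_000_000 + ms) for ms in range(5)]
print(sorted(ids, key=lambda u: u.bytes) == ids)  # True
```

Because the timestamp occupies the leading bytes, no byte shuffling is needed for a memcmp-style comparison, which is why this differs from the SQL Server case discussed above.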
Splitting out @JackTrapper's suggestion above to #33579; this issue can continue tracking switching to server-side generation (which would be an inferior solution).
Would this actually be a fix for index fragmentation? The SQL Server docs for NEWSEQUENTIALID even state that it might not return sequential values after a Windows restart.
Those are great questions, but we don't have the answers to them yet; they need looking into.
We currently generate GUIDs on the client by default; on SQL Server, we use SequentialGuidValueGenerator, which works similarly to SQL Server's NEWSEQUENTIALID. Sequential GUIDs reduce index fragmentation (important for good performance), and generating them on the client reduces network roundtrips by allowing principals with GUID keys to be inserted in the same batch as the dependents pointing to them.
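As a rough illustration of how a client-side sequential GUID can work, here is a hypothetical Python analogue (not EF Core's source): a random GUID whose bytes 8 to 15 are overwritten with a counter seeded from the current time, placed so that the bytes SQL Server is assumed to compare first (10 to 15, then 8 and 9) hold the counter's high and low parts. `SequentialGuidSketch`, the seed choice, and the byte placement are all assumptions made for this sketch; it also overwrites the variant byte, so the results are not RFC-compliant UUIDs.

```python
import itertools
import threading
import time
import uuid
from typing import Optional

# ASSUMPTION: SQL Server comparison order over the Guid.ToByteArray() /
# UUID.bytes_le layout, with bytes 10-15 most significant.
SQLSERVER_ORDER = (10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3)


def sqlserver_key(u: uuid.UUID) -> bytes:
    """Sort key approximating SQL Server's uniqueidentifier ordering."""
    b = u.bytes_le
    return bytes(b[i] for i in SQLSERVER_ORDER)


class SequentialGuidSketch:
    """Hypothetical counter-based sequential GUID generator (one per process)."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # Seed from a 100 ns tick count so a restarted process starts higher.
        start = seed if seed is not None else time.time_ns() // 100
        self._counter = itertools.count(start + 1)
        self._lock = threading.Lock()

    def next_guid(self) -> uuid.UUID:
        with self._lock:
            counter = next(self._counter)
        b = bytearray(uuid.uuid4().bytes_le)       # random base value
        c = (counter & (2**64 - 1)).to_bytes(8, "big")
        b[10:16] = c[0:6]  # high 48 bits: compared first
        b[8:10] = c[6:8]   # low 16 bits: compared next (overwrites variant)
        return uuid.UUID(bytes_le=bytes(b))


gen = SequentialGuidSketch()
values = [gen.next_guid() for _ in range(1000)]
print(sorted(values, key=sqlserver_key) == values)  # True
```

Within a single generator the values sort in generation order; the question raised below is what happens when many such values reach the server in a different order than they were generated.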
However, there's a pretty reliable-looking report that in parallel environments, our client-generated GUIDs do lead to high index fragmentation due to various jitter factors, causing values not to be inserted in order (see the previous perf investigation, which was serial only). We should try reproducing this; if this is the case, we should consider switching to server-generated GUIDs by default (with NEWSEQUENTIALID), since the importance of good, non-fragmented indexes is likely to far outweigh the extra roundtrips when inserting with dependents.
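A toy model (not a reproduction of the report) can show why jitter matters even when every key is generated in order. In a B-tree index such as a SQL Server clustered index, a key that arrives below the current maximum has to go into an existing page rather than at the right-hand edge, and such inserts can force page splits. The sketch below generates keys 0 to n-1 in order, delays each by a random amount up to a hypothetical `max_jitter` (in units of the generation interval), and counts how many arrive below the highest key seen so far. The function names and the uniform delay model are assumptions for illustration only.

```python
import random


def arrival_order(n: int, max_jitter: float, seed: int = 0) -> list[int]:
    """Keys 0..n-1 are generated in order; each reaches the server after a random delay."""
    rng = random.Random(seed)
    arrivals = [(i + rng.random() * max_jitter, i) for i in range(n)]
    return [key for _, key in sorted(arrivals)]


def non_append_inserts(keys: list[int]) -> int:
    """Count inserts that land below the current maximum key (not at the right edge)."""
    highest = None
    count = 0
    for key in keys:
        if highest is not None and key < highest:
            count += 1
        else:
            highest = key
    return count


for jitter in (0.0, 1.0, 5.0, 50.0):
    order = arrival_order(10_000, jitter)
    print(f"max jitter {jitter:>5}: {non_append_inserts(order)} non-append inserts")
```

As long as the delay never exceeds one generation interval, every insert is an append; once delays overlap, out-of-order arrivals appear even though the generator itself is perfectly sequential.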