Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation is unclear for Guid primary keys #34260

Closed
jscarle opened this issue Jul 21, 2024 · 3 comments
Closed

Documentation is unclear for Guid primary keys #34260

jscarle opened this issue Jul 21, 2024 · 3 comments
Assignees
Labels
area-sqlserver closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@jscarle
Copy link

jscarle commented Jul 21, 2024

I am currently in the midst of trying to repair the data architecture of an application which fell victim to the original designer's poor decision to make the primary keys of each of the tables Guids stored as char(36) strings... sigh... 😔

Concidently, with the arrival of Guid.CreateVersion7, this has renewed discussions regarding the use of Guid in databases and surfaced the issues faced SQL Server's endianess conflicts when dealing with Guids. Fragmentation of clustered indexes when using Guids has been going on for years now.

I know that similar discussions have come up within this repository and I know that the documentation has been improved in the past, but honestly I'm now more confused than ever.

In the documentation, Explicitly configuring value generation, there is a mention that "on SQL Server, when a GUID property is configured as a primary key, the provider automatically performs value generation client-side, using an algorithm to generate optimal sequential GUID values", but it doesn't mention whether the generation is lexicographical, nor if it is stored lexicographically within the database by taking into account the endianness difference between Guids in .NET and uniqueidentifiers in SQL Server.

In SQL Server's documentation for NEWSEQUENTIALID, there is a mention of how it can be used to "generate GUIDs to reduce page splits and random IO at the leaf level of indexes".

The EF Core documentation for the SQL Server provider for Guids also mentions that "the provider automatically generates optimal sequential values, similar to SQL Server's NEWSEQUENTIALID function", but again without any mention of its impacts on clustered index fragmentation.

There's also the privacy concern that sequential uniqueidentifier values brings forward.

On the surface, it seems like Version 7 of the Guid would solve a lot of these issues, but with the lack of clarity regarding the way a Guid translate to the way it's finally stored in the database, and what EF Core does when storing it, makes me unsure on which direction to go with this.

I could write my own value converter that calls TryWriteBytes(destination, bigEndian: true) when storing and new Guid(source, bigEndian: true) when reading with a binary(16) instead of a uniqueidentifier, but I'd rather avoid going through all of the trouble to reimplement something that may already be done internally.

So any clarification would be greatly appreciated, and thanks for all your hard work @roji and @ajcvickers!

@roji
Copy link
Member

roji commented Jul 21, 2024

Concidently, with the arrival of dotnet/runtime#103658, this has renewed discussions regarding the use of dotnet/runtime#86084 and surfaced the issues faced dotnet/runtime#86798 when dealing with Guids. Fragmentation of clustered indexes when using Guids has been going on for years now.

I don't think anything in dotnet/runtime#86084 is relevant to this discussion - that's purely about endianness in the binary serialization APIs around Guid. Getting the endianness right when serializaing Guid is an internal concern of the low-level database driver (SqlClient in this case), which is also why I think the impact on the end user in that conversation is greatly exaggerated (i.e. users generally shouldn't need to deal with binary serialization of GUIDs in database contexts, unless the GUID is stored as a binary field). In any case, all this has nothing really to do with index fragmentation.

The EF Core documentation for the SQL Server provider for Guids also mentions that "the provider automatically generates optimal sequential values, similar to SQL Server's NEWSEQUENTIALID function", but again without any mention of its impacts on clustered index fragmentation.

The SequentialGuidValueGenerator that's used by default with the EF SQL Server provider largely mimics the SQL Server NEWSEQUENTIALID function, as is clearly documented; the point there is precisely to reduce index fragmentation. A link to the NEWSEQUENTIALID page in the SQL Server docs is provided right there, so users can go deeper and understand exactly what that means. What additional documentation do you think is missing in the EF docs, beyond us replicating the entire SQL Server NEWSEQUENTIALID docs inside the EF docs, which I don't believe we should do?

If you're looking for guidance in the EF docs on how to generate your own binary representation of efficient GUID values for SQL Server, then I don't think that belongs in our docs. That's quite an anti-pattern, and doing this correctly has nothing to do with EF.

On the surface, it seems like Version 7 of the Guid would solve a lot of these issues, but with the lack of clarity regarding the way a Guid translate to the way it's finally stored in the database, and what EF Core does when storing it, makes me unsure on which direction to go with this.

That's most likely not the case: i haven't personally verified this, but SQL Server has its own sort order which UUIDv7 wouldn't be appropriate for (see #33579 (comment)).

I could write my own value converter that calls TryWriteBytes(destination, bigEndian: true) when storing and new Guid(source, bigEndian: true) when reading with a binary(16) instead of a uniqueidentifier, but I'd rather avoid going through all of the trouble to reimplement something that may already be done internally.

I still am not clear on whether you're representing your GUIDs as binary or string in the database; at the top of your post you mention char(36), but below you speak of binary(16).

@Timovzl
Copy link

Timovzl commented Jul 22, 2024

@jscarle A .NET Guid has the same ordering as its .ToString() counterpart. With .NET 9's UUIDv7 being monotonic in Guid form, that means it is monotonic in string form too. As such, the char(36) column should achieve the intended performance benefits from UUIDv7.

The endianness only comes in to play when serializing to binary, which you don't seem to be doing.

@roji
Copy link
Member

roji commented Aug 9, 2024

Closing this as no answer was provided to my last comment, and I'm not seeing anything actionable on our side. @jscarle if you'd like to continue the conversation, please post back here and we can revisit.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Aug 9, 2024
@roji roji added the closed-no-further-action The issue is closed and no further action is planned. label Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-sqlserver closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests

4 participants