Reconsider our default GUID generation strategy #30753

roji · 2023-04-24T17:54:05Z

We currently generate GUIDs on the client by default; on SQL Server, we use SequentialGuidValueGenerator, which works similarly to the SQL Server NEWSEQUENTIALID. Sequential GUIDs reduce index fragmentation (important for good performance), and generating them on the client reduces network roundtrips by allowing principal with GUID keys to be inserted in the same batch with dependents pointing to them.

However, there's a pretty reliable-looking report that in parallel environments, our cilent-generated GUIDs do lead to high index fragmentation due to various jitter factors, causing values not to be inserted in order (see previous perf investigation but which was serial only). We should try reproducing this; if this is the case, we should consider switching to server-generated GUIDs by default (with NEWSEQUENTIALID) - the importance of good, non-fragmented indexes is likely to far overweigh the extra roundtrips at insertion with dependents.

The text was updated successfully, but these errors were encountered:

ajcvickers · 2023-04-25T19:56:56Z

Keep in mind that one of the main benefits of client-side generation is that the key value is known immediately after the entity is tracked, rather than only later after it has been inserted into the database. Some applications use this for tracking entities throughout their lifecycle, so changing this by default would be quite a significant breaking change.

roji · 2023-04-25T20:20:21Z

Let's discuss, I'd like to understand this better..

roji · 2023-09-11T08:25:52Z

Here's one example where having client-generated value known immediately after Add/Attach is in fact a disadvantage: #13575 (comment)

ConnerPhillis · 2023-09-11T18:20:37Z

Keep in mind that one of the main benefits of client-side generation is that the key value is known immediately after the entity is tracked, rather than only later after it has been inserted into the database.

I wrote the report, and I just want to add that IMO, I think having GUID keys be immediately available is a great feature. However, I think that there are probably a lot of people who are using sequential GUIDs that may not even know that EF core is generating IDs on the client side unless they're adding HasDefaultValueSql to their code.

#13592

JackTrapper · 2024-03-29T16:07:20Z

The issue you're experiencing is that even though a type-1 UUID is based on a timestamp, SQL Server will first sort by the nodeid. So when you have multiple clients each generating their own UUIDs client side, they will have different nodeIDs. This then causes SQL Server to essentially sort rows from one client together, and rows from another client together.

When in reality they should be sorted together based on the 100ns resolution timetsamp; and in the unlikely event that two users generated a UUID at the exact same 100ns tick, then the tie is broken in the usual way (with the collision counter, and then the nodeid).

Entity Framework currently swaps bytes around to make it more sortable based on the big-endian that SQL Server uses. Which is good.

But you have to take it a step further, acknowledge that SQL Server sorts by the "NodeID" bytes first, and move the timestamp to there.

Here's a C# class we use to generate SQL Server "sortable" UUIDs:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Web;

/// <summary>
/// Summary description for SortableGuid
/// </summary>
public class SortableGuid
{
	//UUID timestamps are 100ns ticks starting from October 15, 1582 (date we switched to Gregorian calendar)
	//Windows FILETIME is 100ns ticks starting from January 1, 1601  (the date of the start of the first 400-year cycle of the Gregorian calendar)
	private static readonly Int64 guidEpochOffset = -5748192000000000; //6653 days --> 159672 hours --> 9580320 minutes --> 574819200 seconds -> 574819200000 ms -> 574819200000000 us --> 5748192000000000 ticks

	private static Int64 _lastTick = DateTime.UtcNow.Ticks + guidEpochOffset;

	/// <summary>
	/// 
	/// </summary>
	/// <param name="location">The destination, whose value is compared with comparand and possibly replaced.</param>
	/// <param name="comparison">The value that is compared to the value at location1.</param>
	/// <param name="newValue">The value that replaces the destination value if the comparison results in equality.</param>
	/// <returns>true if comparand was greater than location, and location was updated to newValue. 
	/// Otherwise false.</returns>
	public static Boolean InterlockedExchangeIfGreaterThan(ref Int64 location, Int64 newValue, Int64 comparand)
	{
		//Thank you Raymond Chen: https://stackoverflow.com/a/13056904/12597
		Int64 currentValue;
		do
		{
			currentValue = Interlocked.Read(ref location); //a read of a 64-bit location is not atomic on x86. If location was 32-bit you could get away with just "currentValue = location;"
			if (currentValue >= comparand) return false;
		}
		while (System.Threading.Interlocked.CompareExchange(ref location, newValue, currentValue) != currentValue);
		return true;
	}

	/// <summary>
	/// Returns a new sortable guid. These guid's are suitable as clustering keys in SQL Server.
	/// </summary>
	/// <returns></returns>
	public static Guid NewGuid()
	{
		/*
			Blatently borrowed from Entity Framework.
				https://github.com/dotnet/efcore/blob/master/src/EFCore/ValueGeneration/SequentialGuidValueGenerator.cs


			Except with two differences:
				- they start with an initial timetime, generated statically once - and keep increasing that. 
					That causes the "timestamp" to drift further and further from reality. 
					We generate a timestamp each time, and only rely on incrementing if we're not greater than the last timestamp. 
					A CPU is capable of generating ~200k GUIDs per 100ns tick - so it's not like we can ignore the duplciate ticks problem.
				- UUID timestamp ticks start at October 15, 1582 (date of gregorian calendar). 
					Windows, and DateTime.Ticks returns number of ticks since January 1, 1601 (the date of the first 400 year Gregorian cycle). 
					We'll offset the timestamp to match the UUID spec.
				- We place the version number type-7: Sortable by SQL Server with a timestamp.

			SQL Server sorts first by the NodeID (i.e. the final 6-bytes):

			16-byte guid                                            Microsoft clsid string representation
			===========--=====--=====--=====--=================     ======================================
			33 22 11 00  55 44  77 66  99 88  ff ee dd cc bb aa ==> {00112233-4455-6677-9988-ffeeddccbbaa}
			\_______________________/  \___/  \_______________/
			Timestamp and Version  ^    Clk        Node ID

			The goal is to create a sequential uuid (e.g. UuidCreateSequential), but without having to rely on a MAC address. 
			(i.e. Does an Azure web-site even **have** a MAC address? 
			We certainly can't P/Invoke out to UuidCreateSequental when we're not running on Windows)

			So we conceptually move the 8-byte timestamp to it's new location in NodeID+ClockSequence

			And what used to be Timestamp+Version becomes random.

			And, like type-4 Uuids being called Type-4 to help reduce the chances of collisions between types,
			we call this new version type-7 (sortable by SQL Server with a timestamp).
		*/

		//Start from a new random (type 4) uuid. 
		Byte[] guidBytes = Guid.NewGuid().ToByteArray();

		//Generate 8-byte timestamp
		Int64 currentTicks = DateTime.UtcNow.Ticks + guidEpochOffset;
		//Make sure that currentTicks is greater than the last tick count (in case they're generating guids faster than 100ns apart)
		Int64 last;
		do
		{
			last = Interlocked.Read(ref _lastTick); //a read of a 64-bit value isn't atomic; so it has to be interlocked.
			if (currentTicks <= last)
				currentTicks = last + 1;
		} while (Interlocked.CompareExchange(ref _lastTick, currentTicks, last) != last);


		Byte[] counterBytes = BitConverter.GetBytes(currentTicks);

		if (!BitConverter.IsLittleEndian)
		{
			Array.Reverse(counterBytes);
		}

		//This Guid started it's life as a Type 4 (Random) guid, but the final 8 bytes were changed to contain a sequential counter.
		//I want to tag this as a different kind of Uuid algorithm. In Delphi we already called this a Type 7 uuid.
		guidBytes[07] = (Byte)(0x70 | (guidBytes[07] & 0x0f));

		//Put the timestamp in place - in the proper SQL Server sorting order.
		guidBytes[08] = counterBytes[1];
		guidBytes[09] = counterBytes[0];
		guidBytes[10] = counterBytes[7];
		guidBytes[11] = counterBytes[6];
		guidBytes[12] = counterBytes[5];
		guidBytes[13] = counterBytes[4];
		guidBytes[14] = counterBytes[3];
		guidBytes[15] = counterBytes[2];

		Guid result = new Guid(guidBytes);
		return result;
	}
}

onionhammer · 2024-04-19T19:38:20Z

How does Postgres sort uuids? would it also benefit from this work?

roji · 2024-04-20T07:26:07Z

@onionhammer not quite. PG can indeed benefit from switching to UUIDv7, and this is something I hope we'll get to do for 9.0. But that's not the same as the SQL Server's "sequential" GUID - see npgsql/efcore.pg#2909).

roji · 2024-04-20T07:33:16Z

Splitting out @JackTrapper's suggestion above to #33579, whereas this issue can continue tracking switching to server-side generation (which would be an inferior solution).

cremor · 2024-07-12T16:07:40Z

we should consider switching to server-generated GUIDs by default (with NEWSEQUENTIALID) - the importance of good, non-fragmented indexes is likely to far overweigh the extra roundtrips at insertion with dependents.

Would this actually be a fix for index fragmentation? The SQL Server docs for NEWSEQUENTIALID even states that it might not return sequential values after a Windows restart.

roji · 2024-07-12T16:16:51Z

Those are great questions, but we don't have the answers to them - they need looking into.

roji added type-enhancement area-perf customer-reported area-save-changes area-sqlserver labels Apr 24, 2023

roji changed the title ~~Reconsider out default GUID generation strategy~~ Reconsider our default GUID generation strategy Apr 24, 2023

This was referenced May 22, 2023

ValueGenerator doesn't use NextAsync for the added object by the entity relation after SaveChangesAsync #30936

Closed

Obsolete AddAsync and change HiLo to generate IDs at SaveChanges time #30957

Open

ajcvickers added the needs-design label Jul 7, 2023

ajcvickers added this to the Backlog milestone Jul 7, 2023

ajcvickers added the breaking-change label Jul 7, 2023

roji mentioned this issue Sep 11, 2023

Consider stronger semantics around generated keys #13575

Open

4865783a5d mentioned this issue Sep 22, 2023

Consider changing the ServerSideSession Id from an int to a long DuendeSoftware/IdentityServer#1387

Closed

roji mentioned this issue Oct 23, 2023

Switch to UUIDv7 for client-side GUID generation npgsql/efcore.pg#2909

Closed

roji mentioned this issue Apr 20, 2024

Improve our SQL Server client-generated GUIDs #33579

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconsider our default GUID generation strategy #30753

Reconsider our default GUID generation strategy #30753

roji commented Apr 24, 2023 •

edited

Loading

ajcvickers commented Apr 25, 2023

roji commented Apr 25, 2023

roji commented Sep 11, 2023

ConnerPhillis commented Sep 11, 2023 •

edited

Loading

JackTrapper commented Mar 29, 2024

onionhammer commented Apr 19, 2024

roji commented Apr 20, 2024

roji commented Apr 20, 2024

cremor commented Jul 12, 2024

roji commented Jul 12, 2024

Reconsider our default GUID generation strategy #30753

Reconsider our default GUID generation strategy #30753

Comments

roji commented Apr 24, 2023 • edited Loading

ajcvickers commented Apr 25, 2023

roji commented Apr 25, 2023

roji commented Sep 11, 2023

ConnerPhillis commented Sep 11, 2023 • edited Loading

JackTrapper commented Mar 29, 2024

onionhammer commented Apr 19, 2024

roji commented Apr 20, 2024

roji commented Apr 20, 2024

cremor commented Jul 12, 2024

roji commented Jul 12, 2024

roji commented Apr 24, 2023 •

edited

Loading

ConnerPhillis commented Sep 11, 2023 •

edited

Loading