Improve performance of DateTime.UtcNow on Windows #50263

GrabYourPitchforks · 2021-03-25T22:36:54Z

Follow-up to #44771. Builds atop work done in #45479 and #46690.
Resolves #13091.

This PR is based on a series of conversations @tarekgh and I have had recently with regard to short-circuiting our leap second handling logic where possible. For background, Windows understands that days normally have 86,400 seconds, but this can be off by ±1 second due to positive or negative leap seconds. .NET's DateTime type requires each day to have exactly 86,400 seconds. Previous .NET releases resolve this discrepancy through the following special-cased logic: "Is the current time 23:59:60.xxx? If so, I'll return a DateTime instance whose time component reads 23:59:59.999." Unfortunately, this check is not free, and it shows up in benchmarks when we access DateTime.UtcNow.

This PR relies on the following two properties of leap seconds:

leap seconds only ever kick in on the last second of the day, and
any given day will never be adjusted beyond ±1 second due to leap seconds.

This optimization works by keeping a cache which maps the last-seen Win32 FILETIME struct to its equivalent .NET DateTime struct. The cache is valid for some window of time (by default, 5 minutes). If a call to DateTime.UtcNow takes place during the cache's validity window, we don't bother asking the OS to deconstruct the FILETIME struct back into a SYSTEMTIME struct (which accounts for leap seconds). Instead, we know that we can just perform simple addition without worrying about leap second handling.

There is one condition we need to check: Does a leap second actually occur within the next 5 minutes? If so, we can't store (now, now + 5 min) into the cache, otherwise DateTime.UtcNow will start returning incorrect values as soon as midnight rolls around. To account for this, we slide the cache window backward, choosing to set the cache validity period to (today at 23:54:59, today at 23:59:59) instead of (now, now + 5 min).

The reason we use a 5 minute validity window instead of a 24-hour validity window is called out in code comments. tl;dr: The OS uses shared memory mapping to publish leap second tables to all processes, and it can extend that table without killing active processes or rebooting the machine. There is a slim but non-zero chance that this table can be updated to insert leap second data into the past. If this happens, our cache will be incorrect. After consultation with the Windows team, we decided that a 5-minute validity window is an appropriate tradeoff that improves performance in the common case and minimizes the window in which an application might encounter problems if this situation were to occur.

Method	Job	Toolchain	Mean	Error	StdDev	Ratio
GetUtcNow	Job-JYYDLR	main	65.15 ns	0.103 ns	0.097 ns	1.00
GetUtcNow	Job-RHLQLD	proto	27.74 ns	0.359 ns	0.336 ns	0.43

GrabYourPitchforks · 2021-03-25T22:39:52Z

Since it's very difficult to unit test this type, I'm performing manual testing by breaking into a debugger and mutating registers. Will validate edge case conditions and report back here.

Edit: Tarek already added some UtcNow unit tests a while back when initial leap second support was added. See dotnet/corefx#33939.

danmoseley · 2021-03-25T23:04:31Z

src/libraries/System.Private.CoreLib/src/System/DateTime.Windows.cs

+            // If FileTimeToSystemTime fails, call GetSystemTime and try again until it succeeds.
+            while (Interop.Kernel32.FileTimeToSystemTime(&fileTimeNow, &systemTimeNow) == Interop.BOOL.FALSE)
+            {
+                goto TryAgain;


If this loops forever for some reason -- I guess we assume the OS is broken and it doesn't really matter that we didn't fail?

I originally had Environment.FailFast in here but figured that there might be a race condition where the OS is publishing new data at the same time we're running this method. If something's borked in the OS we'll probably loop forever. Hopefully we'll only loop until midnight (which is 2 seconds away at worst). :)

If this is a concern I can perhaps keep a counter limiting us to 10 tries before we failfast? Or maybe always fall back to new DateTime(GetSystemTime())?

GrabYourPitchforks · 2021-03-25T23:53:08Z

Test case information

Ensure DateTime.UtcNow uses cache when inside validity period (line 30)
Ensure DateTime.UtcNow doesn't use cache when before validity period (line 33)
Ensure DateTime.UtcNow doesn't use cache when after validity period (line 33)

The three test cases above were verified by trapping calls to GetSystemTimePreciseAsFileTime (call site at line 19) in a debugger and manipulating the value before it's returned.

(For the following tests, ImageFileExecutionOptions was used to mark corerun.exe as "leap-second aware".)

Ensure cache early-exit is hit if seconds field reads 60 (line 215)

This test case was verified by using the w32tm tool to insert a positive leap second into the system table, then trapping calls to GetSystemTimePreciseAsFileTime to return the FILETIME corresponding to 23:59:60 on that date, then stepping through the debugger and validating that the returned SYSTEMTIME struct had its seconds field set to 60.

Ensure validity window is shifted back if a positive leap second would occur and that entire validity window is valid (line 253)

This test case was verified by using the w32tm tool to insert a positive leap second into the system table, then changing the system time to 23:59:30, then stepping through the debugger to ensure that the DateTime instance cached had the time value 23:54:59 and the DateTime instance returned had the time value 23:59:30.

Ensure validity window is shifted back if a negative leap second would occur and that entire validity window is valid (line 253)

This test case was verified by using the w32tm tool to insert a negative leap second into the system table, then changing the system time to 23:59:30, then stepping through the debugger to ensure that the DateTime instance cached had the time value 23:54:59 and the DateTime instance returned had the time value 23:59:30.

Tools and resources for testing

mattjohnsonpint · 2021-03-26T01:14:20Z

In real life, leap seconds can only occur on two specific dates, June 30th and Dec 31st (UTC). Would it be worth optimizing further for that? Or do we want to accept any leap second Windows throws at us?

GrabYourPitchforks · 2021-03-26T01:20:23Z

@mattjohnsonpint Good question. I don't see any good opportunity for us to take advantage of that. Even in the original abandoned PR (#44771), which was pretty aggressive about caching, we didn't look at the date component. We were actually asked to be less aggressive. :) So I knocked the cache validity window down from 23h59m59s to 5 min.

Given this, do you see an opportunity for us to use the date component to increase the size of the window? Nothing comes to mind on my end.

tarekgh · 2021-03-26T01:25:13Z

I want to mention a historical leap second can be inserted to the system too. if we make the cache update only at the end of June or December, that increase the risk when that leap second gets inserted.

src/libraries/System.Private.CoreLib/src/System/DateTime.Windows.cs

tarekgh

LGTM. Thanks for getting this ready.

One suggestion is to do some testing using w32tm.exe to insert some leap seconds and look how the .NET would report when doing that.

GrabYourPitchforks · 2021-03-26T02:13:25Z

@tarekgh You mean 4.8? Sure, I can do that. Any other runtimes of interest?

Open question: is it possible that DateTime.UtcNow might now be recursive? Like, some debugging or introspection engine hooking allocations and and calling DateTime.UtcNow as part of its logging mechanism? If this is a possibility, I can add a guard against it.

tarekgh · 2021-03-26T02:31:14Z

You mean 4.8? Sure, I can do that. Any other runtimes of interest?

No, I meant try the tool with your changes. the tool can allow you insert leap seconds. so, you can try run your test and in the middle run the tool to insert a leap second and look if the test continues behave nicely :-)

Open question: is it possible that DateTime.UtcNow might now be recursive? Like, some debugging or introspection engine hooking allocations and and calling DateTime.UtcNow as part of its logging mechanism? If this is a possibility, I can add a guard against it.

I am not aware of any case like that which can cause the recursion. I am not even sure if this is possible with UtcNow.

GrabYourPitchforks · 2021-03-26T08:16:23Z

Pondered this a bit more, and it turns out there are edge case scenarios (exception filters, possibly others) where arbitrary user code might run inside UtcNow. Makes me think we should have a guard against recursive calls, just so that we don't bring down the app in these weird edge cases. It won't add any overhead to the fast "cache is valid" path, so I don't think there's any code risk from adding them.

tarekgh · 2021-03-26T17:20:19Z

Makes me think we should have a guard against recursive calls,

Were you able to get into this case? could you please share more info how this can happen?

GrabYourPitchforks · 2021-03-26T17:58:14Z

Were you able to get into this case? could you please share more info how this can happen?

I didn't try it. It was just a thought experiment. But in theory:

try
{
    DoWork();
}
catch when (MyExceptionFilter())
{
    /* blah blah */
}

If DoWork throws an exception, MyExceptionFilter could end up running within DoWork's frame. (.NET doesn't unwind the call stack.)

If DoWork calls DateTime.UtcNow, and if DateTime.UtcNow throws an exception (perhaps an OOM due to the allocation in the cache miss path), and if MyExceptionFilter also calls DateTime.UtcNow (perhaps as part of logging), then we're in a recursive call scenario. I really don't think this is likely in practice, but I can't prove to myself that it will never happen.

GrabYourPitchforks · 2021-03-26T18:03:23Z

@tarekgh I tried all kinds of wonderful and horrible combinations, such as using w32tm to add leap seconds while this method was executing, changing the system time while this method was executing, changing the process's "60-second leap seconds" support while this method was executing, and so on. (I didn't try removing leap seconds while this method was executing, as that would violate OS invariants. The w32tm tool doesn't allow this without a system reboot anyway.)

In all cases, the cutoff point is the code below.

runtime/src/libraries/System.Private.CoreLib/src/System/DateTime.Windows.cs

Lines 223 to 228 in 941e477

    
           ulong fileTimeAtEndOfValidityPeriod = fileTimeNow + LeapSecondCache.ValidityPeriodInTicks; 
        
           Interop.Kernel32.SYSTEMTIME systemTimeAtEndOfValidityPeriod; 
        
           if (Interop.Kernel32.FileTimeToSystemTime(&fileTimeAtEndOfValidityPeriod, &systemTimeAtEndOfValidityPeriod) == Interop.BOOL.FALSE) 
        
           { 
        
               goto TryAgain; 
        
           }

If the clock / leap seconds are changed before this point, we correctly respect it. Once this point is hit, we're already down the "start calculating values to cache" code path. So if the clock / leap seconds change after this point, we'll end up caching incorrect values for 5 minutes. Then after that 5 minute window is up, we'll evict our cache and query the OS for the actual time again.

To the application, the observed behavior is that the return value of DateTime.UtcNow might jump a little bit forward or backward once the cache expires. It seems to be no different than today's behavior if I were to change the system clock by a second or two while the app is running. I didn't see any ill effect.

I'll update the code comments to clarify this.

tarekgh · 2021-03-26T18:11:29Z

@GrabYourPitchforks thanks for testing with w32tm tool. I think at least we have confirmed the expected behavior and what would be the worst-case scenario which still be acceptable considering these are rare scenarios. We should be good to go.

tarekgh · 2021-03-26T18:14:32Z

If DoWork calls DateTime.UtcNow, and if DateTime.UtcNow throws an exception (perhaps an OOM due to the allocation in the cache miss path), and if MyExceptionFilter also calls DateTime.UtcNow (perhaps as part of logging), then we're in a recursive call scenario. I really don't think this is likely in practice, but I can't prove to myself that it will never happen.

I don't think this is a specific problem for that case but it is generic issue which can happen in many ways.

CC @jkotas if he has any thoughts around that.

mnmr · 2021-03-26T18:18:25Z

I was wondering if there has been a discussion about changing the way leap seconds are handled?

Reading this discussion makes me think that the same code running on Windows and Linux will report different times, which is unfortunate. Google has in interesting solution where leap seconds are "distributed" over an hour, which has the nice side effect that time only ever moves forward. Is something like this being considered at all?

Apologies if this is not the right forum for voicing such an idea.

GrabYourPitchforks · 2021-03-26T18:22:54Z

Google has in interesting solution where leap seconds are "distributed" over an hour, which has the nice side effect that time only ever moves forward. Is something like this being considered at all?

This is also the default behavior for Windows applications. Applications must opt in to leap second support. If your application has not opted in, the OS fudges the time it reports back to you by slowing down the clock over the last 2 seconds.

Examples:

Actual time	GetSystemTime returns
23:59:59.000	23:59:59.000
23:59:59.500	23:59:59.250
23:59:60.000	23:59:59.500
23:59:60.500	23:59:59.750

The logic proposed by this PR handles this scenario fine.

jkotas · 2021-03-26T19:13:08Z

I don't think this is a specific problem for that case but it is generic issue which can happen in many ways.

Yes, OOM can happen at atribitrary points, e.g. due to JITing.

Makes me think we should have a guard against recursive calls, just so that we don't bring down the app in these weird edge cases

Not worth it. The app is going down once you hit this situation anyway.

Introduce a leap second cache on Windows

941e477

GrabYourPitchforks added area-System.Runtime optimization labels Mar 25, 2021

GrabYourPitchforks added this to the 6.0.0 milestone Mar 25, 2021

GrabYourPitchforks requested a review from tarekgh March 25, 2021 22:36

danmoseley reviewed Mar 25, 2021

View reviewed changes

tarekgh mentioned this pull request Mar 26, 2021

Developers can benefit from enhanced Date and Time types and Timezone support #45318

Closed

4 tasks

tarekgh reviewed Mar 26, 2021

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/DateTime.Windows.cs Show resolved Hide resolved

tarekgh approved these changes Mar 26, 2021

View reviewed changes

Code cleanup and use fallback instead of entering loop

fd08554

GrabYourPitchforks merged commit eed3c76 into dotnet:main Mar 27, 2021

GrabYourPitchforks deleted the datetime_utcnow branch March 27, 2021 04:52

adamsitnik added the tenet-performance Performance related issue label Mar 29, 2021

ghost locked as resolved and limited conversation to collaborators Apr 28, 2021

jeffhandley added area-System.DateTime and removed area-System.Runtime labels Nov 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of DateTime.UtcNow on Windows #50263

Improve performance of DateTime.UtcNow on Windows #50263

GrabYourPitchforks commented Mar 25, 2021 •

edited

Loading

GrabYourPitchforks commented Mar 25, 2021 •

edited

Loading

danmoseley Mar 25, 2021

GrabYourPitchforks Mar 25, 2021 •

edited

Loading

GrabYourPitchforks commented Mar 25, 2021 •

edited

Loading

mattjohnsonpint commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021

tarekgh commented Mar 26, 2021

tarekgh left a comment

GrabYourPitchforks commented Mar 26, 2021

tarekgh commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021 •

edited

Loading

tarekgh commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021 •

edited

Loading

tarekgh commented Mar 26, 2021

tarekgh commented Mar 26, 2021

mnmr commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021

jkotas commented Mar 26, 2021

Improve performance of DateTime.UtcNow on Windows #50263

Improve performance of DateTime.UtcNow on Windows #50263

Conversation

GrabYourPitchforks commented Mar 25, 2021 • edited Loading

GrabYourPitchforks commented Mar 25, 2021 • edited Loading

danmoseley Mar 25, 2021

Choose a reason for hiding this comment

GrabYourPitchforks Mar 25, 2021 • edited Loading

Choose a reason for hiding this comment

GrabYourPitchforks commented Mar 25, 2021 • edited Loading

Test case information

Tools and resources for testing

mattjohnsonpint commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021

tarekgh commented Mar 26, 2021

tarekgh left a comment

Choose a reason for hiding this comment

GrabYourPitchforks commented Mar 26, 2021

tarekgh commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021 • edited Loading

tarekgh commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021 • edited Loading

tarekgh commented Mar 26, 2021

tarekgh commented Mar 26, 2021

mnmr commented Mar 26, 2021

GrabYourPitchforks commented Mar 26, 2021

jkotas commented Mar 26, 2021

GrabYourPitchforks commented Mar 25, 2021 •

edited

Loading

GrabYourPitchforks commented Mar 25, 2021 •

edited

Loading

GrabYourPitchforks Mar 25, 2021 •

edited

Loading

GrabYourPitchforks commented Mar 25, 2021 •

edited

Loading

GrabYourPitchforks commented Mar 26, 2021 •

edited

Loading

GrabYourPitchforks commented Mar 26, 2021 •

edited

Loading