-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of DateTime.UtcNow on Windows #50263
Improve performance of DateTime.UtcNow on Windows #50263
Conversation
Since it's very difficult to unit test this type, I'm performing manual testing by breaking into a debugger and mutating registers. Will validate edge case conditions and report back here. Edit: Tarek already added some |
// If FileTimeToSystemTime fails, call GetSystemTime and try again until it succeeds. | ||
while (Interop.Kernel32.FileTimeToSystemTime(&fileTimeNow, &systemTimeNow) == Interop.BOOL.FALSE) | ||
{ | ||
goto TryAgain; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this loops forever for some reason -- I guess we assume the OS is broken and it doesn't really matter that we didn't fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally had Environment.FailFast
in here but figured that there might be a race condition where the OS is publishing new data at the same time we're running this method. If something's borked in the OS we'll probably loop forever. Hopefully we'll only loop until midnight (which is 2 seconds away at worst). :)
If this is a concern I can perhaps keep a counter limiting us to 10 tries before we failfast? Or maybe always fall back to new DateTime(GetSystemTime())
?
Test case information
The three test cases above were verified by trapping calls to (For the following tests, ImageFileExecutionOptions was used to mark corerun.exe as "leap-second aware".)
This test case was verified by using the
This test case was verified by using the
This test case was verified by using the Tools and resources for testing |
In real life, leap seconds can only occur on two specific dates, June 30th and Dec 31st (UTC). Would it be worth optimizing further for that? Or do we want to accept any leap second Windows throws at us? |
@mattjohnsonpint Good question. I don't see any good opportunity for us to take advantage of that. Even in the original abandoned PR (#44771), which was pretty aggressive about caching, we didn't look at the date component. We were actually asked to be less aggressive. :) So I knocked the cache validity window down from 23h59m59s to 5 min. Given this, do you see an opportunity for us to use the date component to increase the size of the window? Nothing comes to mind on my end. |
I want to mention a historical leap second can be inserted to the system too. if we make the cache update only at the end of June or December, that increase the risk when that leap second gets inserted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for getting this ready.
One suggestion is to do some testing using w32tm.exe
to insert some leap seconds and look how the .NET would report when doing that.
@tarekgh You mean 4.8? Sure, I can do that. Any other runtimes of interest? Open question: is it possible that DateTime.UtcNow might now be recursive? Like, some debugging or introspection engine hooking allocations and and calling DateTime.UtcNow as part of its logging mechanism? If this is a possibility, I can add a guard against it. |
No, I meant try the tool with your changes. the tool can allow you insert leap seconds. so, you can try run your test and in the middle run the tool to insert a leap second and look if the test continues behave nicely :-)
I am not aware of any case like that which can cause the recursion. I am not even sure if this is possible with UtcNow. |
Pondered this a bit more, and it turns out there are edge case scenarios (exception filters, possibly others) where arbitrary user code might run inside UtcNow. Makes me think we should have a guard against recursive calls, just so that we don't bring down the app in these weird edge cases. It won't add any overhead to the fast "cache is valid" path, so I don't think there's any code risk from adding them. |
Were you able to get into this case? could you please share more info how this can happen? |
I didn't try it. It was just a thought experiment. But in theory: try
{
DoWork();
}
catch when (MyExceptionFilter())
{
/* blah blah */
} If DoWork throws an exception, MyExceptionFilter could end up running within DoWork's frame. (.NET doesn't unwind the call stack.) If DoWork calls |
@tarekgh I tried all kinds of wonderful and horrible combinations, such as using In all cases, the cutoff point is the code below. runtime/src/libraries/System.Private.CoreLib/src/System/DateTime.Windows.cs Lines 223 to 228 in 941e477
If the clock / leap seconds are changed before this point, we correctly respect it. Once this point is hit, we're already down the "start calculating values to cache" code path. So if the clock / leap seconds change after this point, we'll end up caching incorrect values for 5 minutes. Then after that 5 minute window is up, we'll evict our cache and query the OS for the actual time again. To the application, the observed behavior is that the return value of I'll update the code comments to clarify this. |
@GrabYourPitchforks thanks for testing with w32tm tool. I think at least we have confirmed the expected behavior and what would be the worst-case scenario which still be acceptable considering these are rare scenarios. We should be good to go. |
I don't think this is a specific problem for that case but it is generic issue which can happen in many ways. CC @jkotas if he has any thoughts around that. |
I was wondering if there has been a discussion about changing the way leap seconds are handled? Reading this discussion makes me think that the same code running on Windows and Linux will report different times, which is unfortunate. Google has in interesting solution where leap seconds are "distributed" over an hour, which has the nice side effect that time only ever moves forward. Is something like this being considered at all? Apologies if this is not the right forum for voicing such an idea. |
This is also the default behavior for Windows applications. Applications must opt in to leap second support. If your application has not opted in, the OS fudges the time it reports back to you by slowing down the clock over the last 2 seconds. Examples:
The logic proposed by this PR handles this scenario fine. |
Yes, OOM can happen at atribitrary points, e.g. due to JITing.
Not worth it. The app is going down once you hit this situation anyway. |
Follow-up to #44771. Builds atop work done in #45479 and #46690.
Resolves #13091.
This PR is based on a series of conversations @tarekgh and I have had recently with regard to short-circuiting our leap second handling logic where possible. For background, Windows understands that days normally have 86,400 seconds, but this can be off by ±1 second due to positive or negative leap seconds. .NET's DateTime type requires each day to have exactly 86,400 seconds. Previous .NET releases resolve this discrepancy through the following special-cased logic: "Is the current time 23:59:60.xxx? If so, I'll return a DateTime instance whose time component reads 23:59:59.999." Unfortunately, this check is not free, and it shows up in benchmarks when we access
DateTime.UtcNow
.This PR relies on the following two properties of leap seconds:
This optimization works by keeping a cache which maps the last-seen Win32
FILETIME
struct to its equivalent .NETDateTime
struct. The cache is valid for some window of time (by default, 5 minutes). If a call toDateTime.UtcNow
takes place during the cache's validity window, we don't bother asking the OS to deconstruct theFILETIME
struct back into aSYSTEMTIME
struct (which accounts for leap seconds). Instead, we know that we can just perform simple addition without worrying about leap second handling.There is one condition we need to check: Does a leap second actually occur within the next 5 minutes? If so, we can't store (now, now + 5 min) into the cache, otherwise
DateTime.UtcNow
will start returning incorrect values as soon as midnight rolls around. To account for this, we slide the cache window backward, choosing to set the cache validity period to (today at 23:54:59, today at 23:59:59) instead of (now, now + 5 min).The reason we use a 5 minute validity window instead of a 24-hour validity window is called out in code comments. tl;dr: The OS uses shared memory mapping to publish leap second tables to all processes, and it can extend that table without killing active processes or rebooting the machine. There is a slim but non-zero chance that this table can be updated to insert leap second data into the past. If this happens, our cache will be incorrect. After consultation with the Windows team, we decided that a 5-minute validity window is an appropriate tradeoff that improves performance in the common case and minimizes the window in which an application might encounter problems if this situation were to occur.