-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of DateTime.UtcNow on Windows #44771
Improve performance of DateTime.UtcNow on Windows #44771
Conversation
// count is the *absolute* number of 100-ns intervals which have elapsed since | ||
// January 1, 1601 (UTC). Due to leap second handling, the number of ticks per | ||
// day is variable. Dec. 30, 2016 had 864,000,000,000 ticks (a standard 24-hour | ||
// day), but Dec. 31, 2016 had 864,010,000,000 ticks due to leap second insertion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to mention so far Windows doesn't carry any leap seconds in the systems by default. I think the leap seconds will start to show up in the future.
// | ||
// This relies on: | ||
// a) Leap seconds only ever taking place on the last day of the month | ||
// (historically these are Jun. & Dec., but we support any month); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe adding the link https://datacenter.iers.org/data/latestVersion/16_BULLETIN_C16.txt can be a reference to that.
{ | ||
ulong deltaTicksFromStartOfMonth = osTicks - cache.OSFileTimeTicksAtStartOfMonth; | ||
if (deltaTicksFromStartOfMonth < cache.CacheValidityPeriodInTicks) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we talked offline about that and you mentioned the line
ulong deltaTicksFromStartOfMonth = osTicks - cache.OSFileTimeTicksAtStartOfMonth;
will get optimized. but I am not sure why we need to calculate the delta at all every time. we can just get rid of this delta calculation and calculate it once when we are caching it.
When caching, we calculate the delta of ticks = OSTicks - (dotnet calculated ticks from osticks) // let's call it ticksDelta
Also, when caching we'll calculate the OS ticks at the end of the current month at 23:59:59 // let's call it endOfMonthOSTicks
the code will be simplified as follow:
if (s_leapSecondCache is LeapSecondCache cache && cache.endOfMonthOSTicks > osticks)
{
return new DateTime(osticks - cache.ticksDelta + FileTimeOffset); // includes UTC marker
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this optimization would be valid. Consider the case where my system clock is Jan 1, 2017 00h00:01 UTC. I call DateTime.UtcNow
and populate the cache. Then Windows syncs the system clock against the network time server, and it turns out that my machine was oh-so-slightly fast. So I go back 3 seconds, and now my system clock is Dec 31, 2016 23h59:58 UTC per .NET's reckoning. But Dec 31, 2016 had a positive leap second! So 3 seconds behind Jan 1, 2017 00h00:01 UTC should actually be reported as Dec 31, 2016 23h59:59 UTC.
Is there a good way to account for this condition other than including both the start of month and the end of month times in the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks this scenario will be broken with your code too. no? you can run into the same issue when calculating OSFileTimeTicksAtStartOfMonth too. right? I think if the time on the system can be adjusted at any time we'll break as we'll not feel this adjustment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this scenario is broken with the proposed implementation. If the clock gets set backward to the previous month, then GetSystemTimeAsFileTime
will return a tick count less than our cached value cache.OSFileTimeTicksAtStartOfMonth
, and deltaTicksFromStartOfMonth will end up being a very large positive number (due to integer overflow). This causes the deltaTicksFromStartOfMonth < cache.CacheValidityPeriodInTicks
check to fail, sending us down the fallback code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand cannot insert historical leap seconds but you can still insert leap seconds at the end of the list (I guess). meaning you can start caching, then leap second get inserted at the end of previous month (and would be the last leap second in the list for Windows). I think this will break this caching logic. you can use the tool I pointed you at for inserting leap second and then you can try this scenario and look what you'll get.
Some extra perf numbers:
I don't know why proto_mixed_suppressgc clocks in at slightly slower than proto_mixed. The difference isn't enough to worry about, but I would've expected after many runs for them to be within each other's stddev. Maybe a bad conversion on my part? |
// clock is set backward, the resulting value will integer underflow and cause | ||
// a cache miss, forcing us down the slow path. | ||
|
||
if (s_leapSecondCache is LeapSecondCache cache) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do this check every time too? can we just initialize instance with some value? or you have it for optimization too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea. We could initialize it with a dummy value like (0, 0, 0). It would save two instructions (a comparison + conditional branch) in the x64 codegen. Benchmark doesn't really show a difference since the branch predictor is primed to assume the field is initialized.
Method | Job | Toolchain | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|---|
GetUtcNow | Job-LSFSOO | proto | 28.97 ns | 0.112 ns | 0.104 ns | 1.00 |
GetUtcNow | Job-LKFIYO | proto_initcache | 28.85 ns | 0.186 ns | 0.174 ns | 1.00 |
Even if there's no real perf difference, might be worth doing anyway just to simplify things?
FCFuncElement("FileTimeToSystemTime", SystemNative::FileTimeToSystemTime) | ||
FCFuncElement("SystemTimeToFileTime", SystemNative::SystemTimeToFileTime) | ||
#endif // TARGET_UNIX | ||
#if defined(TARGET_UNIX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be trivial to enable the managed implementation on Unix too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Delete this ifdef https://github.com/dotnet/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/DateTime.Unix.cs#L10 to enable the managed impl on all runtimes
- Delete this FCall
- Maybe mark
SystemNative_GetSystemTimeAsTicks
with SuppressGCTransition
internal static readonly bool s_systemSupportsLeapSeconds = SystemSupportsLeapSeconds(); | ||
private static LeapSecondCache? s_leapSecondCache; | ||
|
||
private static unsafe delegate* unmanaged[Stdcall]<ulong*, void> s_pfnGetSystemTimeAsFileTime = GetGetSystemTimeAsFileTimeFnPtr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static unsafe delegate* unmanaged[Stdcall]<ulong*, void> s_pfnGetSystemTimeAsFileTime = GetGetSystemTimeAsFileTimeFnPtr(); | |
private static unsafe delegate* unmanaged<ulong*, void> s_pfnGetSystemTimeAsFileTime = GetGetSystemTimeAsFileTimeFnPtr(); |
[Stdcall]
is not necessary. Platform default works fine. We do have explicit StdCall on Windows-specific DllImports in under libraries either.
DateTime startOfMonth = new DateTime(utcNow.Year, utcNow.Month, 1, 0, 0, 0, DateTimeKind.Utc); | ||
Debug.Assert(startOfMonth <= utcNow); | ||
|
||
Volatile.Write(ref s_leapSecondCache, new LeapSecondCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under CLR memory model reference writes to static fields always have release semantics so I think Volatile.Write
is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writes to static fields do not always have release semantics on some of the platforms that Mono runs on. I think it is a good idea to have Volatile.Writes for clarity, even when it is not strictly required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea to have Volatile.Writes for clarity, even when it is not strictly required.
I agree - I put it there for clarity. An alternative design would be to mark the static field itself volatile, then we wouldn't need the call to Volatile.Write
. But since correctness is predicated on only the write (not the read) being volatile, I thought it would be cleaner to make that fact explicit. If consensus is simply to mark the static field itself volatile I can do that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have hundreds of s_foo = new ...
assignments in this repo outside of class constructors. I do not see why this assignment is special in any way and inconsistency does not help clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have hundreds of s_foo = new ... assignments in this repo outside of class constructors.
Interestingly enough, I ran a quick Regex to find these, and the code below was one of the first matches. Whoops. 😳
runtime/src/libraries/System.Net.Http/src/System/Net/Http/Headers/EntityTagHeaderValue.cs
Lines 26 to 38 in 162b33e
public static EntityTagHeaderValue Any | |
{ | |
get | |
{ | |
if (s_any == null) | |
{ | |
s_any = new EntityTagHeaderValue(); | |
s_any._tag = "*"; | |
s_any._isWeak = false; | |
} | |
return s_any; | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please link to the GitHub issues citing places it's weaker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Searching for "volatile read" finds, for example, #6280 and #10619. One of the bugs I filed in 2008 is here. Let me quote the official resolution:
Currently [the ECMA spec requirement] is not the MS implementation [...] We should update the ECMA spec such that the MS behavior is explicitly allowed [...] Then this bug can be resolved as "By Design".
If we are serious about following the ECMA spec, can we prioritize fixing long-standing bugs or finally updating the model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect handling of volatile reads for 20 years is not just some corner case — it is obvious contract violation. For that reason alone we should prioritize properly defining the contract (i.e. correctness rules) over making the code look "correct".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open whatever issues you feel need to be opened. None of that contradicts being explicit about where volatile writes should happen, even if the coreclr JIT happens to already do so. I'm going to stop responding to this comment chain. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The perf numbers look great, and the comments that you have provided have convinced me about the correctness.
My only question is whether it would be possible to mock the delegate that returns actual time and write some tests for the edge cases.
I am hitting the "Accept" button, but please get @tarekgh approval and "the extra eyes from Windows" before merging.
internal static readonly bool s_systemSupportsLeapSeconds = SystemSupportsLeapSeconds(); | ||
private static LeapSecondCache? s_leapSecondCache; | ||
|
||
private static unsafe delegate* unmanaged[Stdcall]<ulong*, void> s_pfnGetSystemTimeAsFileTime = GetGetSystemTimeAsFileTimeFnPtr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exercise edge cases manually inside a debugger
How about writing a test that would be mocking this delegate using a reflection? I know it's hacky, but it would be great to have the edge cases covered by automated tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea, but I wonder how fragile this might be. There's probably a bunch of code within the runtime that relies on DateTime.UtcNow
, and if that code is running in another thread while our unit test is operating it could cause some weird side effects. Would need to brainstorm a bit to see if there's a refactoring that might make this less fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we need to report the time exact as the system. It is up to the users/admins to play with the system time. It is same as if you change the system time zone when you are running. local time will be very different and can cause issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not going to be modified for tests using reflection, then could mark it as readonly
then JIT can potentially inline it as a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have these private reflection based testing hooks in debug/checked build only, so that we are not leaving performance on the table in the shipping bits.
@adamsitnik @tarekgh In the latest iteration, I split the cache lookup into its own helper method, separate from the OS p/invoke. This way, we can write a unit test which doesn't muck about with the calli target, which I was afraid could introduce stability problems (especially once Such unit test isn't written yet; this is just setting up for that. Do you think this approach satisfies the testing concerns you had? |
Here's the x64 codegen as of the latest iteration. ; System_Private_CoreLib!System.DateTime.get_UtcNow()
; lots of prolog due to the unmanaged calli
00007ff9`5105b5a0 55 push rbp
00007ff9`5105b5a1 4157 push r15
00007ff9`5105b5a3 4156 push r14
00007ff9`5105b5a5 4155 push r13
00007ff9`5105b5a7 4154 push r12
00007ff9`5105b5a9 57 push rdi
00007ff9`5105b5aa 56 push rsi
00007ff9`5105b5ab 53 push rbx
00007ff9`5105b5ac 4883ec78 sub rsp,78h
00007ff9`5105b5b0 488dac24b0000000 lea rbp,[rsp+0B0h]
00007ff9`5105b5b8 488d4d80 lea rcx,[rbp-80h]
00007ff9`5105b5bc 498bd2 mov rdx,r10
00007ff9`5105b5bf e81ce9885f call CoreCLR!JIT_InitPInvokeFrame (00007ff9`b08e9ee0)
00007ff9`5105b5c4 488bf0 mov rsi,rax
00007ff9`5105b5c7 488bcc mov rcx,rsp
00007ff9`5105b5ca 48894da0 mov qword ptr [rbp-60h],rcx
00007ff9`5105b5ce 488bcd mov rcx,rbp
00007ff9`5105b5d1 48894db0 mov qword ptr [rbp-50h],rcx
; read fnptr from static field and invoke it
00007ff9`5105b5d5 48b9d808db50f97f0000 mov rcx,7FF950DB08D8h
00007ff9`5105b5df 488b01 mov rax,qword ptr [rcx]
00007ff9`5105b5e2 488d4dc0 lea rcx,[rbp-40h]
00007ff9`5105b5e6 488d1512000000 lea rdx,[System_Private_CoreLib!System.DateTime.get_UtcNow()+0xffffffff`a70edd8f (00007ff9`5105b5ff)]
00007ff9`5105b5ed 488955a8 mov qword ptr [rbp-58h],rdx
00007ff9`5105b5f1 488d5580 lea rdx,[rbp-80h]
00007ff9`5105b5f5 48895610 mov qword ptr [rsi+10h],rdx
00007ff9`5105b5f9 c6460c00 mov byte ptr [rsi+0Ch],0
00007ff9`5105b5fd ffd0 call rax
00007ff9`5105b5ff c6460c01 mov byte ptr [rsi+0Ch],1
00007ff9`5105b603 48ba5c12cdb0f97f0000 mov rdx,offset CoreCLR!g_TrapReturningThreads (00007ff9`b0cd125c)
00007ff9`5105b60d 833a00 cmp dword ptr [rdx],0
00007ff9`5105b610 740c je System_Private_CoreLib!System.DateTime.get_UtcNow()+0xffffffff`a70eddae (00007ff9`5105b61e)
00007ff9`5105b612 48b9f833ccb0f97f0000 mov rcx,offset CoreCLR!hlpDynamicFuncTable+0xa8 (00007ff9`b0cc33f8)
00007ff9`5105b61c ff11 call qword ptr [rcx]
00007ff9`5105b61e 488b4d88 mov rcx,qword ptr [rbp-78h]
00007ff9`5105b622 48894e10 mov qword ptr [rsi+10h],rcx
00007ff9`5105b626 488b4dc0 mov rcx,qword ptr [rbp-40h]
; read s_cache and make sure we're still in bounds
00007ff9`5105b62a 48b81812d31bec010000 mov rax,1EC1BD31218h
00007ff9`5105b634 488b00 mov rax,qword ptr [rax]
00007ff9`5105b637 488bd1 mov rdx,rcx
00007ff9`5105b63a 482b5008 sub rdx,qword ptr [rax+8]
00007ff9`5105b63e 483b5018 cmp rdx,qword ptr [rax+18h]
00007ff9`5105b642 7309 jae System_Private_CoreLib!System.DateTime.get_UtcNow()+0xffffffff`a70edddd (00007ff9`5105b64d)
00007ff9`5105b644 48035010 add rdx,qword ptr [rax+10h]
00007ff9`5105b648 488bc2 mov rax,rdx
00007ff9`5105b64b eb05 jmp System_Private_CoreLib!System.DateTime.get_UtcNow()+0xffffffff`a70edde2 (00007ff9`5105b652)
; call Fallback
00007ff9`5105b64d e8965adaff call CLRStub[MethodDescPrestub]@7ff950e010e8 (00007ff9`50e010e8)
; epilog
00007ff9`5105b652 90 nop
00007ff9`5105b653 488d65c8 lea rsp,[rbp-38h]
00007ff9`5105b657 5b pop rbx
00007ff9`5105b658 5e pop rsi
00007ff9`5105b659 5f pop rdi
00007ff9`5105b65a 415c pop r12
00007ff9`5105b65c 415d pop r13
00007ff9`5105b65e 415e pop r14
00007ff9`5105b660 415f pop r15
00007ff9`5105b662 5d pop rbp
00007ff9`5105b663 c3 ret |
I would say the best to have just a simple test that get the time from .NET and from the OS and then compare them. The difference should not exceed some threshold like 100 ms or so. @GrabYourPitchforks I have added the comment #44771 (comment) I hope it is not lost in the discussions here :-) |
@GrabYourPitchforks yes, it does! Thanks for working on this.
@tarekgh but how such a test would be exercising the edge cases? |
This test is going to run with every PR we submit. for sure we'll hit the edge case live :-) we already hit such things with bad written tests when the daylight saving change and we see the failures. |
I see. We should most probably have both:
|
|
||
if (!s_systemSupportsLeapSeconds) | ||
{ | ||
return new DateTime((osTicks + FileTimeOffset) | KindUtc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could write as osTicks + (FileTimeOffset | KindUtc)
to enable constant folding.
// Auto-generated message 69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others. |
Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes. |
Resolves #13091.
This PR is based on a series of conversations @tarekgh and I have had recently with regard to short-circuiting our leap second handling logic where possible. For background, Windows understands that days normally have 86,400 seconds, but this can be off by ±1 second due to positive or negative leap seconds. .NET's DateTime type requires each day to have exactly 86,400 seconds. Previous .NET releases resolve this discrepancy through the following special-cased logic: "Is the current time 23:59:60.xxx? If so, I'll return a DateTime instance whose time component reads 23:59:59.999." Unfortunately, this check is not free, and it shows up in benchmarks when we access
DateTime.UtcNow
.This PR relies on the following properties of leap seconds:
This optimization works by caching the OS-reported tick value of 00h00:00 on the first day of the month, the corresponding .NET tick value, and the count of how many ticks would elapse between that and 23h59:59 on the final day of the month. If the OS-reported tick value is between 00h00:00 first day (inclusive) and 23h59:59 final day (exclusive), we know a leap second could not have occurred within that time interval, so we bypass the normal leap second handling logic and perform the equivalent of
DateTime.AddTicks
against the baseline "start of month" DateTime value. If the cache is stale, we update the cache. If we're within the small window at the end of the month where a leap second might kick in, we ignore the cache and fall down the slower path.Another change included in this PR is to remove the unmanaged implementation of the Windows
DateTime
logic, leaving the entire implementation fully managed except for the p/invokes directly into the operating system. This also unifies the Mono and CoreCLR implementations. The non-Windows implementation continues to be split across the managed and unmanaged world.Perf results
Todo before merge