Force inlining CheckTimeout in RegexRunner #27262
Did it inline DoCheckTimeout? |
|
No, but I expected that, as it throws. I used the version from master for checking: https://source.dot.net/#System.Text.RegularExpressions/System/Text/RegularExpressions/RegexRunner.cs,7aac6cc6e4584ff3
|
danmoseley
left a comment
I'd like to know why the attribute is needed, but it seems reasonable to check it in in the meantime.
Did it actually improve throughput? |
|
I wouldn't want the PR to be merged before I can prove that the change improves throughput, either by benchmarking with BDN or by looking at the generated native code. |
|
I guess I was drawing the conclusion it had since before Go+Scan (the callers)+CheckTimeout were 41.8% and after were 38.4%. I guess there is no substitute for actual timing though. |
|
The method with the The inliner is actually overestimating the potential win here as this is not really a wrapper method and there is no arg that feeds a constant test. I don't really see a compelling benefit from this inline either -- what is it that you saw? |
|
Since in many (?most) cases _ignoreTimeout is true, maybe we can just have

    protected void CheckTimeout()
    {
        if (!_ignoreTimeout)
            DoCheckTimeout();
    }

    private void DoCheckTimeout()
    {
        // everything else
    }

Which will surely inline and be profitable? |
|
Thanks Andy! I'm curious, where did you get that data log from? You were right, the method wasn't inlined because the OR case was too big. I just refactored the code to just contain the timeout check and now it is inlined without the AggressiveInlineAttribute.
To optimize the common case (no timeout provided) it makes sense to inline the if condition to get rid of the unnecessary calls in this tight loop. |
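A minimal, self-contained sketch of the split described above, for readers who want to try it outside the regex code. Only CheckTimeout, DoCheckTimeout and _ignoreTimeout are names from this thread; the class TimeoutRunnerSketch, the SlowPathCalls counter and the Stopwatch-based timeout test are assumptions made for illustration, not the actual RegexRunner implementation.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the runner. Only CheckTimeout, DoCheckTimeout and
// _ignoreTimeout are names taken from the discussion; the rest is illustrative.
public class TimeoutRunnerSketch
{
    private readonly bool _ignoreTimeout;
    private readonly TimeSpan _timeout;
    private readonly Stopwatch _clock = Stopwatch.StartNew();

    // Counts how often the out-of-line slow path was entered.
    public int SlowPathCalls { get; private set; }

    public TimeoutRunnerSketch(TimeSpan? timeout)
    {
        _ignoreTimeout = !timeout.HasValue;
        _timeout = timeout ?? TimeSpan.Zero;
    }

    // Fast path: a single field test, small enough that the JIT may choose
    // to inline it at each call site without any attribute.
    public void CheckTimeout()
    {
        if (!_ignoreTimeout)
            DoCheckTimeout();
    }

    // Slow path: everything else, including the throw, stays out of line.
    private void DoCheckTimeout()
    {
        SlowPathCalls++;
        if (_clock.Elapsed > _timeout)
            throw new TimeoutException("match timed out (sketch)");
    }
}

public static class SplitDemo
{
    public static void Main()
    {
        // No timeout configured: the slow path is never entered.
        var noTimeout = new TimeoutRunnerSketch(null);
        for (int i = 0; i < 1_000_000; i++)
            noTimeout.CheckTimeout();
        Console.WriteLine(noTimeout.SlowPathCalls); // prints 0
    }
}
```

Whether the JIT actually inlines the fast path depends on the runtime version and the call site, so this only shows the shape of the refactoring, not a guaranteed outcome.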
Force-pushed from 4cfb8ec to 0f691c1.
|
This change is benign enough now I don't think it needs justification by measurement before merging. But I am curious whether you see an improvement. |
|
The profitability analysis comes from setting
I figured it might be helpful to explain how I'd think about this case. The assembly for this method is something like:
G_M61682_IG02:
       80790C00             cmp      byte  ptr [rcx+12], 0
       750C                 jne      SHORT G_M61682_IG03
       8B4108               mov      eax, dword ptr [rcx+8]
       FFC8                 dec      eax
       894108               mov      dword ptr [rcx+8], eax
       85C0                 test     eax, eax
       7401                 je       SHORT G_M61682_IG04
G_M61682_IG03:
       C3                   ret
G_M61682_IG04:
       48B888083A8AFD7F0000 mov      rax, 0x7FFD8A3A0888
G_M61682_IG05:
       48FFE0               rex.jmp  rax // tail call
Assuming
So the call version is perhaps a tiny bit slower than the inline version. It really depends on what else is going on around the call site; the inline version could well be slower. Likely the perf impact will be hard to measure reliably; generally we have trouble with timing-based measurements once the performance difference gets to be less than 1% or so, and there are a bunch of potentially confounding microarchitectural issues.
The force inline per call site cost is quite a bit larger. The perf impact of code size is also hard to measure as it is very context dependent. Generally larger code will perform well in benchmarks but has the effect of slightly degrading perf of everything else, as larger code uses more of the scarce physical resources (caches, pages, etc). For framework and system code we usually assume that there is a lot of other important code in the system and so we should prefer smaller code unless we know for sure the extra code size brings important benefits.
Things in favor of forcing inlining would be:
And things against forcing inlining would be:
The jit's inliner is typically quite conservative, because:
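A rough sketch of how one might compare a forced-inline check against an out-of-line call, as discussed above. Everything here (the class InlineTimingSketch, the method names, the iteration count, the dummy work in the loop) is hypothetical and only illustrates the experiment; a plain Stopwatch loop like this is noisy and is not a substitute for BDN or for reading the generated native code.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical experiment, not code from the PR.
public sealed class InlineTimingSketch
{
    private bool _ignoreTimeout = true; // the common case discussed in the thread
    private int _slowPathHits;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private void CheckForced() { if (!_ignoreTimeout) Slow(); }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void CheckCalled() { if (!_ignoreTimeout) Slow(); }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void Slow() => _slowPathHits++;

    // Loop with the check requested to be inlined at the call site.
    public long RunForced(int iterations)
    {
        long work = 0;
        for (int i = 0; i < iterations; i++) { CheckForced(); work += i & 3; }
        return work;
    }

    // Same loop, but the check is kept as a real call.
    public long RunCalled(int iterations)
    {
        long work = 0;
        for (int i = 0; i < iterations; i++) { CheckCalled(); work += i & 3; }
        return work;
    }
}

public static class TimingDemo
{
    public static void Main()
    {
        const int N = 100_000_000;
        var s = new InlineTimingSketch();

        // Warm up so both loops have been jitted before timing starts.
        s.RunForced(N);
        s.RunCalled(N);

        var sw = Stopwatch.StartNew();
        long a = s.RunForced(N);
        long forcedMs = sw.ElapsedMilliseconds;

        sw.Restart();
        long b = s.RunCalled(N);
        long calledMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"forced inline: {forcedMs} ms, call: {calledMs} ms, same result: {a == b}");
    }
}
```

If the two timings differ by only a percent or so between runs, that is consistent with the remark above that differences of this size are hard to measure reliably.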
|
|
Thanks @AndyAyersMS for this detail. I find it fascinating that inlining a boolean check is not always a win. Certainly I would never have considered extracting a boolean check into a method for perf reasons. |
|
Thanks a lot Andy! Great intel that you shared with us. I will probably get back to this post numerous times. |
Commit migrated from dotnet/corefx@b89d83a
I can't get any valuable measurements with BDN here. I tried multiple times, but it seems either my benchmarks are bad or the absolute costs of the CheckTimeout native calls are too small.
@AndyAyersMS PerfView tells me that without AggressiveInlining it won't inline the method, giving the reason "unprofitable inline". Any idea why it doesn't inline a simple boolean check here? (It doesn't inline without the OR condition either.) Does it have to do with the state of the inliner here (huge switch)?
Perfview before:
Perfview after:
cc @danmosemsft @stephentoub