Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FilteredByLevel_InsideScope benchmark has regressed #59267

Closed
adamsitnik opened this issue Sep 17, 2021 · 30 comments
Closed

FilteredByLevel_InsideScope benchmark has regressed #59267

adamsitnik opened this issue Sep 17, 2021 · 30 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

It seems to be affecting only Windows (-10% on average). Was detected in DrewScoggins/performance-2#3245 but not reported in runtime repo (cc @DrewScoggins).

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fMicrosoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger%3a%20False%2c%20CaptureScopes%3a%20True).html

image

Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger: False, CaptureScopes: True)

Result Ratio Operating System
Slower 0.83 Windows 10.0.19043.1165
Slower 0.85 Windows 10.0.20348
Same 0.91 Windows 10.0.20348
Slower 0.87 Windows 10.0.18363.1621
Slower 0.87 Windows 8.1
Slower 0.86 Windows 10.0.19042.685
Slower 0.83 Windows 10.0.19043.1165
Slower 0.84 Windows 10.0.22454
Slower 0.81 Windows 10.0.22451
Slower 0.81 Windows 10.0.19042.1165
Slower 0.52 Windows 7 SP1
Same 0.92 centos 8
Slower 0.85 debian 10
Same 0.90 rhel 7
Slower 0.87 sles 15
Same 0.92 opensuse-leap 15.3
Same 0.99 ubuntu 18.04
Same 0.97 ubuntu 18.04
Same 1.04 alpine 3.13
Same 0.99 ubuntu 16.04
Same 1.03 Windows 10.0.19043.1165
Faster 1.14 Windows 10.0.22000
Same 1.03 Windows 10.0.19043.1165
Same 0.90 Windows 10.0.18363.1621
Same 0.91 Windows 10.0.19043.1165
Same 1.02 macOS Big Sur 11.5.2
Same 0.97 macOS Big Sur 11.5.2
Same 0.98 macOS Big Sur 11.4
@ghost
Copy link

ghost commented Sep 17, 2021

Tagging subscribers to this area: @maryamariyan
See info in area-owners.md if you want to be subscribed.

Issue Details

It seems to be affecting only Windows (-10% on average). Was detected in DrewScoggins/performance-2#3245 but not reported in runtime repo (cc @DrewScoggins).

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fMicrosoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger%3a%20False%2c%20CaptureScopes%3a%20True).html

image

Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger: False, CaptureScopes: True)

Result Ratio Operating System
Slower 0.83 Windows 10.0.19043.1165
Slower 0.85 Windows 10.0.20348
Same 0.91 Windows 10.0.20348
Slower 0.87 Windows 10.0.18363.1621
Slower 0.87 Windows 8.1
Slower 0.86 Windows 10.0.19042.685
Slower 0.83 Windows 10.0.19043.1165
Slower 0.84 Windows 10.0.22454
Slower 0.81 Windows 10.0.22451
Slower 0.81 Windows 10.0.19042.1165
Slower 0.52 Windows 7 SP1
Same 0.92 centos 8
Slower 0.85 debian 10
Same 0.90 rhel 7
Slower 0.87 sles 15
Same 0.92 opensuse-leap 15.3
Same 0.99 ubuntu 18.04
Same 0.97 ubuntu 18.04
Same 1.04 alpine 3.13
Same 0.99 ubuntu 16.04
Same 1.03 Windows 10.0.19043.1165
Faster 1.14 Windows 10.0.22000
Same 1.03 Windows 10.0.19043.1165
Same 0.90 Windows 10.0.18363.1621
Same 0.91 Windows 10.0.19043.1165
Same 1.02 macOS Big Sur 11.5.2
Same 0.97 macOS Big Sur 11.5.2
Same 0.98 macOS Big Sur 11.4
Author: adamsitnik
Assignees: -
Labels:

os-windows, tenet-performance, area-Extensions-Logging

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 17, 2021
@EgorBo
Copy link
Member

EgorBo commented Sep 17, 2021

image
that spike has too many commits to identify 😢 (missing data for ~7 days)

@DrewScoggins
Copy link
Member

After doing git bisect, this seems to be the offending commit #41897

@tarekgh
Copy link
Member

tarekgh commented Sep 23, 2021

@jkoritzinsky @AndyAyersMS do you think this regression is related to #41897?

@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Sep 23, 2021
@tarekgh tarekgh added this to the 6.0.0 milestone Sep 23, 2021
@jkoritzinsky
Copy link
Member

As part of that issue, our PGO data got out of date, so I would not be amazed to have seen a regression then. However, now that we've collected new PGO data for coreclr and clrjit, I would expect that any regression from #41897 would have been fixed since then when we finally got the PGO collection pipeline back online.

@tarekgh
Copy link
Member

tarekgh commented Sep 28, 2021

@DrewScoggins could you please validate if the regression in this issue is gone after PGO fix?

@AndyAyersMS
Copy link
Member

Seems like we need more analysis here. Does anyone have a profile indicating where this test spends its time? I can't tell offhand if it is in jitted code or the native parts of the runtime.

PGO information got updated in January so if this was PGO related it would have cleared up long ago. So it is probably something else.

@tarekgh
Copy link
Member

tarekgh commented Sep 28, 2021

@jkoritzinsky I'll assign this issue to you as remarked before it is a regression caused by #41897. Please let's know if you think anything related to logging here. Thanks!

@ghost
Copy link

ghost commented Sep 28, 2021

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

It seems to be affecting only Windows (-10% on average). Was detected in DrewScoggins/performance-2#3245 but not reported in runtime repo (cc @DrewScoggins).

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fMicrosoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger%3a%20False%2c%20CaptureScopes%3a%20True).html

image

Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger: False, CaptureScopes: True)

Result Ratio Operating System
Slower 0.83 Windows 10.0.19043.1165
Slower 0.85 Windows 10.0.20348
Same 0.91 Windows 10.0.20348
Slower 0.87 Windows 10.0.18363.1621
Slower 0.87 Windows 8.1
Slower 0.86 Windows 10.0.19042.685
Slower 0.83 Windows 10.0.19043.1165
Slower 0.84 Windows 10.0.22454
Slower 0.81 Windows 10.0.22451
Slower 0.81 Windows 10.0.19042.1165
Slower 0.52 Windows 7 SP1
Same 0.92 centos 8
Slower 0.85 debian 10
Same 0.90 rhel 7
Slower 0.87 sles 15
Same 0.92 opensuse-leap 15.3
Same 0.99 ubuntu 18.04
Same 0.97 ubuntu 18.04
Same 1.04 alpine 3.13
Same 0.99 ubuntu 16.04
Same 1.03 Windows 10.0.19043.1165
Faster 1.14 Windows 10.0.22000
Same 1.03 Windows 10.0.19043.1165
Same 0.90 Windows 10.0.18363.1621
Same 0.91 Windows 10.0.19043.1165
Same 1.02 macOS Big Sur 11.5.2
Same 0.97 macOS Big Sur 11.5.2
Same 0.98 macOS Big Sur 11.4
Author: adamsitnik
Assignees: jkoritzinsky
Labels:

os-windows, tenet-performance, area-Infrastructure

Milestone: 6.0.0

@jkoritzinsky
Copy link
Member

There's nothing related to logging in that PR. That PR was a large infrastructure change to use a different native build orchestrator (Ninja instead of MSBuild). There's little to no managed build changes there. I agree with @AndyAyersMS that we'll need a trace to see where the benchmark spends its time to be able to dig down into why there is still a regression today.

@jeffhandley
Copy link
Member

...we'll need a trace to see where the benchmark spends its time to be able to dig down into why there is still a regression today.

@jkoritzinsky Is the action on this in your court, or do you need this trace from someone else?

@jkoritzinsky
Copy link
Member

I can try to get the trace, but I'd prefer if someone who is more familiar with how we have the process set up to run against a recent runtime build could collect it instead.

@jeffhandley
Copy link
Member

I can try to get the trace, but I'd prefer if someone who is more familiar with how we have the process set up to run against a recent runtime build could collect it instead.

@adamsitnik Can you help out with getting this started please?

@jeffhandley jeffhandley added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Oct 7, 2021
@DrewScoggins
Copy link
Member

I can get some traces collected and take a first cut at them.

@DrewScoggins
Copy link
Member

So I have collected traces, and you can find them here along with the copies of the runtime used to generate them, including PDBs. https://pvscmdupload.blob.core.windows.net/drewtest/59267.zip

My initial investigation shows that the regression is coming from more time spent in JIT_VirtualFunctionPointer and GenericHandleCacheTraits::CompareKeys. I dug into the disassembly of coreclr.dll for both, and I noticed a few thing. First we are allocating 0x20 more bytes on the stack than we were before. We are also now directly calling JitGenericHandleCacheTraits::CompareKeys as part of the execution of the JIT_VirtualFunctionPointer. Also looking at the code before we get into the search loop we seem to setup the cache more efficiently, with fewer moves.

I am not certain as to why this orchestration change caused this. My best guess is that we are not passing the same set of parameters to the native compiler any longer, and as such are getting different code out. Maybe someone else can speak with more authority on that.

image

Baseline Disasm

                             *************************************************************
                             *                           FUNCTION                         
                             *************************************************************
                             void *  __fastcall  JIT_VirtualFunctionPointer (Object * 
             void *            RAX:8          <RETURN>
             Object *          RCX:8          param_1
             CORINFO_CLASS_    RDX:8          param_2
             CORINFO_METHOD    R8:8           param_3
                             ?JIT_VirtualFunctionPointer@@YAPEAXPEAVObject@  XREF[3]:     1803da588 (*) ,  180400848 (*) , 
                             JIT_VirtualFunctionPointer                                   1804b7f50 (*)   
       1801083c0 48  89  5c       MOV        qword ptr [RSP  + 0x8 ],RBX
                 24  08
       1801083c5 48  89  74       MOV        qword ptr [RSP  + 0x10 ],RSI
                 24  10
       1801083ca 57              PUSH       RDI
       1801083cb 48  83  ec       SUB        RSP ,0x20
                 20
       1801083cf 49  8b  f0       MOV        RSI ,R8
       1801083d2 4c  8b  da       MOV        R11 ,RDX
       1801083d5 48  85  c9       TEST       RCX ,RCX
       1801083d8 74  44           JZ         LAB_18010841e
       1801083da 48  8b  19       MOV        RBX ,qword ptr [RCX ]
       1801083dd 33  d2           XOR        EDX ,EDX
       1801083df 8b  c6           MOV        EAX ,ESI
       1801083e1 45  8b  d3       MOV        R10D ,R11D
       1801083e4 c1  c8  05       ROR        EAX ,0x5
       1801083e7 03  c3           ADD        EAX ,EBX
       1801083e9 41  c1  c2       ROL        R10D ,0x5
                 05
       1801083ed 44  03  d0       ADD        R10D ,EAX
       1801083f0 48  8b  05       MOV        RAX ,qword ptr [g_pJitGenericHandleCache ]       = NaP
                 61  56  39 
                 00
       1801083f7 4c  8b  40       MOV        R8 ,qword ptr [RAX  + 0x20 ]
                 20
       1801083fb 41  8b  c2       MOV        EAX ,R10D
       1801083fe 41  f7  70       DIV        dword ptr [R8  + 0x8 ]
                 08
       180108402 49  8b  00       MOV        RAX ,qword ptr [R8 ]
       180108405 48  8b  04       MOV        RAX ,qword ptr [RAX  + RDX *0x8 ]
                 d0
       180108409 48  85  c0       TEST       RAX ,RAX
       18010840c 74  10           JZ         LAB_18010841e
       18010840e 66  90           NOP
                             LAB_180108410                                   XREF[1]:     18010841c (j)   
       180108410 44  39  50       CMP        dword ptr [RAX  + 0x8 ],R10D
                 08
       180108414 74  23           JZ         LAB_180108439
                             LAB_180108416                                   XREF[4]:     18010843d (j) ,  180108443 (j) , 
                                                                                          180108449 (j) ,  18010844f (j)   
       180108416 48  8b  00       MOV        RAX ,qword ptr [RAX ]
       180108419 48  85  c0       TEST       RAX ,RAX
       18010841c 75  f2           JNZ        LAB_180108410
                             LAB_18010841e                                   XREF[2]:     1801083d8 (j) ,  18010840c (j)   
       18010841e 4c  8b  c6       MOV        R8 ,RSI
       180108421 49  8b  d3       MOV        RDX ,R11
       180108424 e8  87  00       CALL       JIT_VirtualFunctionPointer_Framed                void * JIT_VirtualFunctionPointe
                 00  00
       180108429 48  8b  5c       MOV        RBX ,qword ptr [RSP  + 0x30 ]
                 24  30
       18010842e 48  8b  74       MOV        RSI ,qword ptr [RSP  + 0x38 ]
                 24  38
       180108433 48  83  c4       ADD        RSP ,0x20
                 20
       180108437 5f              POP        RDI
       180108438 c3              RET
                             LAB_180108439                                   XREF[1]:     180108414 (j)   
       180108439 48  39  58       CMP        qword ptr [RAX  + 0x18 ],RBX
                 18
       18010843d 75  d7           JNZ        LAB_180108416
       18010843f 4c  39  58       CMP        qword ptr [RAX  + 0x20 ],R11
                 20
       180108443 75  d1           JNZ        LAB_180108416
       180108445 48  39  70       CMP        qword ptr [RAX  + 0x28 ],RSI
                 28
       180108449 75  cb           JNZ        LAB_180108416
       18010844b f6  40  30       TEST       byte ptr [RAX  + 0x30 ],0x1
                 01
       18010844f 74  c5           JZ         LAB_180108416
       180108451 48  8b  40       MOV        RAX ,qword ptr [RAX  + 0x10 ]
                 10
       180108455 48  8b  5c       MOV        RBX ,qword ptr [RSP  + 0x30 ]
                 24  30
       18010845a 48  8b  74       MOV        RSI ,qword ptr [RSP  + 0x38 ]
                 24  38
       18010845f 48  83  c4       ADD        RSP ,0x20
                 20
       180108463 5f              POP        RDI
       180108464 c3              RET

Compare Disasm

                             *************************************************************
                             *                           FUNCTION                         
                             *************************************************************
                             void *  __fastcall  JIT_VirtualFunctionPointer (Object * 
             void *            RAX:8          <RETURN>
             Object *          RCX:8          param_1
             CORINFO_CLASS_    RDX:8          param_2
             CORINFO_METHOD    R8:8           param_3
                             ?JIT_VirtualFunctionPointer@@YAPEAXPEAVObject@  XREF[3]:     1803b75a6 (*) ,  1803ebf28 (*) , 
                             JIT_VirtualFunctionPointer                                   1804a4a04 (*)   
       1800f8190 48  8b  c4       MOV        RAX ,RSP
       1800f8193 48  89  58       MOV        qword ptr [RAX  + 0x8 ],RBX
                 08
       1800f8197 48  89  70       MOV        qword ptr [RAX  + 0x10 ],RSI
                 10
       1800f819b 57              PUSH       RDI
       1800f819c 48  83  ec       SUB        RSP ,0x40
                 40
       1800f81a0 49  8b  f8       MOV        RDI ,R8
       1800f81a3 48  8b  f2       MOV        RSI ,RDX
       1800f81a6 4c  8b  d9       MOV        R11 ,RCX
       1800f81a9 48  85  c9       TEST       RCX ,RCX
       1800f81ac 74  7c           JZ         LAB_1800f822a
       1800f81ae 4c  8b  09       MOV        R9 ,qword ptr [RCX ]
       1800f81b1 8b  df           MOV        EBX ,EDI
       1800f81b3 48  89  50       MOV        qword ptr [RAX  + -0x20 ],RDX
                 e0
       1800f81b7 33  d2           XOR        EDX ,EDX
       1800f81b9 4c  89  40       MOV        qword ptr [RAX  + -0x18 ],R8
                 e8
       1800f81bd 4c  89  48       MOV        qword ptr [RAX  + -0x28 ],R9
                 d8
       1800f81c1 48  c7  40       MOV        qword ptr [RAX  + -0x10 ],0x1
                 f0  01  00 
                 00  00
       1800f81c9 8b  c6           MOV        EAX ,ESI
       1800f81cb c1  c0  05       ROL        EAX ,0x5
       1800f81ce 41  03  c1       ADD        EAX ,R9D
       1800f81d1 c1  cb  05       ROR        EBX ,0x5
       1800f81d4 03  d8           ADD        EBX ,EAX
       1800f81d6 48  8b  05       MOV        RAX ,qword ptr [g_pJitGenericHandleCache ]       = NaP
                 8b  27  39 
                 00
       1800f81dd 4c  8b  40       MOV        R8 ,qword ptr [RAX  + 0x20 ]
                 20
       1800f81e1 8b  c3           MOV        EAX ,EBX
       1800f81e3 41  f7  70       DIV        dword ptr [R8  + 0x8 ]
                 08
       1800f81e7 49  8b  00       MOV        RAX ,qword ptr [R8 ]
       1800f81ea 4c  8b  14       MOV        R10 ,qword ptr [RAX  + RDX *0x8 ]
                 d0
       1800f81ee eb  14           JMP        LAB_1800f8204
                             LAB_1800f81f0                                   XREF[1]:     1800f820f (j)   
       1800f81f0 48  8d  54       LEA        RDX ,[RSP  + 0x20 ]
                 24  20
       1800f81f5 49  8b  ca       MOV        RCX ,R10
       1800f81f8 e8  3f  00       CALL       JitGenericHandleCacheTraits::CompareKeys         int CompareKeys(EEHashEntry * pa
                 00  00
       1800f81fd 85  c0           TEST       EAX ,EAX
       1800f81ff 75  10           JNZ        LAB_1800f8211
                             LAB_1800f8201                                   XREF[1]:     1800f820d (j)   
       1800f8201 4d  8b  12       MOV        R10 ,qword ptr [R10 ]
                             LAB_1800f8204                                   XREF[1]:     1800f81ee (j)   
       1800f8204 4d  85  d2       TEST       R10 ,R10
       1800f8207 74  08           JZ         LAB_1800f8211
       1800f8209 41  39  5a       CMP        dword ptr [R10  + 0x8 ],EBX
                 08
       1800f820d 75  f2           JNZ        LAB_1800f8201
       1800f820f eb  df           JMP        LAB_1800f81f0
                             LAB_1800f8211                                   XREF[2]:     1800f81ff (j) ,  1800f8207 (j)   
       1800f8211 4d  85  d2       TEST       R10 ,R10
       1800f8214 74  14           JZ         LAB_1800f822a
       1800f8216 49  8b  42       MOV        RAX ,qword ptr [R10  + 0x10 ]
                 10
                             LAB_1800f821a                                   XREF[1]:     1800f8238 (j)   
       1800f821a 48  8b  5c       MOV        RBX ,qword ptr [RSP  + 0x50 ]
                 24  50
       1800f821f 48  8b  74       MOV        RSI ,qword ptr [RSP  + 0x58 ]
                 24  58
       1800f8224 48  83  c4       ADD        RSP ,0x40
                 40
       1800f8228 5f              POP        RDI
       1800f8229 c3              RET
                             LAB_1800f822a                                   XREF[2]:     1800f81ac (j) ,  1800f8214 (j)   
       1800f822a 4c  8b  c7       MOV        R8 ,RDI
       1800f822d 48  8b  d6       MOV        RDX ,RSI
       1800f8230 49  8b  cb       MOV        RCX ,R11
       1800f8233 e8  4c  00       CALL       JIT_VirtualFunctionPointer_Framed                void * JIT_VirtualFunctionPointe
                 00  00
       1800f8238 eb  e0           JMP        LAB_1800f821a

@trylek
Copy link
Member

trylek commented Oct 8, 2021

From my selfish perspective the good news is that this regression apparently hasn't been triggered by my recent test build script cleanup change ;-). Other than that, these days I spend hours poring over iDNA analyses of CoreCLR runtime behavior and I'm trying to use them to better understand CoreCLR runtime startup costs; from that perspective these are certainly super-interesting pointers, thanks Drew! And have a great weekend too, by the way.

Tomas

@jeffhandley
Copy link
Member

Thanks, @DrewScoggins!

@trylek -- does this give you any concern for 6.0 GA, or can we go ahead and move this to the 7.0.0 milestone for further investigation?

@trylek
Copy link
Member

trylek commented Oct 8, 2021

Frankly speaking I don't think we have too much of a wiggle room. Naturally putting new features in incurs some perf penalties and making code paths more complex in the CoreCLR runtime always gnaws at our perf, I recall David Wrighton had to pull off a couple tricks to compensate for the runtime slowdown due to the newly introduced static virtual methods. The results of Drew's investigation seem to point at subtle changes in JIT behavior so I guess someone like @AndyAyersMS should chime in as to what is expected vs. unexpected; my personal belief is that .NET Core 6 pulled off a bunch of cool tricks to boost perf in spite of getting more complex and the last time I heard that was also reflected in various benchmarks and 3rd party developer findings so from that perspective I believe it's reasonable to postpone completion of the perf regression investigation to .NET 7.

@jeffhandley
Copy link
Member

Thanks! I'll move this to 7.0.0 and put it into the codegen area for further triage/investigation. If this is a recognized but acceptable trade-off, that's perfectly fine.

@jeffhandley jeffhandley added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-Infrastructure labels Oct 8, 2021
@ghost
Copy link

ghost commented Oct 8, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

It seems to be affecting only Windows (-10% on average). Was detected in DrewScoggins/performance-2#3245 but not reported in runtime repo (cc @DrewScoggins).

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fMicrosoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger%3a%20False%2c%20CaptureScopes%3a%20True).html

image

Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger: False, CaptureScopes: True)

Result Ratio Operating System
Slower 0.83 Windows 10.0.19043.1165
Slower 0.85 Windows 10.0.20348
Same 0.91 Windows 10.0.20348
Slower 0.87 Windows 10.0.18363.1621
Slower 0.87 Windows 8.1
Slower 0.86 Windows 10.0.19042.685
Slower 0.83 Windows 10.0.19043.1165
Slower 0.84 Windows 10.0.22454
Slower 0.81 Windows 10.0.22451
Slower 0.81 Windows 10.0.19042.1165
Slower 0.52 Windows 7 SP1
Same 0.92 centos 8
Slower 0.85 debian 10
Same 0.90 rhel 7
Slower 0.87 sles 15
Same 0.92 opensuse-leap 15.3
Same 0.99 ubuntu 18.04
Same 0.97 ubuntu 18.04
Same 1.04 alpine 3.13
Same 0.99 ubuntu 16.04
Same 1.03 Windows 10.0.19043.1165
Faster 1.14 Windows 10.0.22000
Same 1.03 Windows 10.0.19043.1165
Same 0.90 Windows 10.0.18363.1621
Same 0.91 Windows 10.0.19043.1165
Same 1.02 macOS Big Sur 11.5.2
Same 0.97 macOS Big Sur 11.5.2
Same 0.98 macOS Big Sur 11.4
Author: adamsitnik
Assignees: adamsitnik
Labels:

os-windows, tenet-performance, area-CodeGen-coreclr, needs further triage

Milestone: 6.0.0

@jeffhandley jeffhandley modified the milestones: 6.0.0, 7.0.0 Oct 8, 2021
@trylek
Copy link
Member

trylek commented Oct 8, 2021

That matches my understanding unless Andy Ayers notices something unexpected w.r.t. JIT behavior.

@DrewScoggins
Copy link
Member

That matches my understanding unless Andy Ayers notices something unexpected w.r.t. JIT behavior.

The only thing I would add I this change is in native code (coreclr.dll). I am a little suspicious that a change that looks like different codegen from the native compiler happened just after a big orchestrator change.

I assume it was checked during the change, but the flags passed to the native compiler didn't change, right?

@adamsitnik adamsitnik removed their assignment Oct 11, 2021
@DrewScoggins
Copy link
Member

DrewScoggins commented Oct 11, 2021

I did a little more digging this morning. If you look at the options that we are using for doing the compile for coreclr.dll they are different from before and after this change.

The key difference is that we used to use these opt flags /O1 /Ob2 /Oi /Oy- and we now use /Ox /Ob2 /Oi /Oy-. Looking at #41897 and the discussion that happened there, it does not look like this was intended, but I am not sure. @jkoritzinsky can you chime in here?

Baseline:

ClCompile:
         C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29910\bin\HostX64\x64\CL.exe /c
/ID:\git\runtime\src\coreclr\src\pal\prebuilt\inc /ID:\git\runtime\src\coreclr\..\..\artifacts\obj\coreclr
/ID:\git\runtime\src\coreclr\src\inc /ID:\git\runtime\src\coreclr\src\debug\inc /ID:\git\runtime\src\coreclr\src\debug\inc\amd64
/ID:\git\runtime\src\coreclr\src\debug\inc\dump /ID:\git\runtime\src\coreclr\src\md\inc
/ID:\git\runtime\src\coreclr\src\classlibnative\bcltype /ID:\git\runtime\src\coreclr\src\classlibnative\cryptography
/ID:\git\runtime\src\coreclr\src\classlibnative\inc /ID:\git\runtime\artifacts\obj\coreclr\Windows_NT.x64.Release\src\inc
/ID:\git\runtime\artifacts\obj\coreclr\Windows_NT.x64.Release\src\inc\etw /ID:\git\runtime\src\coreclr\src\dlls\mscoree\..\..\inc
/Zi /nologo /W3 /WX /diagnostics:column /MP /O1 /Ob2 /Oi /Oy- /GL /D _WINDLL /D _UNICODE /D UNICODE /D WIN32 /D
_WINDOWS /D NDEBUG /D URTBLDENV_FRIENDLY=Retail /D HOST_AMD64 /D HOST_64BIT /D HOST_WINDOWS /D
TARGET_AMD64 /D TARGET_64BIT /D TARGET_WINDOWS /D FEATURE_UTF8STRING /D _BLD_CLR /D DEBUGGING_SUPPORTED
/D PROFILING_SUPPORTED /D _WIN32 /D WINVER=0x0602 /D _WIN32_WINNT=0x0602 /D WIN32_LEAN_AND_MEAN /D
_CRT_SECURE_NO_WARNINGS /D EnC_SUPPORTED /D FEATURE_ARRAYSTUB_AS_IL /D FEATURE_MULTICASTSTUB_AS_IL /D
FEATURE_PORTABLE_SHUFFLE_THUNKS /D FEATURE_INSTANTIATINGSTUB_AS_IL /D FEATURE_CODE_VERSIONING /D
FEATURE_COLLECTIBLE_TYPES /D FEATURE_COMWRAPPERS /D FEATURE_COMINTEROP /D
FEATURE_COMINTEROP_APARTMENT_SUPPORT /D FEATURE_COMINTEROP_UNMANAGED_ACTIVATION /D
FEATURE_BASICFREEZE /D FEATURE_CORECLR /D FEATURE_CORESYSTEM /D FEATURE_DEFAULT_INTERFACES /D
FEATURE_EVENT_TRACE /D FEATURE_PERFTRACING /D FEATURE_HIJACK /D FEATURE_ICASTABLE /D
FEATURE_INTEROP_DEBUGGING /D FEATURE_ISYM_READER /D FEATURE_MANAGED_ETW /D
FEATURE_MANAGED_ETW_CHANNELS /D FEATURE_MULTICOREJIT /D FEATURE_PROFAPI_ATTACH_DETACH /D
FEATURE_READYTORUN /D FEATURE_REJIT /D FEATURE_SVR_GC /D FEATURE_SYMDIFF /D FEATURE_TIERED_COMPILATION /D
FEATURE_ON_STACK_REPLACEMENT /D FEATURE_PGO /D FEATURE_TYPEEQUIVALENCE /D UNIX_AMD64_ABI_ITF /D
FEATURE_USE_ASM_GC_WRITE_BARRIERS /D FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP /D
FEATURE_MANUALLY_MANAGED_CARD_BUNDLES /D FEATURE_WIN32_REGISTRY /D _SECURE_SCL=0 /D UNICODE /D _UNICODE
/D FEATURE_DATABREAKPOINT /D FEATURE_EH_FUNCLETS /D FEATURE_STANDALONE_GC /D
FX_VER_INTERNALNAME_STR=CoreCLR.dll /D "CMAKE_INTDIR=\"Release\"" /D coreclr_EXPORTS /Gm- /EHa /MT /Zp8 /GS
/guard:cf /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR- /Fo"coreclr.dir\Release\\" /Fd"coreclr.dir\Release\vc142.pdb"
/Gd /TP /wd4960 /wd4961 /wd4603 /wd4627 /wd4838 /wd4456 /wd4457 /wd4458 /wd4459 /wd4091 /wd4291 /wd5105 /FC /Zl
/errorReport:queue /we4640 /we4007 /we4013 /we4102 /we4551 /we4700 /we4806  /Zm200 /Zc:strictStrings /w34092 /w34121
/w34125 /w34130 /w34132 /w34212 /w34530 /w35038 /w44177 /ZH:SHA_256 /source-charset:utf-8
D:\git\runtime\src\coreclr\src\dlls\mscoree\mscoree.cpp D:\git\runtime\src\coreclr\src\dlls\mscoree\unixinterface.cpp
D:\git\runtime\src\coreclr\src\dlls\mscoree\delayloadhook.cpp

Compare

ClCompile:
         C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29910\bin\HostX64\x64\CL.exe /c
/ID:\git\runtime\src\coreclr\src\pal\prebuilt\inc /ID:\git\runtime\src\coreclr\..\..\artifacts\obj\coreclr
/ID:\git\runtime\src\coreclr\src\inc /ID:\git\runtime\src\coreclr\src\debug\inc /ID:\git\runtime\src\coreclr\src\debug\inc\amd64
/ID:\git\runtime\src\coreclr\src\debug\inc\dump /ID:\git\runtime\src\coreclr\src\md\inc
/ID:\git\runtime\src\coreclr\src\classlibnative\bcltype /ID:\git\runtime\src\coreclr\src\classlibnative\cryptography
/ID:\git\runtime\src\coreclr\src\classlibnative\inc /ID:\git\runtime\artifacts\obj\coreclr\Windows_NT.x64.Release\src\inc
/ID:\git\runtime\artifacts\obj\coreclr\Windows_NT.x64.Release\src\inc\etw /ID:\git\runtime\src\coreclr\src\dlls\mscoree\..\..\inc
/Zi /nologo /W3 /WX /diagnostics:column /MP /Ox /Ob2 /Oi /Oy- /GL /D _WINDLL /D _UNICODE /D UNICODE /D WIN32 /D
_WINDOWS /D NDEBUG /D URTBLDENV_FRIENDLY=Retail /D HOST_AMD64 /D HOST_64BIT /D HOST_WINDOWS /D
TARGET_AMD64 /D TARGET_64BIT /D TARGET_WINDOWS /D FEATURE_UTF8STRING /D _BLD_CLR /D DEBUGGING_SUPPORTED
/D PROFILING_SUPPORTED /D _WIN32 /D WINVER=0x0602 /D _WIN32_WINNT=0x0602 /D WIN32_LEAN_AND_MEAN /D
_CRT_SECURE_NO_WARNINGS /D EnC_SUPPORTED /D FEATURE_ARRAYSTUB_AS_IL /D FEATURE_MULTICASTSTUB_AS_IL /D
FEATURE_PORTABLE_SHUFFLE_THUNKS /D FEATURE_INSTANTIATINGSTUB_AS_IL /D FEATURE_CODE_VERSIONING /D
FEATURE_COLLECTIBLE_TYPES /D FEATURE_COMWRAPPERS /D FEATURE_COMINTEROP /D
FEATURE_COMINTEROP_APARTMENT_SUPPORT /D FEATURE_COMINTEROP_UNMANAGED_ACTIVATION /D
FEATURE_BASICFREEZE /D FEATURE_CORECLR /D FEATURE_CORESYSTEM /D FEATURE_DEFAULT_INTERFACES /D
FEATURE_EVENT_TRACE /D FEATURE_PERFTRACING /D FEATURE_HIJACK /D FEATURE_ICASTABLE /D
FEATURE_INTEROP_DEBUGGING /D FEATURE_ISYM_READER /D FEATURE_MANAGED_ETW /D
FEATURE_MANAGED_ETW_CHANNELS /D FEATURE_MULTICOREJIT /D FEATURE_PROFAPI_ATTACH_DETACH /D
FEATURE_READYTORUN /D FEATURE_REJIT /D FEATURE_SVR_GC /D FEATURE_SYMDIFF /D FEATURE_TIERED_COMPILATION /D
FEATURE_ON_STACK_REPLACEMENT /D FEATURE_PGO /D FEATURE_TYPEEQUIVALENCE /D UNIX_AMD64_ABI_ITF /D
FEATURE_USE_ASM_GC_WRITE_BARRIERS /D FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP /D
FEATURE_MANUALLY_MANAGED_CARD_BUNDLES /D FEATURE_WIN32_REGISTRY /D _SECURE_SCL=0 /D UNICODE /D _UNICODE
/D FEATURE_DATABREAKPOINT /D FEATURE_EH_FUNCLETS /D FEATURE_STANDALONE_GC /D
FX_VER_INTERNALNAME_STR=CoreCLR.dll /D "CMAKE_INTDIR=\"Release\"" /D coreclr_EXPORTS /Gm- /EHa /MT /Zp8 /GS
/guard:cf /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR- /Fo"coreclr.dir\Release\\" /Fd"coreclr.dir\Release\vc142.pdb"
/Gd /TP /wd4960 /wd4961 /wd4603 /wd4627 /wd4838 /wd4456 /wd4457 /wd4458 /wd4459 /wd4091 /wd4291 /wd5105 /FC /Zl
/errorReport:queue /we4640 /we4007 /we4013 /we4102 /we4551 /we4700 /we4806  /Zm200 /Zc:strictStrings /w34092 /w34121
/w34125 /w34130 /w34132 /w34212 /w34530 /w35038 /w44177 /ZH:SHA_256 /source-charset:utf-8
D:\git\runtime\src\coreclr\src\dlls\mscoree\mscoree.cpp D:\git\runtime\src\coreclr\src\dlls\mscoree\unixinterface.cpp
D:\git\runtime\src\coreclr\src\dlls\mscoree\delayloadhook.cpp

@jkoritzinsky
Copy link
Member

That change looks to be unintentional.

IIRC, there were multiple conflicting optimization flags in the old build (/Ox was specified by CMake, /O1 by our build). The CMake generator for Ninja would put all of the optimization flags on the command line (including the conflicting ones), but the CMake generator for MSBuild looks to always pick the last specified one. Looks like when I was resolving this, I accidentally picked the wrong setting (what the command line from the Ninja and NMake generators would use instead of what the MSBuild generator used).

@AndyAyersMS
Copy link
Member

The results of Drew's investigation seem to point at subtle changes in JIT behavior

Per Drew's further analysis this looks like a change in the way a native part of the windows runtime was optimized, either directly from the flags change above, or indirectly via the impact that change in conjunction with PGO.

@jkoritzinsky does this change impact the entire set of runtime files?

Per this doc and subsequent, /O1 is equivalent to /Og /Os /Oy /Ob2 /GF /Gy and /Ox to /Ob2 /Oi /Ot /Oy.

So the main difference (ignoring /Og for now, which seems to be dead) is /Os vs /Ot, /Oi, and the extra /G* options.

Normally /Os (opt for size) vs /Ot (opt for speed) should not matter that much if you're also using PGO, as PGO will make its own speed/size determination. But there could be subtleties here as both these are "bundle options" and we later on tweak some of the things set by the bundles.

We explicitly set /Oi (intrinsic opts) after these so it should be the same either way. And likewise we explicitly turn on /Gy (function level linking).

So perhaps we lost /GF -- string pooling. This does not seem like it would explain the diff above.

At any rate I don't have an good explanation for this, but it seems like we ought to look into changing the opt options back to how it was.

@JulieLeeMSFT JulieLeeMSFT removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jun 10, 2022
@jkoritzinsky
Copy link
Member

This setting does impact the entire set of runtime files.

I tried changing the options back and doing a local run and I didn't see an appreciable performance improvement across the board. I saw a few of these benchmarks get better and a few get worse (some significantly, like 20% worse). I don't have enough confidence that reverting the optimization change will substantially improve the benchmarks without causing issues in others.

Here's my results:

main:

Method HasISupportLoggingScopeLogger CaptureScopes Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
FilteredByLevel False False 12.89 ns 0.172 ns 0.152 ns 12.88 ns 12.70 ns 13.15 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope False False 1,435.48 ns 10.315 ns 9.144 ns 1,436.77 ns 1,417.73 ns 1,449.82 ns 111.38 1.33 - - - -
NotFiltered False False 13,977.78 ns 1,574.853 ns 1,685.075 ns 13,600.00 ns 11,500.00 ns 17,100.00 ns 1,096.13 146.02 - - - 96 B
NotFiltered_InsideScope False False 16,800.00 ns 3,121.156 ns 3,339.602 ns 16,450.00 ns 12,600.00 ns 24,000.00 ns 1,336.98 284.93 - - - 96 B
FilteredByLevel False True 12.89 ns 0.100 ns 0.088 ns 12.90 ns 12.74 ns 13.04 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope False True 2,868.11 ns 92.471 ns 106.490 ns 2,820.34 ns 2,774.83 ns 3,090.93 ns 223.36 8.99 - - - -
NotFiltered False True 13,833.33 ns 947.692 ns 1,014.019 ns 14,050.00 ns 11,800.00 ns 15,200.00 ns 1,072.48 86.74 - - - 384 B
NotFiltered_InsideScope False True 17,633.33 ns 2,387.268 ns 2,554.350 ns 18,500.00 ns 12,400.00 ns 21,400.00 ns 1,340.50 212.74 - - - 384 B
FilteredByLevel True False 12.37 ns 0.062 ns 0.055 ns 12.35 ns 12.27 ns 12.48 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope True False 1,406.04 ns 16.370 ns 14.512 ns 1,401.81 ns 1,390.54 ns 1,441.66 ns 113.71 1.17 - - - -
NotFiltered True False 14,888.89 ns 3,088.538 ns 3,304.701 ns 14,100.00 ns 11,000.00 ns 22,000.00 ns 1,230.75 297.83 - - - 384 B
NotFiltered_InsideScope True False 15,505.56 ns 1,351.109 ns 1,445.672 ns 15,650.00 ns 12,900.00 ns 18,200.00 ns 1,267.84 117.57 - - - 384 B
FilteredByLevel True True 14.08 ns 0.573 ns 0.660 ns 13.86 ns 13.28 ns 15.51 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope True True 16,838.24 ns 332.393 ns 341.343 ns 16,690.68 ns 16,477.62 ns 17,473.89 ns 1,202.93 60.30 - - - 120 B
NotFiltered True True 14,000.00 ns 1,782.794 ns 1,907.570 ns 13,900.00 ns 10,900.00 ns 17,200.00 ns 995.15 146.26 - - - 384 B
NotFiltered_InsideScope True True 56,922.22 ns 4,698.774 ns 5,027.636 ns 56,850.00 ns 48,900.00 ns 66,500.00 ns 4,050.30 456.65 - - - 504 B

Using /O2 across the board.

Method HasISupportLoggingScopeLogger CaptureScopes Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
FilteredByLevel False False 12.93 ns 0.165 ns 0.147 ns 12.89 ns 12.76 ns 13.19 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope False False 1,409.52 ns 13.509 ns 12.636 ns 1,410.69 ns 1,391.21 ns 1,433.44 ns 108.99 1.69 - - - -
NotFiltered False False 13,888.24 ns 1,722.752 ns 1,769.139 ns 13,300.00 ns 10,700.00 ns 17,500.00 ns 1,070.69 137.79 - - - 384 B
NotFiltered_InsideScope False False 16,425.00 ns 1,479.476 ns 1,453.043 ns 17,100.00 ns 13,900.00 ns 19,200.00 ns 1,269.06 115.05 - - - 384 B
FilteredByLevel False True 12.14 ns 0.086 ns 0.071 ns 12.12 ns 12.05 ns 12.30 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope False True 2,850.19 ns 39.585 ns 37.028 ns 2,837.73 ns 2,799.48 ns 2,934.18 ns 235.17 3.60 - - - -
NotFiltered False True 13,894.74 ns 1,744.423 ns 1,938.921 ns 14,100.00 ns 10,600.00 ns 17,800.00 ns 1,132.08 182.99 - - - 384 B
NotFiltered_InsideScope False True 18,627.78 ns 1,743.102 ns 1,865.099 ns 18,000.00 ns 16,100.00 ns 23,000.00 ns 1,535.65 169.00 - - - 672 B
FilteredByLevel True False 11.56 ns 0.058 ns 0.051 ns 11.53 ns 11.50 ns 11.67 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope True False 1,410.14 ns 11.084 ns 9.825 ns 1,410.25 ns 1,392.11 ns 1,427.61 ns 122.04 1.16 - - - -
NotFiltered True False 14,227.78 ns 1,327.065 ns 1,419.944 ns 14,150.00 ns 11,600.00 ns 16,700.00 ns 1,223.73 116.98 - - - 384 B
NotFiltered_InsideScope True False 16,663.16 ns 2,706.251 ns 3,007.991 ns 16,800.00 ns 12,000.00 ns 23,000.00 ns 1,392.56 257.14 - - - 384 B
FilteredByLevel True True 12.59 ns 0.124 ns 0.110 ns 12.59 ns 12.43 ns 12.80 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope True True 16,971.99 ns 491.024 ns 525.390 ns 16,848.39 ns 16,404.33 ns 18,168.06 ns 1,356.26 37.07 - - - 120 B
NotFiltered True True 14,044.44 ns 1,285.936 ns 1,375.937 ns 14,400.00 ns 10,800.00 ns 16,200.00 ns 1,114.45 111.85 - - - 384 B
NotFiltered_InsideScope True True 62,122.22 ns 6,162.752 ns 6,594.075 ns 60,750.00 ns 52,100.00 ns 78,600.00 ns 4,923.45 578.02 - - - 504 B

Using /O1 across the board.

Method HasISupportLoggingScopeLogger CaptureScopes Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
FilteredByLevel False False 12.81 ns 0.090 ns 0.080 ns 12.79 ns 12.67 ns 12.95 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope False False 1,418.64 ns 9.310 ns 7.775 ns 1,418.64 ns 1,407.11 ns 1,435.51 ns 110.72 1.02 - - - -
NotFiltered False False 13,827.78 ns 2,919.012 ns 3,123.310 ns 12,750.00 ns 10,200.00 ns 21,700.00 ns 1,069.83 245.67 - - - 384 B
NotFiltered_InsideScope False False 16,173.68 ns 1,839.376 ns 2,044.462 ns 16,000.00 ns 13,300.00 ns 21,800.00 ns 1,221.86 132.57 - - - 384 B
FilteredByLevel False True 12.82 ns 0.129 ns 0.115 ns 12.78 ns 12.70 ns 13.04 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope False True 2,809.46 ns 20.251 ns 18.943 ns 2,808.26 ns 2,772.07 ns 2,837.51 ns 219.32 1.75 - - - -
NotFiltered False True 13,921.05 ns 1,531.401 ns 1,702.149 ns 14,100.00 ns 11,400.00 ns 17,100.00 ns 1,081.61 141.83 - - - 384 B
NotFiltered_InsideScope False True 18,047.06 ns 2,218.454 ns 2,278.189 ns 17,100.00 ns 15,100.00 ns 21,600.00 ns 1,417.68 166.08 - - - 384 B
FilteredByLevel True False 12.86 ns 0.170 ns 0.159 ns 12.84 ns 12.65 ns 13.22 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope True False 1,515.62 ns 82.576 ns 95.095 ns 1,510.77 ns 1,393.06 ns 1,684.38 ns 119.34 7.47 - - - -
NotFiltered True False 13,383.33 ns 2,101.634 ns 2,248.725 ns 13,700.00 ns 9,400.00 ns 17,800.00 ns 1,025.76 182.43 - - - 384 B
NotFiltered_InsideScope True False 15,433.33 ns 2,106.701 ns 2,254.147 ns 15,050.00 ns 12,200.00 ns 20,100.00 ns 1,165.73 156.40 - - - 96 B
FilteredByLevel True True 12.24 ns 0.161 ns 0.150 ns 12.25 ns 11.99 ns 12.55 ns 1.00 0.00 - - - -
FilteredByLevel_InsideScope True True 16,779.27 ns 363.143 ns 388.559 ns 16,659.58 ns 16,425.86 ns 17,829.21 ns 1,374.81 32.18 - - - 120 B
NotFiltered True True 14,015.79 ns 1,539.217 ns 1,710.836 ns 13,800.00 ns 11,500.00 ns 17,600.00 ns 1,125.32 126.53 - - - 96 B
NotFiltered_InsideScope True True 64,489.47 ns 10,687.735 ns 11,879.389 ns 59,300.00 ns 50,000.00 ns 89,200.00 ns 5,433.07 1,031.12 - - - 504 B

@jkoritzinsky
Copy link
Member

@AndyAyersMS any suggestions on next steps here?

@AndyAyersMS
Copy link
Member

Recent perf history shows we've regained most of the missing perf (at least on this one test):
newplot - 2022-06-10T150844 894

So if it was me I'd just close this.

FWIW the big drop in March 2022 seems to have come from #65926.

@jkoritzinsky
Copy link
Member

I'll close this then.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

9 participants