Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlServer.Types regressed on .NET 7 #75455

Closed
jnyrup opened this issue Sep 12, 2022 · 16 comments
Closed

SqlServer.Types regressed on .NET 7 #75455

jnyrup opened this issue Sep 12, 2022 · 16 comments
Assignees
Labels
area-VM-coreclr tenet-performance Performance related issue
Milestone

Comments

@jnyrup
Copy link
Contributor

jnyrup commented Sep 12, 2022

Description

When trying out .NET 7 Preview 7 the runtime of some our tests approximately increased by ~100% from 0.9s from 1.8s compared to .NET 6.0.8.

I traced the majority of the regression to calls to Microsoft.SqlServer.Types and bisected the regression to be introduced in .NET 7 Preview 5.

To reproduce download STIntersection.zip and run

dotnet run -c release --framework net7.0

Configuration

Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.1889/21H2/November2021Update)
HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
GC=Concurrent Workstation

See each detail section below for the version of .NET used in the different benchmarks.

Regression?

Yes.
.NET 7 Preview 4 has same performance as .NET 6.0.8.
.NET 7 Preview 5 is 50% slower than .NET 6.0.8

Data

.NET 7 Preview 4
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.1889/21H2/November2021Update)
Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.4.22252.9
  [Host]   : .NET 7.0.0 (7.0.22.22904), X64 RyuJIT AVX2
  .NET 6.0 : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.0 (7.0.22.22904), X64 RyuJIT AVX2
Method Job Runtime Mean Error StdDev Ratio Code Size
STIntersection .NET 6.0 .NET 6.0 72.61 us 0.795 us 0.744 us 1.00 871 B
STIntersection .NET 7.0 .NET 7.0 72.86 us 0.177 us 0.148 us 1.00 754 B
.NET 7 Preview 5
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.1889/21H2/November2021Update)
Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host]   : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT AVX2
  .NET 6.0 : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT AVX2
Method Job Runtime Mean Error StdDev Ratio RatioSD Code Size
STIntersection .NET 6.0 .NET 6.0 72.02 us 0.865 us 0.809 us 1.00 0.00 871 B
STIntersection .NET 7.0 .NET 7.0 108.81 us 1.078 us 0.956 us 1.51 0.03 756 B
.NET 7 Preview 7
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.1889/21H2/November2021Update)
Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host]   : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
  .NET 6.0 : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Method Job Runtime Mean Error StdDev Ratio RatioSD Code Size
STIntersection .NET 6.0 .NET 6.0 71.49 us 0.638 us 0.533 us 1.00 0.00 871 B
STIntersection .NET 7.0 .NET 7.0 110.30 us 1.366 us 1.277 us 1.54 0.02 662 B
.NET 7 RC 1
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2006/21H2/November2021Update)
Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22431.12
  [Host]   : .NET 7.0.0 (7.0.22.42610), X64 RyuJIT AVX2
  .NET 6.0 : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.0 (7.0.22.42610), X64 RyuJIT AVX2
Method Job Runtime Mean Error StdDev Ratio RatioSD
STIntersection .NET 6.0 .NET 6.0 69.30 us 0.715 us 0.669 us 1.00 0.00
STIntersection .NET 7.0 .NET 7.0 106.24 us 0.690 us 0.646 us 1.53 0.02

In this stack trace I notice that the module of dynamicClass.IL\_STUB\_PInvoke is system.net.sockets.il 🤨

Partial call stack from VS
Function Name Total CPU [unit, %] Self CPU [unit, %] Module Category
 + Microsoft.SqlServer.Types.SqlGeography.STIntersection(Microsoft.SqlServer.Types.SqlGeography) 22742 (80,75 %) 13 (0,05 %) microsoft.sqlserver.types IO | Kernel | Runtime
| + Microsoft.SqlServer.Types.GLNativeMethods.GeodeticCombine(CombineMode, Microsoft.SqlServer.Types.GeoData, Microsoft.SqlServer.Types.GeoData, float64) 22636 (80,38 %) 7 (0,02 %) microsoft.sqlserver.types IO | Kernel | Runtime
|| + dynamicClass.IL_STUB_PInvoke(00007FFAD0D6A418, 00007FFAD0A1E260, 00007FFAD0A1E260, float64, 00007FFAD0D6A668) 22395 (79,52 %) 20 (0,07 %) system.net.sockets.il IO | Kernel
||| + sqlserverspatial140.dll!0x00007ffadee891fb 9243 (32,82 %) 2 (0,01 %) sqlserverspatial140 IO | Kernel
|||| + sqlserverspatial140.dll!0x00007ffadee99511 5837 (20,73 %) 3 (0,01 %) sqlserverspatial140 IO | Kernel
||||| + sqlserverspatial140.dll!0x00007ffadee7c465 4658 (16,54 %) 2 (0,01 %) sqlserverspatial140 IO | Kernel
|||||| + sqlserverspatial140.dll!0x00007ffadeed358c 4436 (15,75 %) 0 (0,00 %) sqlserverspatial140 IO | Kernel
||||||| + sqlserverspatial140.dll!0x00007ffadee97f28 4399 (15,62 %) 0 (0,00 %) sqlserverspatial140 IO | Kernel
|||||||| + coreclr.dll!0x00007ffb2f0fd41e 4223 (15,00 %) 0 (0,00 %) coreclr IO | Kernel
||||||||| + coreclr.dll!0x00007ffb2f07dd58 3487 (12,38 %) 0 (0,00 %) coreclr IO | Kernel
|||||||||| + coreclr.dll!0x00007ffb2f07ddce 3124 (11,09 %) 0 (0,00 %) coreclr IO | Kernel
||||||||||| + coreclr.dll!0x00007ffb2efd2509 1974 (7,01 %) 1 (0,00 %) coreclr IO | Kernel
|||||||||||| + MapViewOfFile 1972 (7,00 %) 0 (0,00 %) kernelbase Kernel
||||||||||||| + MapViewOfFileExNuma 1972 (7,00 %) 4 (0,01 %) kernelbase Kernel
|||||||||||||| + NtMapViewOfSection 1968 (6,99 %) 41 (0,15 %) ntdll Kernel
||||||||||||||| + 0xfffff8036e40a2b5 1885 (6,69 %) 0 (0,00 %) [Unknown Code]
|||||||||||||||| + 0xfffff8036e5e4829 1850 (6,57 %) 0 (0,00 %) [Unknown Code]
||||||||||||||||| + 0xfffff8036e5e7ccc 1841 (6,54 %) 0 (0,00 %) [Unknown Code]
|||||||||||||||||| + 0xfffff8036e5e837e 1553 (5,51 %) 0 (0,00 %) [Unknown Code]
||||||||||||||||||| + 0xfffff8036e5e9fc4 1552 (5,51 %) 0 (0,00 %) [Unknown Code]
|||||||||||||||||||| + 0xfffff8036e5ea160 1552 (5,51 %) 0 (0,00 %) [Unknown Code]
||||||||||||||||||||| + 0xfffff8036e252060 1220 (4,33 %) 1218 (4,32 %) [Unknown Code]
||||||||||||||||||||| - 0xfffff8036e25206c 302 (1,07 %) 302 (1,07 %) [Unknown Code]
||||| - sqlserverspatial140.dll!0x00007ffadee7c450 1173 (4,17 %) 0 (0,00 %) sqlserverspatial140 Kernel
|||| - sqlserverspatial140.dll!0x00007ffadee994d8 3400 (12,07 %) 0 (0,00 %) sqlserverspatial140 IO | Kernel
||| + sqlserverspatial140.dll!0x00007ffadee89295 8091 (28,73 %) 2 (0,01 %) sqlserverspatial140 IO | Kernel
|||| + sqlserverspatial140.dll!0x00007ffadee86976 7869 (27,94 %) 0 (0,00 %) sqlserverspatial140 IO | Kernel
||||| + coreclr.dll!0x00007ffb2f0fd41e 7768 (27,58 %) 0 (0,00 %) coreclr IO | Kernel
|||||| - coreclr.dll!0x00007ffb2f07dd58 3857 (13,70 %) 0 (0,00 %) coreclr Kernel
|||||| + coreclr.dll!0x00007ffb2f07dd45 3448 (12,24 %) 0 (0,00 %) coreclr IO | Kernel
||||||| + coreclr.dll!0x00007ffb2efd2509 2838 (10,08 %) 4 (0,01 %) coreclr IO | Kernel
|||||||| + MapViewOfFile 2833 (10,06 %) 0 (0,00 %) kernelbase Kernel
||||||||| + MapViewOfFileExNuma 2833 (10,06 %) 7 (0,02 %) kernelbase Kernel
|||||||||| + NtMapViewOfSection 2826 (10,03 %) 38 (0,13 %) ntdll Kernel
||||||||||| + 0xfffff8036e40a2b5 2754 (9,78 %) 0 (0,00 %) [Unknown Code]
|||||||||||| + 0xfffff8036e5e4829 2733 (9,70 %) 0 (0,00 %) [Unknown Code]
||||||||||||| + 0xfffff8036e5e7ccc 2726 (9,68 %) 0 (0,00 %) [Unknown Code]
|||||||||||||| + 0xfffff8036e5e837e 2500 (8,88 %) 0 (0,00 %) [Unknown Code]
||||||||||||||| + 0xfffff8036e5e9fc4 2500 (8,88 %) 0 (0,00 %) [Unknown Code]
|||||||||||||||| + 0xfffff8036e5ea160 2494 (8,86 %) 0 (0,00 %) [Unknown Code]
||||||||||||||||| - 0xfffff8036e252060 1977 (7,02 %) 1976 (7,02 %) [Unknown Code]
||||||||||||||||| - 0xfffff8036e25206c 488 (1,73 %) 485 (1,72 %) [Unknown Code]
||| + dynamicClass.IL_STUB_StructMarshal(ref sbyte, sbyte*, int, ref 00007FFAD0D6AC30) 4428 (15,72 %) 19 (0,07 %) system.net.sockets.il IO | Kernel
|||| + coreclr.dll!0x00007ffb2f08a35d 4382 (15,56 %) 0 (0,00 %) coreclr IO | Kernel
||||| + coreclr.dll!0x00007ffb2f08c13f 3694 (13,12 %) 0 (0,00 %) coreclr IO | Kernel
|||||| + coreclr.dll!0x00007ffb2efd2509 3044 (10,81 %) 1 (0,00 %) coreclr Kernel
||||||| + MapViewOfFile 3043 (10,81 %) 1 (0,00 %) kernelbase Kernel
|||||||| + MapViewOfFileExNuma 3042 (10,80 %) 9 (0,03 %) kernelbase Kernel
||||||||| + NtMapViewOfSection 3033 (10,77 %) 46 (0,16 %) ntdll Kernel
|||||||||| + 0xfffff8036e40a2b5 2945 (10,46 %) 0 (0,00 %) [Unknown Code]
||||||||||| + 0xfffff8036e5e4829 2914 (10,35 %) 0 (0,00 %) [Unknown Code]
|||||||||||| + 0xfffff8036e5e7ccc 2910 (10,33 %) 0 (0,00 %) [Unknown Code]
||||||||||||| + 0xfffff8036e5e837e 2611 (9,27 %) 0 (0,00 %) [Unknown Code]
|||||||||||||| + 0xfffff8036e5e9fc4 2609 (9,26 %) 0 (0,00 %) [Unknown Code]
||||||||||||||| + 0xfffff8036e5ea160 2603 (9,24 %) 0 (0,00 %) [Unknown Code]
|||||||||||||||| + 0xfffff8036e252060 2072 (7,36 %) 2070 (7,35 %) [Unknown Code]
|||||||||||||||| - 0xfffff8036e25206c 491 (1,74 %) 490 (1,74 %) [Unknown Code]

Analysis

@jnyrup jnyrup added the tenet-performance Performance related issue label Sep 12, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 12, 2022
@MichalPetryka
Copy link
Contributor

MichalPetryka commented Sep 12, 2022

Could you try running the benchmark with DOTNET_GCName=clrgc.dll?

@jnyrup
Copy link
Contributor Author

jnyrup commented Sep 12, 2022

Using $Env:DOTNET_GCName="clrgc.dll"

.NET 7.0: The result is identical.
.NET 6.0.8: I get Failed to create CoreCLR, HRESULT: 0x80004005

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Sep 13, 2022
@mangod9 mangod9 added this to the 7.0.0 milestone Sep 13, 2022
@mangod9
Copy link
Member

mangod9 commented Sep 13, 2022

could be related to WriteXorExecute.

@jnyrup
Copy link
Contributor Author

jnyrup commented Sep 17, 2022

could be related to WriteXorExecute.

Sounds right, here are my results of disabling/enabling W^X.

$Env:COMPlus_EnableWriteXorExecute=0

Method Job Runtime Mean Error StdDev Ratio
STIntersection .NET 6.0 .NET 6.0 66.13 us 0.506 us 0.448 us 1.00
STIntersection .NET 7.0 .NET 7.0 66.40 us 0.410 us 0.320 us 1.00

$Env:COMPlus_EnableWriteXorExecute=1

Method Job Runtime Mean Error StdDev Ratio RatioSD
STIntersection .NET 6.0 .NET 6.0 101.5 us 1.39 us 1.60 us 1.00 0.00
STIntersection .NET 7.0 .NET 7.0 103.1 us 1.81 us 1.61 us 1.01 0.02

@mangod9
Copy link
Member

mangod9 commented Sep 19, 2022

@janvorli has made a fix in RC2 which might help with this.

@jnyrup
Copy link
Contributor Author

jnyrup commented Sep 21, 2022

Installing dotnet-sdk-7.0.100-rtm.22470.23-win-x64.exe (assuming the fix is in there) helped a bit.
(Please let me know if the problem is fully reproducible and these additional benchmarks are just spammy)

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2006/21H2/November2021Update)
Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rtm.22470.23
  [Host]   : .NET 7.0.0 (7.0.22.46503), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.0 (7.0.22.46503), X64 RyuJIT AVX2
Method Job Runtime Mean Error StdDev
STIntersection .NET 7.0 .NET 7.0 96.93 us 1.020 us 0.852 us

@janvorli
Copy link
Member

I have run it with internal W^X logging compiled in. The thing that slows it down is the number of delegates for reverse pinvoke (used for calling managed code from native code) created. During the test, 410000 such delegates for were created. That is really huge. Each such delegate created can trigger a map and unmap of writeable memory. The RTM reduces the number of mappings by caching them in a simple way, but the effectivity of the caching depends on the pattern of code generation / delegates creation. The best way to get rid of the perf regression is to stop using delegates for calling managed methods back from native code and use "UnmanagedCallersOnly" attribute marked functions or stop creating that many delegates. I guess that somewhere a new delegate is created each time the code needs to pass a managed callback to native code. Creating it once and reusing it would lower the performance hit significantly.
Besides these delegates, there is only a very small number of other read-write mappings created for executable code.

Both approaches would be beneficial anyways as they would reduce the amount of GC garbage generated too.

I guess this is in the System.Data.SqlClient, so the fix would be on the .NET team side.

@mangod9
Copy link
Member

mangod9 commented Sep 30, 2022

Hi @jnyrup, assume you have a workaround of disabling WriteXorExecute so this issue isnt blocking? We will work with the SqlClient team to see if the delegate creation can be avoided. Thx!

@mangod9 mangod9 modified the milestones: 7.0.0, 8.0.0 Sep 30, 2022
@Wraith2
Copy link
Contributor

Wraith2 commented Oct 1, 2022

I guess this is in the System.Data.SqlClient, so the fix would be on the .NET team side.

@David-Engel may have input on this.

@jnyrup
Copy link
Contributor Author

jnyrup commented Oct 1, 2022

As W^X is only strictly necessary when running on Apple Silicon (as far as I can read about W^X) and Microsoft.SqlServer.Types is x86/x64-only we'll disable WriteXorExecute for now once we upgrade to .NET 7.

Thanks to you all for the help of finding the culprit of this performance/security clash.

@janvorli
Copy link
Member

janvorli commented Oct 3, 2022

@jnyrup right, Apple Silicon has W^X always enabled. FYI, it uses a completely different mechanism (built in the CPU) to achieve that than Windows / Linux - there is no double mapping and thus no perf hit caused by that.

@David-Engel
Copy link
Contributor

@jnyrup System.Data.SqlClient is in maintenance mode and won't be receiving anything other then security/critical updates. I suggest migrating to Microsoft.Data.SqlClient where we can make changes for perf improvements, if needed.

Regards,
David

@jnyrup
Copy link
Contributor Author

jnyrup commented Oct 3, 2022

@janvorli Thanks for the valuable insights 👍

@David-Engel I don't think Microsoft.SqlServer.Types lives in either of SDS or MDS, according to nuget the package is owned by some "SQL Server Team".

For now I've sent the package owners a message through the nuget.org about this thread.

@mangod9
Copy link
Member

mangod9 commented Jul 12, 2023

Just checking if this is something we need to keep open?

@mangod9
Copy link
Member

mangod9 commented Jul 29, 2023

Closing since this is not actionable on the runtime side.

@mangod9 mangod9 closed this as completed Jul 29, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-VM-coreclr tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

7 participants