[LibraryImportGenerator] Add/use CustomTypeMarshaller implementations for string marshalling #67635

Merged: 11 commits, Apr 14, 2022

elinor-fung commented Apr 6, 2022

Contributes to #66623

  • Add AnsiStringMarshaller, Utf16StringMarshaller, and Utf8StringMarshaller
  • Switch to using those marshallers instead of generating the marshalling code inline

Adding the ANSI string marshaller means we no longer always allocate for ANSI strings: marshalling a string by value is ~40-50% faster when the string fits within the buffer, because no allocation is needed.
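
A minimal sketch of the fits-in-buffer idea, not the marshaller added by this PR. `BufferOrAllocSketch` and `EncodeNullTerminated` are hypothetical names, and UTF-8 stands in for the platform ANSI encoding so the example stays portable.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class BufferOrAllocSketch
{
    // Hypothetical helper for illustration only.
    // Returns IntPtr.Zero when the encoded string plus its null terminator
    // was written into `buffer`. Otherwise returns CoTaskMem memory that the
    // caller must release with Marshal.FreeCoTaskMem.
    public static IntPtr EncodeNullTerminated(string str, Span<byte> buffer)
    {
        // Exact size check; strictly less than so the terminator also fits.
        int byteCount = Encoding.UTF8.GetByteCount(str);
        if (byteCount < buffer.Length)
        {
            int written = Encoding.UTF8.GetBytes(str.AsSpan(), buffer);
            buffer[written] = 0; // null terminator
            return IntPtr.Zero;  // fast path: no allocation
        }

        // Slow path: the string does not fit, so allocate native memory.
        return Marshal.StringToCoTaskMemUTF8(str);
    }
}
```

A caller would typically pass a `stackalloc`ed span and call `Marshal.FreeCoTaskMem` only when the returned pointer is non-zero.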

Ran the json/json and platform/plaintext benchmarks, copying over *.dll from Microsoft.NETCore.App in a local build.

Windows

json.benchmarks.yml --scenario json

Before:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 86                               |
| Cores usage (%)       | 1,026                            |
| Working Set (MB)      | 168                              |
| Private Memory (MB)   | 171                              |
| Build Time (ms)       | 14,992                           |
| Start Time (ms)       | 152                              |
| Published Size (KB)   | 111,908                          |
| .NET Core SDK Version | 7.0.100-preview.4.22212.1        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.3+9753980  |

| load                   |                |
| ---------------------- | -------------- |
| CPU Usage (%)          | 99             |
| Cores usage (%)        | 1,187          |
| Working Set (MB)       | 49             |
| Private Memory (MB)    | 358            |
| Build Time (ms)        | 8,381          |
| Start Time (ms)        | 0              |
| Published Size (KB)    | 76,698         |
| .NET Core SDK Version  | 3.1.418        |
| ASP.NET Core Version   | 3.1.24+d1fa2cb |
| .NET Runtime Version   | 3.1.24+3b38386 |
| First Request (ms)     | 117            |
| Requests/sec           | 709,124        |
| Requests               | 10,707,682     |
| Mean latency (ms)      | 0.34           |
| Max latency (ms)       | 29.17          |
| Bad responses          | 0              |
| Socket errors          | 0              |
| Read throughput (MB/s) | 102.79         |
| Latency 50th (ms)      | 0.29           |
| Latency 75th (ms)      | 0.36           |
| Latency 90th (ms)      | 0.49           |
| Latency 99th (ms)      | 1.16           |

After:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 88                               |
| Cores usage (%)       | 1,054                            |
| Working Set (MB)      | 167                              |
| Private Memory (MB)   | 176                              |
| Build Time (ms)       | 2,406                            |
| Start Time (ms)       | 164                              |
| Published Size (KB)   | 111,908                          |
| .NET Core SDK Version | 7.0.100-preview.4.22212.1        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.3+9753980  |

| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 99         |
| Cores usage (%)        | 1,188      |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 358        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 118        |
| Requests/sec           | 710,541    |
| Requests               | 10,728,845 |
| Mean latency (ms)      | 0.33       |
| Max latency (ms)       | 18.92      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 103.00     |
| Latency 50th (ms)      | 0.29       |
| Latency 75th (ms)      | 0.36       |
| Latency 90th (ms)      | 0.49       |
| Latency 99th (ms)      | 1.14       |

platform.benchmarks.yml --scenario plaintext

Requests per second seemed to improve by ~2-3%.

Before:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 68                               |
| Cores usage (%)       | 813                              |
| Working Set (MB)      | 99                               |
| Private Memory (MB)   | 105                              |
| Build Time (ms)       | 1,762                            |
| Start Time (ms)       | 160                              |
| Published Size (KB)   | 95,190                           |
| .NET Core SDK Version | 7.0.100-preview.4.22212.3        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.3+9753980  |

| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 100        |
| Cores usage (%)        | 1,200      |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 370        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 66         |
| Requests/sec           | 5,465,290  |
| Requests               | 82,512,624 |
| Mean latency (ms)      | 3.86       |
| Max latency (ms)       | 57.29      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 656.73     |
| Latency 50th (ms)      | 1.89       |
| Latency 75th (ms)      | 5.33       |
| Latency 90th (ms)      | 10.13      |
| Latency 99th (ms)      | 20.70      |

After:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 77                               |
| Cores usage (%)       | 924                              |
| Working Set (MB)      | 97                               |
| Private Memory (MB)   | 103                              |
| Build Time (ms)       | 5,255                            |
| Start Time (ms)       | 154                              |
| Published Size (KB)   | 95,190                           |
| .NET Core SDK Version | 7.0.100-preview.4.22212.3        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.3+9753980  |

| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 100        |
| Cores usage (%)        | 1,199      |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 370        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 65         |
| Requests/sec           | 5,593,978  |
| Requests               | 84,465,296 |
| Mean latency (ms)      | 2.74       |
| Max latency (ms)       | 56.81      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 672.19     |
| Latency 50th (ms)      | 1.57       |
| Latency 75th (ms)      | 3.23       |
| Latency 90th (ms)      | 6.76       |
| Latency 99th (ms)      | 15.88      |
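
As a quick check of the ~2-3% figure against the Windows plaintext tables above (5,465,290 requests/sec before, 5,593,978 after), the relative change can be computed directly. `ThroughputDelta` is a hypothetical helper used only for this illustration.

```csharp
using System;

public static class ThroughputDelta
{
    // Relative change from `before` to `after`, expressed in percent.
    public static double PercentChange(double before, double after)
        => (after - before) / before * 100.0;
}
```

For the two values quoted here, `ThroughputDelta.PercentChange(5_465_290, 5_593_978)` comes out at roughly 2.35%.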

Linux

json.benchmarks.yml --scenario json

Before:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 96                               |
| Cores usage (%)       | 1,157                            |
| Working Set (MB)      | 254                              |
| Private Memory (MB)   | 653                              |
| Build Time (ms)       | 13,228                           |
| Start Time (ms)       | 118                              |
| Published Size (KB)   | 110,310                          |
| .NET Core SDK Version | 7.0.100-preview.4.22212.1        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.2+f9ed970  |

| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 78         |
| Cores usage (%)        | 936        |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 358        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 107        |
| Requests/sec           | 718,543    |
| Requests               | 10,847,454 |
| Mean latency (ms)      | 0.76       |
| Max latency (ms)       | 27.64      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 104.16     |
| Latency 50th (ms)      | 0.23       |
| Latency 75th (ms)      | 0.60       |
| Latency 90th (ms)      | 2.25       |
| Latency 99th (ms)      | 6.46       |

After:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 96                               |
| Cores usage (%)       | 1,156                            |
| Working Set (MB)      | 258                              |
| Private Memory (MB)   | 656                              |
| Build Time (ms)       | 2,118                            |
| Start Time (ms)       | 123                              |
| Published Size (KB)   | 110,310                          |
| .NET Core SDK Version | 7.0.100-preview.4.22212.1        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.2+f9ed970  |

| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 79         |
| Cores usage (%)        | 946        |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 358        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 101        |
| Requests/sec           | 721,153    |
| Requests               | 10,888,335 |
| Mean latency (ms)      | 0.82       |
| Max latency (ms)       | 54.60      |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 104.54     |
| Latency 50th (ms)      | 0.22       |
| Latency 75th (ms)      | 0.64       |
| Latency 90th (ms)      | 2.43       |
| Latency 99th (ms)      | 7.11       |

platform.benchmarks.yml --scenario plaintext

Before:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 90                               |
| Cores usage (%)       | 1,081                            |
| Working Set (MB)      | 186                              |
| Private Memory (MB)   | 544                              |
| Build Time (ms)       | 6,035                            |
| Start Time (ms)       | 127                              |
| Published Size (KB)   | 94,593                           |
| .NET Core SDK Version | 7.0.100-preview.4.22212.1        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.3+9753980  |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 97         |
| Cores usage (%)        | 1,162      |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 370        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 58         |
| Requests/sec           | 6,503,930  |
| Requests               | 98,148,352 |
| Mean latency (ms)      | 6.06       |
| Max latency (ms)       | 113.73     |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 781.53     |
| Latency 50th (ms)      | 2.58       |
| Latency 75th (ms)      | 8.02       |
| Latency 90th (ms)      | 16.20      |
| Latency 99th (ms)      | 40.56      |

After:

| application           |                                  |
| --------------------- | -------------------------------- |
| CPU Usage (%)         | 93                               |
| Cores usage (%)       | 1,120                            |
| Working Set (MB)      | 184                              |
| Private Memory (MB)   | 517                              |
| Build Time (ms)       | 1,487                            |
| Start Time (ms)       | 126                              |
| Published Size (KB)   | 94,593                           |
| .NET Core SDK Version | 7.0.100-preview.4.22212.1        |
| ASP.NET Core Version  | 7.0.0-preview.4.22211.18+c00e0e7 |
| .NET Runtime Version  | 7.0.0-preview.4.22212.3+9753980  |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 96         |
| Cores usage (%)        | 1,152      |
| Working Set (MB)       | 38         |
| Private Memory (MB)    | 370        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 57         |
| Requests/sec           | 6,510,292  |
| Requests               | 98,309,080 |
| Mean latency (ms)      | 5.27       |
| Max latency (ms)       | 157.37     |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 782.30     |
| Latency 50th (ms)      | 2.26       |
| Latency 75th (ms)      | 6.66       |
| Latency 90th (ms)      | 13.48      |
| Latency 99th (ms)      | 38.45      |

elinor-fung added the area-System.Runtime.InteropServices and source-generator labels Apr 6, 2022
ghost assigned elinor-fung Apr 6, 2022

ghost commented Apr 6, 2022

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

Issue Details

We still need #66623 to go through API review. This change has the string marshallers as they are currently proposed.

  • Add AnsiStringMarshaller, Utf16StringMarshaller, and Utf8StringMarshaller
  • Switch to using those marshallers instead of generating the marshalling code inline

Adding the ANSI string marshaller means we no longer always allocate for ANSI strings: marshalling a string by value is ~40-50% faster when the string fits within the buffer, because no allocation is needed.

This also makes UnmanagedType.LPStr mean only ANSI (which is how we have it documented). I think this is in line with our decision not to do CharSet.Ansi/CharSet.Auto and our desire to avoid the weird/special behaviours in the built-in system.

Author: elinor-fung
Assignees: -
Labels: area-System.Runtime.InteropServices, source-generator

Milestone: -

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

// + 1 for number of characters in case left over high surrogate is ?
// * <MaxByteCountPerChar> (3 for UTF-8)
// +1 for null terminator
if (buffer.Length >= (str.Length + 1) * MaxByteCountPerChar + 1)
Member

Don't we want to use Encoding.UTF8.GetMaxByteCount(str.Length) here? What happens if this calculation overflows?
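
To make the overflow question concrete, here is a small sketch of the same size expression evaluated in 32-bit and in 64-bit arithmetic. `OverflowSketch` and its method names are hypothetical, and the functions take a length rather than a string so that very large inputs can be simulated without allocating them.

```csharp
using System;

public static class OverflowSketch
{
    public const int MaxByteCountPerChar = 3; // UTF-8, as in the snippet under review

    // Same shape as the reviewed expression, in wrapping 32-bit arithmetic.
    public static int RequiredBytesUnchecked(int length)
        => unchecked((length + 1) * MaxByteCountPerChar + 1);

    // The same computation widened to 64 bits, which cannot wrap for int inputs.
    public static long RequiredBytesWide(int length)
        => ((long)length + 1) * MaxByteCountPerChar + 1;
}
```

For a length of 800,000,000 the 32-bit version wraps to a negative number, so a comparison such as `buffer.Length >= required` would succeed even though the buffer is far too small. By contrast, `Encoding.GetMaxByteCount` is documented to throw `ArgumentOutOfRangeException` when the result does not fit in an `int`.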

Member Author

When it used GetMaxByteCount, it seemed to add a bit of overhead (~5% for our tests that were just marshalling a string by value), so I made it match the built-in system that just does the calculation like this. I didn't try it in the macro/asp.net tests, so maybe it doesn't really show up/matter.

Member

StringToCoTaskMemUTF8 is going to call GetMaxByteCount anyway. We can inline the logic from StringToCoTaskMemUTF8 here and check again after calling GetMaxByteCount whether the stack buffer is big enough.

Member

Also, does this need to allocate the memory using CoTaskMemAlloc? If it is possible, it would be a tiny bit faster to use NativeMemory.

Member Author

For just marshalling by value, it would be possible. For ref, we'd still want CoTaskMemAlloc, since the native side could be expecting that (it could transfer ownership, free, or replace the memory), so we'd have to allocate differently and free accordingly for by value versus ref.
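
A sketch of why the ref case ties both sides to one allocator. `RefOwnershipSketch` and `FakeNativeReplace` are hypothetical: the latter is a managed stand-in for a native callee that frees the incoming string and hands back a new one under the CoTaskMem convention.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RefOwnershipSketch
{
    // Stand-in for a native function taking the string by reference.
    // It releases the caller's memory and replaces it with its own.
    private static void FakeNativeReplace(ref IntPtr utf8)
    {
        Marshal.FreeCoTaskMem(utf8);
        utf8 = IntPtr.Zero; // avoid a dangling pointer if the next call throws
        utf8 = Marshal.StringToCoTaskMemUTF8("replaced");
    }

    public static string RoundTrip(string input)
    {
        // Must be CoTaskMem: the callee may free this pointer.
        IntPtr native = Marshal.StringToCoTaskMemUTF8(input);
        try
        {
            FakeNativeReplace(ref native);
            return Marshal.PtrToStringUTF8(native) ?? string.Empty;
        }
        finally
        {
            // Frees whatever the callee left behind, with the matching allocator.
            Marshal.FreeCoTaskMem(native);
        }
    }
}
```

Because the callee both frees and allocates, a by-ref string allocated with a different allocator (for example `NativeMemory`) would be released by the wrong routine. In the by-value case only the managed side allocates and frees, so the allocator is a free choice.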

elinor-fung marked this pull request as ready for review April 12, 2022 20:34
Member

AaronRobinsonMSFT left a comment

LGTM.

…erop.SourceGeneration/MarshallingAttributeInfo.cs

Co-authored-by: Aaron Robinson <arobins@microsoft.com>
elinor-fung merged commit f502039 into dotnet:main Apr 14, 2022
elinor-fung deleted the stringMarshallers branch April 14, 2022 23:33
@AndyAyersMS
Member

Potential improvement seen in dotnet/perf-autofiling-issues#4754
