[DllImportGenerator] Update GeneratedDllImportAttribute handling of character set / encoding #61326
Comments
That is correct for the built-in system. However, we can add new values into CharSet that are only recognized by the source-generators, and ignored by the built-in system.
Why not make UTF8 the default? UTF8 encoding won the world. It should be the default.
Yes, for new APIs. Existing APIs are stuck on the ANSI/Unicode plan. Do we have any data on how many Windows APIs are ANSI-only, so that ANSI support is required to invoke them? What is going to be the encoding property that maps to It would be useful to enumerate the encodings that we expect people are typically going to use and what the suggested replacement is going to be. Here is my list:
This would be good to confirm, but my gut tells me it is either stable at a small number or declining. Based on this assumption, I'd argue for using the The list that @jkotas provided could also be used to either dictate new public APIs using
The thought was that it would be confusing, since I think the natural reaction to a new value on
Machine endian ones seem like they might be odd on
Agree. The properties on Encoding type are focused on wire formats. Interop marshaling is about in-memory formats and so the common set is going to be different. UTF8 seems to be the only one that is in both sets.
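To make the wire-format versus in-memory distinction concrete, here is a minimal sketch of the in-memory forms that interop marshalling produces for UTF-8 and UTF-16 strings: null-terminated native buffers rather than byte streams. It uses only the existing `Marshal` helpers; the class and method names (`InMemoryStringFormats`, `NativeUtf8Bytes`, `NativeUtf16Bytes`) are made up for illustration and are not part of the proposal.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not part of any proposed API.
static class InMemoryStringFormats
{
    // Returns the bytes of the null-terminated UTF-8 native form (terminator excluded).
    public static byte[] NativeUtf8Bytes(string s)
    {
        IntPtr p = Marshal.StringToCoTaskMemUTF8(s);
        try
        {
            // Walk the buffer until the single-byte null terminator.
            int length = 0;
            while (Marshal.ReadByte(p, length) != 0) length++;
            byte[] result = new byte[length];
            Marshal.Copy(p, result, 0, length);
            return result;
        }
        finally
        {
            Marshal.FreeCoTaskMem(p);
        }
    }

    // Returns the bytes of the null-terminated UTF-16 native form (terminator excluded).
    public static byte[] NativeUtf16Bytes(string s)
    {
        IntPtr p = Marshal.StringToCoTaskMemUni(s);
        try
        {
            // Walk the buffer two bytes at a time until the 16-bit null terminator.
            int length = 0;
            while (Marshal.ReadInt16(p, length) != 0) length += 2;
            byte[] result = new byte[length];
            Marshal.Copy(p, result, 0, length);
            return result;
        }
        finally
        {
            Marshal.FreeCoTaskMem(p);
        }
    }
}
```

For example, `InMemoryStringFormats.NativeUtf8Bytes("abc")` yields 3 bytes, while `InMemoryStringFormats.NativeUtf16Bytes("abc")` yields 6 bytes in the machine's byte order. Strings with embedded null characters would be cut short by this sketch.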
How do we decide when these things should be magic constants vs. proper types? It makes me wonder how custom encodings would work. For example, I would like to write something like
The " |
@svick It is definitely complicated, which is one of the reasons the One of the mitigations to the complexity will be a series of predefined types that will handle what is desired. Implementing your own would require care but we will have public APIs that handle the most common cases. An example could be something like the following:

namespace System.Runtime.InteropServices.Encoding
{
    // Selects the default encoding for the current platform
    struct AutoPlatformEncoding { ... }
    // Selects the UTF-16 endianness based on the current platform
    struct AutoUtf16Encoding { ... }
    ...
}

A series of these types would be provided by .NET. Then users could define their own or use one of the built-in ones.

[GeneratedDllImport(..., Encoding = typeof(AutoPlatformEncoding))]
public static partial int Len(string s);

Another approach would be to have built-in subclasses of the
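As a rough illustration of what "selects the UTF-16 endianness based on the current platform" could mean in practice, the sketch below picks between the two existing `System.Text.Encoding` UTF-16 instances using `BitConverter.IsLittleEndian`. The `AutoUtf16Sketch` class is a hypothetical stand-in. It is not the `AutoUtf16Encoding` type from the comment above, whose actual shape is left unspecified in the thread.

```csharp
using System;
using System.Text;

// Hypothetical illustration; the real predefined types are not specified in this thread.
static class AutoUtf16Sketch
{
    // Chooses the UTF-16 encoding whose byte order matches the current machine.
    public static Encoding Select() =>
        BitConverter.IsLittleEndian ? Encoding.Unicode : Encoding.BigEndianUnicode;
}
```

With this helper, `AutoUtf16Sketch.Select().GetBytes("A")` produces the same two bytes as `BitConverter.GetBytes('A')` on whichever platform the code runs.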
One of the initial reasons we gravitated towards the string over a type was the thought that we could just use the static properties on In the example @AaronRobinsonMSFT has above, I would expect those pre-defined types would be the same as what one could use with the
Update the description to reflect the
Random thoughts:
I do like that it matches the
Maybe
Definitely fair. If I recall from when we were looking at all our own p/invokes, the host ones were the only real uses and everything else that specified it was only used on one platform and didn't need auto.
My aversion to a default was around not wanting to match the (confusing) built-in one and wanting a clear error if the char set would end up different when switching an existing p/invoke to generated. This was influenced by our (original) mindset of trying to make
I think we can all agree that UTF-8 won, thank heavens. Making UTF-8 the default seems okay, but getting it wrong is annoying when you consume a native binary that you cannot easily debug. It also would mean we are taking sides with respect to platforms, which I'd prefer not to do in practice. We could also loosen the contract in V2. This would let the first wave of transitions largely avoid an annoying mistake.
I'd prefer adopting a platform agnostic focus for .NET interop. For example, the Self-serving plug: It is also against the strong guidance I recently gave at a CppCon talk around interop with C++.
Why not use Generic Attributes to apply type constraints to the encoding property? That way, an invalid encoding is reported at compile time instead of at run time.

class GeneratedDllImportAttribute<TEncoding> : Attribute where TEncoding : Encoding
{
}

// Error observed at compile time - invalid encoding
[GeneratedDllImport<int>("lib")]
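A compilable version of this idea might look like the sketch below, assuming C# 11 or later (generic attributes are not available in earlier language versions). `EncodedImportAttribute<TEncoding>` is a hypothetical stand-in for the suggested attribute, and `EncodingTypeOf` is an illustrative helper showing how a consumer could recover the type argument through reflection.

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical stand-in for the generic attribute suggested above; not a real .NET type.
[AttributeUsage(AttributeTargets.Method)]
sealed class EncodedImportAttribute<TEncoding> : Attribute where TEncoding : Encoding
{
    public EncodedImportAttribute(string libraryName) => LibraryName = libraryName;

    public string LibraryName { get; }
}

static class GenericAttributeDemo
{
    [EncodedImport<UTF8Encoding>("lib")]
    public static void Annotated() { }

    // [EncodedImport<int>("lib")] would be rejected by the compiler,
    // because int does not satisfy the "TEncoding : Encoding" constraint.

    // Recovers the encoding type argument from the attribute applied to the named method.
    public static Type EncodingTypeOf(string methodName)
    {
        MethodInfo method = typeof(GenericAttributeDemo).GetMethod(methodName);
        if (method == null) throw new ArgumentException("Unknown method.", nameof(methodName));

        foreach (Attribute attribute in method.GetCustomAttributes())
        {
            Type type = attribute.GetType();
            if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(EncodedImportAttribute<>))
                return type.GetGenericArguments()[0];
        }

        throw new InvalidOperationException("No EncodedImport attribute found.");
    }
}
```

Calling `GenericAttributeDemo.EncodingTypeOf("Annotated")` returns `typeof(UTF8Encoding)`. The constraint catches non-`Encoding` type arguments at compile time, which is the benefit the comment describes.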
@hez2010 That is a good question. The general problem with Generic Attributes is they are much more difficult to evolve without breaking changes. The above works for
Updated the attribute proposal - #46822 - to have the
Since it looks like we're going the route of removing
For p/invokes, the character set is encoded into the metadata for the method. As a result, adding anything, like UTF-8, is complex and far-reaching. The current experience (ANSI means UTF-8 on Unix) is odd and confusing. The p/invoke source generator should be used to improve this experience.
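To illustrate the point that the character set lives in the method's metadata, the following sketch declares an ordinary `DllImport` and reads the character set back through reflection. The `NativeMethods.StrLen` declaration is only an example (the native function is never called, so the library does not need to exist), and the class names are made up for illustration.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // Example declaration only; it is never invoked, so "libc" is never loaded.
    [DllImport("libc", EntryPoint = "strlen", CharSet = CharSet.Ansi)]
    public static extern UIntPtr StrLen(string s);
}

static class PInvokeMetadataDemo
{
    private static MethodInfo StrLenMethod =>
        typeof(NativeMethods).GetMethod(nameof(NativeMethods.StrLen));

    // The method is flagged as a p/invoke in its metadata attributes.
    public static bool IsPInvoke() =>
        (StrLenMethod.Attributes & MethodAttributes.PinvokeImpl) != 0;

    // Reflection rebuilds DllImportAttribute from the method's p/invoke metadata,
    // including the character set flag.
    public static CharSet DeclaredCharSet() =>
        StrLenMethod.GetCustomAttribute<DllImportAttribute>().CharSet;
}
```

Here `PInvokeMetadataDemo.DeclaredCharSet()` returns `CharSet.Ansi`. Because the value is stored as a fixed metadata flag rather than as arbitrary attribute data, adding a new character set touches everything that reads or writes that flag.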
We’d like to:
CharSet.Ansi on Unix to get UTF-8
MarshalAs on each parameter
CharSet enumeration

Our current thinking is to:
CharSet field
MarshalStringsUsing field - Type
MarshalUsing/NativeMarshalling attributes for custom marshalling of strings
System.Text.Encoding under the hood)

Example:
Where:
Other considerations:
Unicode vs Utf16 ... Utf16StringMarshalling would be correct and in line with our cross-platform focus, but UnicodeStringMarshalling would be more consistent with existing APIs.
OperatingSystem APIs)
DefaultCharSetAttribute.
ExactSpelling: uses CharSet to probe for entry point on Windows, doesn’t mean anything on Unix

@AaronRobinsonMSFT @jkoritzinsky @jkotas @stephentoub
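For context on the ExactSpelling consideration, the sketch below models the entry-point name probing that the built-in system performs on Windows, as described in the .NET documentation for `DllImportAttribute`: with ExactSpelling off, ANSI tries the plain name and then the `A` suffix, while Unicode tries the `W` suffix and then the plain name. The `EntryPointProbing` class is illustrative only. It does not load any library, and it covers just the `Ansi` and `Unicode` values.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Illustrative model of Windows entry-point probing; not the runtime's implementation.
static class EntryPointProbing
{
    // Returns the export names to try, in order, for the given settings.
    public static IReadOnlyList<string> Candidates(string entryPoint, CharSet charSet, bool exactSpelling)
    {
        if (exactSpelling)
            return new[] { entryPoint };

        switch (charSet)
        {
            case CharSet.Ansi:
                // Plain name first, then the ANSI-suffixed name.
                return new[] { entryPoint, entryPoint + "A" };
            case CharSet.Unicode:
                // Unicode-suffixed name first, then the plain name.
                return new[] { entryPoint + "W", entryPoint };
            default:
                throw new ArgumentOutOfRangeException(nameof(charSet), "This sketch covers only Ansi and Unicode.");
        }
    }
}
```

For instance, `EntryPointProbing.Candidates("MessageBox", CharSet.Unicode, false)` yields `MessageBoxW` followed by `MessageBox`. On Unix there are no such suffixed exports, which is why the setting carries no meaning there.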