Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve null terminating character in ref Span<char> parameters #1296

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AArnott
Copy link
Member

@AArnott AArnott commented Oct 9, 2024

Fixes #1295

@AArnott
Copy link
Member Author

AArnott commented Oct 9, 2024

@JeremyKuhne @AaronRobinsonMSFT This change seems like the right one for the original issue. But the fact that it required a test change to compensate well-describes the breaking change that it entails.
I'm looking for additional opinions on whether this is the right call to make.

@AaronRobinsonMSFT
Copy link
Member

This doesn't seem like a good solution for all the reasons assumed. The issue here is treating a Span<char> as a string is simply wrong. It is not a "string", it is a buffer with arbitrary characters, dropping the null during the Slice is mistake (1) and then using the pwszPrevCDFTag.LastIndexOf('\0') is mistake (2) and does fix (1) because the point of Span<T> is to limit the size. The proposed fix here attempts to fix that but then breaks the .NET contract for Span<char> to string. Arguments could be made we got this wrong, but this the long standing contract.

I think the correct solution here is to fix up the generated API and use the same type in the signature for the same type in all places. Right now the return is winmdroot.Foundation.PWSTR, but the second argument is ref Span<char>, which doesn't match the definition of CryptCATCDFEnumMembersByCDFTagEx.

The API in question is a bit weird and playing games with the C/C++ defintion of strings, perhaps mistake (3) or even (4). I would fix the types to all match in the same way they match the C/C++ definition and make them play well together.

if (pwszPrevCDFTag != null && pwszPrevCDFTag.LastIndexOf('\0') == -1)				throw new ("Required null terminator missing.", "pwszPrevCDFTag");

I think the above pwszPrevCDFTag.LastIndexOf('\0') == -1 check is just wrong though. This is attempting to apply both buffer semantics and string semantics to Span<char>. My suggestion would apply buffer semantics to Span<T>, in all cases. This means that any slice construction should be aware of "string" semantics and avoid them.

/cc @jkoritzinsky

@AArnott
Copy link
Member Author

AArnott commented Oct 9, 2024

Thank you very much for your analysis, @AaronRobinsonMSFT. I don't understand it all yet.
Let's start with your last point: The null check that the friendly overload includes. Why is that wrong? That check is generated when the extern API requires a null-terminated string. We're making sure that it is in fact null-terminated. Why? Because a lot of C# folks expect that calling into an API that accepts Span<char> will stop processing at Span.Length, and not realize that a null character is important. It seems like a good correctness check.

But if your high-level point is we shouldn't accept Span<char> in our friendly overloads at all, but should accept PWSTR everywhere, I guess that APIs around PWSTR can help ensure null termination at that level.

@AaronRobinsonMSFT
Copy link
Member

But if your high-level point is we shouldn't accept Span in our friendly overloads at all, but should accept PWSTR everywhere, I guess that APIs around PWSTR can help ensure null termination at that level.

This is precisely my point and why the \0 check doesn't follow. A Span<char> is a buffer so it needs to be treated like a buffer. In this case it is only treated like a buffer on iteration (1). On iteration (2) it is a PWSTR that is a "string" but passed as a "buffer" (that is, Span<char>). If the expectation is to have a "null terminated buffer of characters", then a Span<char> makes sense and should always be the data type - both input and return. If it is expected to be a "string", then use the .NET string type that provides that invariant instead of trying to enforce it manually. Alternatively one can use the PWSTR abstraction, which in this case might be required - ugh, but in that case a decision needs to be made; is it a "string" or a "buffer"? The former has strong assumptions and should be "correct by construction" so that can be asserted but doesn't need to be validated in Release, whereas the latter is more free form and should be validated in Release.

@AArnott
Copy link
Member Author

AArnott commented Oct 10, 2024

So far, the friendly overloads try to avoid allocations or requiring allocations. In this case where they use ref Span<char>, we know that native code declares it as PWSTR and that it should be null terminated, and that the function may mutate the data. So the friendly overload couldn't use just string for the parameter, as that's immutable. We could use ref string and then take it upon ourselves to allocate a new string. But personally, I think I'd rather either just take a PWSTR parameter or keep it ref Span<char>.
Going with your "treat it as a buffer" idea, I suppose that means that I should hold the length of it constant instead of trying to resize it. I should take Span<char> rather than ref Span<char> and leave the length alone.
In some cases, the win32 function provides an out parameter to indicate the length of the returned data, which seems friendly to remove that parameter and automatically 'shrink' the buffer on their behalf. That's related but not the same as the use case for this issue.

@AaronRobinsonMSFT
Copy link
Member

But personally, I think I'd rather either just take a PWSTR parameter

That is a completely reasonable approach in this case. This approach then does mean that PWSTR needs to be aware that in order to align with string semantics, in the case someone wants to convert PWSTR to string, that the inner Span<char> contains a trailing null and that it should slice that away when coverting.

@JeremyKuhne
Copy link
Member

Personally, I'd be very careful exposing too much help around buffers that need null terminated with low-level .NET types. For more complicated scenarios I'd either avoid the risk or create wrapper ref struct types that encapsulate the buffer-like complexity.

One example of encapsulation WinForms uses for buffers is BufferScope. Allows doing work on the stack and falling back to ArrayPool to minimize new allocations.

For read-only in parameters on .NET (not .NET Framework) you can use MemoryMarshal.TryGetString to check if you have an actual string behind a span (to know that you actually have a null past the end of your span). On .NET Framework you could create multiple overloads to handle the string case.

One option I have tried that works down-level is to create an encapsulating type for in "null terminated string" parameters:

public readonly ref struct StringSpan
{
    private readonly ReadOnlySpan<char> _span;
    private readonly string? _string;
    // ...

@AArnott
Copy link
Member Author

AArnott commented Oct 11, 2024

Thanks.

create an encapsulating type for in "null terminated string" parameters

That is in fact what PWSTR is. That and its constant PCWSTR counterpart are defined to be null-terminated strings in the headers, so our C# projections already depend on that when converting themselves to .NET strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Friendly overload should preserve null terminator for ref Span<char> parameters
3 participants