-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup some string operating functions #96099
Conversation
src/coreclr/inc/corhlprpriv.h
Outdated
|
||
HRESULT hr = FString::Utf8_Unicode_Length(utf8str, & allAscii, & length); | ||
LPWSTR buffer = (LPWSTR) AllocThrows((length + 1) * sizeof(WCHAR)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be LPSTR (UTF8).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement is unchanged (with whitespace off).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length
-> destLen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LPWSTR buffer = (LPWSTR) AllocThrows((length + 1) * sizeof(WCHAR)); | |
CHAR16_T* buffer = (CHAR16_T*) AllocThrows((length + 1) * sizeof(CHAR16_T)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dn-u16.h uses WCHAR
while minipal/utf-8 uses CHAR16_T
. They are not treated as compatible although both typedef-d from wchar_t. Is any reason that compiler supported char16_t can't be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some platforms do not support char16_t
, which is why we went with the least supported type to handle coreclr and mono simultaneously.
runtime/src/native/minipal/utf8.h
Lines 21 to 25 in 17594ec
#ifdef TARGET_WINDOWS | |
typedef wchar_t CHAR16_T; | |
#else | |
typedef unsigned short CHAR16_T; | |
#endif |
During this refactoring, it's better to directly allocate the target type; CHAR16_T
for UTF16 or char
for UTF8 to avoid further confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some platforms do not support
char16_t
,
I was asking for the exact platforms and GCC/Clang versions we need to target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macOS doesn't support char16_t
so you can cross the whole IsApplePlatform series off the list. Our least supported versions support C11 and hence char16_t
on Linux, so only OS headers are relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macOS doesn't support
char16_t
Asked a friend to confirm that char16_t
is supported with latest build tool on macOS. The mentions of macOS doesn't support char16_t
are quite old. On cppreference, char16_t
is marked as supported by Apple clang. There's also documentation about CChar16 in Swift doc.
Can you confirm again about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in C mode, we had some problems. I don't remember the details offhand. This is the quick way to find out what doesn't work in the CI:
#ifdef TARGET_WINDOWS
typedef wchar_t CHAR16_T;
#else
- typedef unsigned short CHAR16_T;
+ typedef char16_t CHAR16_T;
#endif
(note that we'd still need CHAR16_T
for Windows, so perhaps it's better to keep this investigation separate from this PR)
Thanks for starting this work. Ideally we should remove the intermediate wrappers for string conversion and directly use minipal/utf8.h in coreclr/mono/corehost, including windows targets; but it will take a while to get there. (I started the grand change in a branch, but realized that it's best to do in batches to ease up the review like you are doing here) |
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Now tests are passing. I need some assist about link failure on macos. |
|
Hmm it is linked in Unix pal, but not minipal. Will it cause duplicated symbols on Linux? |
Co-authored-by: Aaron Robinson <arobins@microsoft.com>
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Since the buffer will always be sufficient, the resultant error code could only be NO_UNICODE_TRANSLATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@am11 anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
This reverts commit bc81c55.
* Revert "Forward minipal_get_length_utf16_to_utf8 and minipal_convert_utf16_to_utf8 from DAC (dotnet#97055)" This reverts commit 1263107. * Revert "Cleanup some string operating functions (dotnet#96099)" This reverts commit bc81c55.
Replace with standard or shared ones.
Please confirm their behavior and performance characteristics are desired.