[browser][non-icu] HybridGlobalization indexing #85254
Tagging subscribers to this area: @dotnet/area-system-globalization
Tagging subscribers to 'arch-wasm': @lewing
I guess one more merge from main would fix WBT.
// The match may be affected by special character. Verify that the following character is regular ASCII.
if (sourceIndex < source.Length - 1 && *(a + sourceIndex + 1) >= 0x80)
    goto InteropCall;
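For readers following the thread, the idea behind the check quoted above can be sketched in JavaScript. This is only an illustration under stated assumptions, not the code in CompareInfo.Icu.cs: the names asciiFastIndexOf and FALLBACK are hypothetical, the comparison is plain ordinal, and FALLBACK stands in for the jump to InteropCall.

```javascript
// Hypothetical sentinel standing in for "goto InteropCall" in the C# helper.
const FALLBACK = Symbol("fallback");

// Sketch of an ASCII-only fast path: returns the match index, -1 when no
// match is found, or FALLBACK as soon as a character >= 0x80 is seen.
function asciiFastIndexOf(source, needle) {
  // A non-ASCII needle cannot be handled by the fast path at all.
  for (let k = 0; k < needle.length; k++) {
    if (needle.charCodeAt(k) >= 0x80) return FALLBACK;
  }
  if (needle.length === 0) return 0;

  for (let i = 0; i + needle.length <= source.length; i++) {
    let j = 0;
    while (j < needle.length) {
      const c = source.charCodeAt(i + j);
      if (c >= 0x80) return FALLBACK; // non-ASCII in the scanned region
      if (c !== needle.charCodeAt(j)) break;
      j++;
    }
    if (j === needle.length) {
      // The match may be affected by a following special character
      // (for example a combining mark), so verify that it is plain ASCII.
      const next = i + j;
      if (next < source.length && source.charCodeAt(next) >= 0x80) {
        return FALLBACK;
      }
      return i;
    }
  }
  return -1;
}

console.log(asciiFastIndexOf("hello world", "world"));
console.log(asciiFastIndexOf("o\u0308", "o") === FALLBACK);
```

The last call falls back because the character after the matched "o" is U+0308, which is exactly the situation the quoted check guards against.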
Are we wasting CPU cycles on average by doing the work again in JS? Could we pass the current position and continue the search?
> Are we wasting CPU cycles on average by doing the work again in JS?
Yes, when the locale is en-* and the comparison option is not IgnoreSymbols.
> Could we pass the current position and continue the search?
That makes sense. Because everything up to the current position is equal, it pays off to start the comparison in JS from that index rather than from the beginning. Continuing the search itself brings no benefit, though: once we have established that the string is not plain all-ASCII, the InteropCall compares the whole remaining string anyway.
We cannot do it because of surrogates and combining characters. Example:
source: "o\u0308" = ö
needle: "o"
expected result: -1
Suppose we check char by char until we reach a non-ASCII character, remember the index where the match started (idx=0), and then send only the rest of the strings (source: "\u0308", needle: "") to JS to continue. We would get result = 0.
We need to keep the helper; it is a good optimisation in most cases. Only cases like "lotsOfAsciiAndThenAUnicodeAtTheEndThatWasNotCaughtByTheRegex" will suffer here, so not many. We could get rid of it entirely and just send all strings to JS, but then the "allAscii" cases would suffer.
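The counterexample above can be reproduced with a small JavaScript sketch. It is an illustration only: clusters and clusterIndexOf are hypothetical helpers, the cluster rule (a base code point followed by its combining marks) is a simplification of real grapheme segmentation, and clusters are compared ordinally rather than with locale-aware collation as the actual JS implementation would.

```javascript
// Hypothetical helper: split text into "base code point + combining marks"
// units. Leading marks with no base form a unit of their own.
function clusters(text) {
  return text.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
}

// Hypothetical helper: index of the first position where the needle's
// clusters line up with whole clusters of the source, or -1.
function clusterIndexOf(source, needle) {
  const src = clusters(source);
  const ndl = clusters(needle);
  if (ndl.length === 0) return 0;
  let offset = 0; // UTF-16 offset of src[i] within source
  for (let i = 0; i + ndl.length <= src.length; i++) {
    if (ndl.every((c, j) => src[i + j] === c)) return offset;
    offset += src[i].length;
  }
  return -1;
}

const source = "o\u0308"; // ö written as o + combining diaeresis
const needle = "o";

// Searching the whole strings: "o" is not a whole unit of the source.
const whole = clusterIndexOf(source, needle);

// Resuming after the ASCII prefix: only "\u0308" and "" are left, and an
// empty needle matches at 0, so the combined answer becomes 0 + 0.
const matchedPrefix = 1; // the ASCII "o" already matched at idx 0
const resumed = clusterIndexOf(source.slice(matchedPrefix), needle.slice(matchedPrefix));

console.log(whole, resumed);
```

The two calls disagree (no match for the whole strings, a match at 0 for the resumed remainder), which is why the already-matched prefix cannot simply be cut off before handing the strings to JS.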
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.Icu.cs
src/libraries/System.Runtime/tests/TrimmingTests/System.Runtime.TrimmingTests.proj
The failure is not connected to this change.
Implements a chunk of web-API-based globalization. It is part of the HybridGlobalization feature and contributes to #79989.
Old, ICU-based private API: GlobalizationNative_IndexOf, GlobalizationNative_LastIndexOf
New, non-ICU private API: Interop.JsGlobalization.IndexOf
Affected public API (see: tests in CompareInfoTests.IndexOf.cs, CompareInfoTests.LastIndexOf.cs):
All changes in behavior are listed in docs\design\features\hybrid-globalization.md.