-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Consolidate and optimize TextInfo.ChangeCase (string/char.ToUpper/Lower) #17391
Conversation
|
This affects char.ToLower/Upper aka TextInfo.ToLower(char) but Perf_Char is almost empty. Do we have enough coverage of the char overloads and if not would you consider adding enough to make sure you didn't regress them? |
Added at dotnet/corefx#28765. The results are the same before/after this PR. |
|
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test please @dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test please |
| for (sourcePos = 0; sourcePos < source.Length; sourcePos++) | ||
| { | ||
| // If the character is lower-case, we're going to need to allocate a string. | ||
| char c = source[sourcePos]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char c = source[sourcePos]; [](start = 24, length = 27)
Is it better to use a pointer to the string instead of using the string directly here? I think this can avoid the bound checking when accessing the indexed characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT should be able to avoid the bounds checking here, as it can see that sourcePos is always within the bounds of the string based on the loop construction. I'd tried using pointers, but it didn't help with throughput and it made the code more unsafe, so I reverted it.
|
Thanks, @stephentoub for optimizing this. I think the gain will be very valuable. |
| } | ||
|
|
||
| int ret; | ||
| Debug.Assert(pSourceLen == pResultLen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add debug asserts that check pSource, pResult is not null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug.Assert(pSourceLen == pResultLen); => Debug.Assert(pSourceLen <= pResultLen); ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add debug asserts that check pSource, pResult is not null?
Sure
Debug.Assert(pSourceLen == pResultLen); => Debug.Assert(pSourceLen <= pResultLen); ?
Why? Is there a situation where it'll be < rather than ==?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is called for both strings and spans, correct?
The spans passed in to this method can come from the user who passed in a larger destination:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/MemoryExtensions.Fast.cs#L207
At the call site, the source.Length - length could be <= destination.Length - length
ChangeCase(a, source.Length - length, b, destination.Length - length, toUpper);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right. Will fix.
| return result; | ||
| } | ||
|
|
||
| internal unsafe void ChangeCase(ReadOnlySpan<char> source, Span<char> destination, bool toUpper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this method internal instead of private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably because it's used from Span? Let's ask the person who added it:
1494667#diff-4724ac68e6d26f4724534fcbc2cf5374R68
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably because it's used from Span? Let's ask the person who added it:
Right, of course. I talked to him offline, he confirms that was the reason :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😄
- Move most of the implementation to the platform-agnostic file, rather than having completely different implementations for Windows and Unix. Now the only logic in each platform-specific file is the logic around invoking the associated P/Invoke. - Optimize that implementation to take a fast path that doesn't allocate when no case change is needed, and to avoid the native call when the whole string is ASCII.
67a40e6 to
8706916
Compare
- Move most of the implementation to the platform-agnostic file, rather than having completely different implementations for Windows and Unix. Now the only logic in each platform-specific file is the logic around invoking the associated P/Invoke. - Optimize that implementation to take a fast path that doesn't allocate when no case change is needed, and to avoid the native call when the whole string is ASCII. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
- Move most of the implementation to the platform-agnostic file, rather than having completely different implementations for Windows and Unix. Now the only logic in each platform-specific file is the logic around invoking the associated P/Invoke. - Optimize that implementation to take a fast path that doesn't allocate when no case change is needed, and to avoid the native call when the whole string is ASCII. Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
- Move most of the implementation to the platform-agnostic file, rather than having completely different implementations for Windows and Unix. Now the only logic in each platform-specific file is the logic around invoking the associated P/Invoke. - Optimize that implementation to take a fast path that doesn't allocate when no case change is needed, and to avoid the native call when the whole string is ASCII. Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
- Move most of the implementation to the platform-agnostic file, rather than having completely different implementations for Windows and Unix. Now the only logic in each platform-specific file is the logic around invoking the associated P/Invoke. - Optimize that implementation to take a fast path that doesn't allocate when no case change is needed, and to avoid the native call when the whole string is ASCII. Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
cc: @jkotas, @danmosemsft, @tarekgh, @ahsonkhan
Using the perf tests added in dotnet/corefx#28748: