Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try a String.Replace with 1-length oldValue and multi-length newValue #44061

Closed
lateapexearlyspeed opened this issue Oct 30, 2020 · 8 comments
Closed
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Milestone

Comments

@lateapexearlyspeed
Copy link
Contributor

Background and Motivation

When use string.Replace() to replace 1-length 'oldValue' by >1 length string newValue (length is known before compiling), sometimes I thought there was a method overload as: string.Replace(char, string). Like existing method string.split() which has one overload with string-type separator but also has char-type separator overload. The benefit of trying to use char-type argument overload as possible is well-known and described by lots of blogs (char is a value type without overhead of ref type and most time it is in stack and size is also smaller than word size so that argument assignment can have better performance, compared to string)

Proposed API

+	public System.String Replace(char oldChar, System.String? newValue)

With this idea, I tried to understand source code of string.Split() and created this new overload by refering to implementation of string.Replace(string, string). The main and natural difference is can remove multi-char handling logic for 'oldValue' argument.

After that, I read some wikis of dotnet/runtime and consult some contributors so that I have my first-time journey to build my new corelib locally and update dotnet/performance to run benchmark on mew string.Replace(char, string). Benchmark result compared with original overload passing 1-length oldValue string:

BenchmarkDotNet=v0.12.1.1448-nightly, OS=Windows 10.0.17763.1518 (1809/October2018Update/Redstone5)
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.100-preview.7.20366.6
[Host] : .NET 6.0.0 (6.0.20.51309), X64 RyuJIT
Job-WSWQJP : .NET 6.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable Toolchain=CoreRun
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
WarmupCount=1

Method text oldValue newValue Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
TestOriginalReplace_TypeOfOldValueIsString aaaaaaaaaa a bb 76.994 ns 0.1520 ns 0.1187 ns 76.998 ns 76.752 ns 77.241 ns 0.0102 - - 64 B
TestNewReplaceOverload_TypeOfOldValueIsChar aaaaaaaaaa a bb 66.182 ns 2.0695 ns 2.1252 ns 65.289 ns 64.225 ns 72.201 ns 0.0101 - - 64 B
TestOriginalReplace_TypeOfOldValueIsString aaaaaaaaaa c bb 13.215 ns 0.6135 ns 0.7065 ns 13.029 ns 12.149 ns 14.610 ns - - - -
TestNewReplaceOverload_TypeOfOldValueIsChar aaaaaaaaaa c bb 9.605 ns 0.1976 ns 0.1848 ns 9.481 ns 9.430 ns 9.904 ns - - - -

So just want to ask, is it worth to introduce this new Replace() overload to corelib so that can have better performance in case of 1-length 'oldValue' known during compile time (If yes, can I try to contribute into remote repo). Please give comment, thanks!

@lateapexearlyspeed lateapexearlyspeed added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Oct 30, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Oct 30, 2020
@Joe4evr
Copy link
Contributor

Joe4evr commented Oct 30, 2020

CC: @bartonjs @joperezr I believe this falls in the System.Runtime area?

@danmoseley
Copy link
Member

Do you have data on how common this pattern is? Eg by using grep.app with some regex, possibly.

@RamType0
Copy link

IMHO, newValue should be ReadOnlySpan<char>.

@RamType0
Copy link

Even if this API will be not approved, internally use this path to handle the case which we receive string with length of 1 is seems to be good.

@stephentoub
Copy link
Member

Thanks for the issue. #44088 optimizes the path where oldValue is a single character. There's not much overhead in extracting the single char from the string, so the primary benefit of the additional proposed overload would I think be just if the oldValue was dynamically computed and required a string allocation that could otherwise be avoided, but from looking at existing usage, that's not common.

@lateapexearlyspeed
Copy link
Contributor Author

Hi @stephentoub Awesome you optimizes single char old value that soon, thanks for optimizing work :)

Actually what I would like to do is to find some idea during my daily .net development and try to verify benefit of code change for idea in my local. If I can convince myself, I really want to contribute my change to dotnet repo based on your guys comments.
Then I raised issue here before raise PR based on Contribution guidance.
I know this new API overload proposal is not approved, but if you guys can give some comment here directly, I still like to change existing method for that single char old value. It is still very exciting to contribute my part of investigation code into real runtime. It is a little pity to find optimization has been merged after next day of raising issue. :) Again I know it is not to create new overload.

So the question is, if we want to contribute code, should we raise issue before PR, or raise PR directly and wait for comment? What is the recommended way to have opportunity to contribute code? Can you guys give related comments just in current issue so that issuer can adjust solution? please give actual guidance, thanks again!

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Jul 12, 2021
@tannergooding tannergooding added this to the Future milestone Jul 12, 2021
@stephentoub
Copy link
Member

So the question is, if we want to contribute code, should we raise issue before PR, or raise PR directly and wait for comment?

PRs are welcome (though of course aren't guaranteed to be merged) without an issue if the changes being proposed are relatively small and well-justified, e.g. you're able to improve the performance of some existing API and speak to both the improvements and possible regressions with numbers.

Given #44061 (comment), I'm going to close this issue now. If you still see an impactful opportunity here and can highlight places where a new API would make a meaningful difference, feel free to re-open. Thanks.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests

8 participants