Parse integers in a string based on an index #16933
Comments
I would wait to see what happens with slices (dotnet/roslyn#120, experimental implementation in CoreFxLab). With slices, the new overload could be just: public int Parse(ReadOnlySlice<char> s);
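ReadOnlySlice<char> was an experimental type at the time. The idea later shipped as ReadOnlySpan<char> (built into .NET Core 2.1 and later), and int.Parse gained a ReadOnlySpan<char> overload. Assuming .NET Core 2.1 or later, parsing part of a string without a Substring call might look like the sketch below. SpanParsingExample and ParseRange are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the class and method names are illustrative only.
public static class SpanParsingExample
{
    // Parses text[start .. start + length) as an Int32.
    // string.AsSpan(start, length) returns a ReadOnlySpan<char> view over the
    // existing string, so no substring is allocated before parsing.
    public static int ParseRange(string text, int start, int length) =>
        int.Parse(text.AsSpan(start, length), NumberStyles.Integer, CultureInfo.InvariantCulture);
}
```

For example, ParseRange("id:00042;", 3, 5) parses the characters "00042" in place and returns 42.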
Oh, nice one! I hadn't heard of those.
I think special syntax (possibly something like
Ah, okay, thanks for the heads up! I'll update the PR to propose the ReadOnlySlice addition.
+1 for slices. FWIW, we built a complete The nice thing is that many operations can be lifted over these without incurring allocations, e.g. More advanced operations such as "rebasing" can be built too: an array of string segments is passed to a function that analyzes the spans, trims the underlying strings into substrings that are well covered by the segments on top of them (say, an 80% threshold), and rewrites the segments to refer to these newly allocated, smaller strings. This is very useful in text-extraction procedures where a series of intermediate steps perform parsing, substring, trim, and similar operations, and the results ultimately need to be kept alive a) without keeping the whole original document alive, and b) without incurring many intermediate allocations. Where it falls down is indeed in the lack of overloads to
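The name of the segment type mentioned in this comment was lost. As a hypothetical sketch of the idea it describes (a (string, offset, length) view over which operations are lifted without allocating), here is a minimal TextSegment struct. The type and its members are illustrative assumptions, not the type the commenter built.

```csharp
using System;

// Hypothetical minimal string-segment type: a (string, offset, length) view.
// Slice and Trim return new views over the same string and copy no characters;
// only ToString() allocates a new string.
public readonly struct TextSegment
{
    public readonly string Buffer;
    public readonly int Offset;
    public readonly int Length;

    public TextSegment(string buffer, int offset, int length)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || length < 0 || offset > buffer.Length - length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        Buffer = buffer;
        Offset = offset;
        Length = length;
    }

    // View over the whole string (throws ArgumentNullException for null).
    public TextSegment(string buffer) : this(buffer, 0, buffer?.Length ?? 0) { }

    public char this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Length) throw new IndexOutOfRangeException();
            return Buffer[Offset + index];
        }
    }

    // Narrows the view relative to this segment; no characters are copied.
    public TextSegment Slice(int start, int length)
    {
        if (start < 0 || length < 0 || start > Length - length)
            throw new ArgumentOutOfRangeException(nameof(start));
        return new TextSegment(Buffer, Offset + start, length);
    }

    // Lifted Trim: skips whitespace by adjusting offset and length only.
    public TextSegment Trim()
    {
        int start = Offset, end = Offset + Length;
        while (start < end && char.IsWhiteSpace(Buffer[start])) start++;
        while (end > start && char.IsWhiteSpace(Buffer[end - 1])) end--;
        return new TextSegment(Buffer, start, end - start);
    }

    // The only allocating operation: materialize the segment as a string.
    public override string ToString() => Buffer?.Substring(Offset, Length) ?? string.Empty;
}
```

For example, new TextSegment("  key = value  ").Trim() yields a view with Offset 2 and Length 11 over the original string; calling ToString() on it is the first point at which "key = value" is allocated.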
@svick
@bartdesmet
@KrzysztofCwalina can comment on plans for slices in CoreFX proper.
It would be pretty cool if we could call APIs taking
Related: #14802
@terrajobst @KrzysztofCwalina do we want to contemplate a "regular" API for this, or defer until slices?
I would do regular (string, int, int) APIs in corefx. We will do Span APIs as an OOB (out-of-band) package.
OK. @alexperovich @joperezr please either give feedback or mark this as api-ready-for-review. @hughbe, it looks like you have this just for ints. Do you believe that's sufficient to get almost all of the benefit, or are you proposing to add it to all the other primitive types?
The API looks good to me, so I'll mark it as ready for review and we'll look at it during triage.
This is an area that we're actively working on in CoreFxLab (see roadmap). For details on how we think about parsing, take a look at this speclet. Given that everything around spans is experimental, it's too early to do regular API reviews against CoreFx, which is why I'm closing it here. However, @hughbe, I'd love to continue the discussion over in CoreFxLab! 😄
Summary
Currently, int.Parse takes a string and parses the entire string. I propose adding APIs to parse a string starting from an index, with an additional overload that also specifies the length.
Rationale
In order to parse part of a string, we currently have to call string.Substring(...) on it first. This allocates a new string, which hurts performance.
Proposed APIs - with slices
Proposed APIs - without slices
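The concrete signatures for this section did not survive extraction. Based on the Summary (parse from an index, plus an overload that also takes a length), here is a minimal sketch of what the two overloads could do. RangeParser and ParseInt32 are hypothetical names, not a .NET API, and the sketch accepts only an optional leading sign followed by ASCII digits, with no whitespace or culture handling.

```csharp
using System;

// Hypothetical helper illustrating the proposed overloads; the names are
// assumptions, not an existing .NET API.
public static class RangeParser
{
    // Parses s[startIndex .. end of string] as a base-10 Int32.
    public static int ParseInt32(string s, int startIndex)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        return ParseInt32(s, startIndex, s.Length - startIndex);
    }

    // Parses the 'count' characters starting at 'startIndex' without calling Substring.
    public static int ParseInt32(string s, int startIndex, int count)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (startIndex < 0 || startIndex > s.Length)
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        if (count < 0 || count > s.Length - startIndex)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (count == 0) throw new FormatException("Input is empty.");

        int i = startIndex;
        int end = startIndex + count;
        bool negative = false;
        if (s[i] == '-' || s[i] == '+')
        {
            negative = s[i] == '-';
            i++;
            if (i == end) throw new FormatException("No digits after sign.");
        }

        // Accumulate as a negative number so that int.MinValue can be represented.
        int result = 0;
        for (; i < end; i++)
        {
            int digit = s[i] - '0';
            if ((uint)digit > 9) throw new FormatException($"Invalid character '{s[i]}'.");
            result = checked(result * 10 - digit); // OverflowException on overflow
        }
        return negative ? result : checked(-result);
    }
}
```

For example, RangeParser.ParseInt32("v=1234;", 2, 4) returns 1234 without allocating a substring. The two-argument overload simply forwards with count = s.Length - startIndex, which is why it is cheap to offer alongside the three-argument one.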
Discussion
Parse(string, int) overload? I think most people calling this API would know the required length.
Example
I added some custom parsing code in a PR in coreclr (dotnet/coreclr#3992). This is basically the code that these proposed methods would use.
Parsing performance of ints in Version increased by up to 4x due to reduced substring allocations.
Another example is Guid (dotnet/coreclr#3965) where performance of parsing Guid strings increased by up to 3x with the allocation count reducing to 0.
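The coreclr code itself is not reproduced here. As a hypothetical sketch of the same technique applied to version strings, the following parses major.minor[.build[.revision]] by scanning for '.' and parsing each component in place, so no intermediate substrings are created (the returned Version is still allocated). VersionComponentParser is an illustrative name, and accepting only ASCII digits in each component is a simplification.

```csharp
using System;

// Hypothetical sketch, not the code from dotnet/coreclr#3992.
public static class VersionComponentParser
{
    // Parses "major.minor[.build[.revision]]" without calling Substring.
    public static Version Parse(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        Span<int> parts = stackalloc int[4]; // major, minor, build, revision
        int partCount = 0;
        int start = 0;

        // i == input.Length acts as a final '.' that closes the last component.
        for (int i = 0; i <= input.Length; i++)
        {
            if (i < input.Length && input[i] != '.') continue;
            if (partCount == parts.Length) throw new FormatException("Too many components.");
            parts[partCount++] = ParseComponent(input, start, i - start);
            start = i + 1;
        }

        switch (partCount)
        {
            case 2: return new Version(parts[0], parts[1]);
            case 3: return new Version(parts[0], parts[1], parts[2]);
            case 4: return new Version(parts[0], parts[1], parts[2], parts[3]);
            default: throw new FormatException("Expected 2 to 4 components.");
        }
    }

    // Parses s[start .. start + count) as a non-negative base-10 integer.
    private static int ParseComponent(string s, int start, int count)
    {
        if (count == 0) throw new FormatException("Empty component.");
        int value = 0;
        for (int i = start; i < start + count; i++)
        {
            int digit = s[i] - '0';
            if ((uint)digit > 9) throw new FormatException($"Invalid character '{s[i]}'.");
            value = checked(value * 10 + digit); // OverflowException on overflow
        }
        return value;
    }
}
```

For example, Parse("10.20") returns a Version with Major 10 and Minor 20; as with new Version(10, 20), the undefined Build and Revision are -1.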
Alternatives
Parse(char* s, int count) method instead, but this is rare.