Resolve differences between string.IndexOfAny and MemoryExtensions.IndexOfAny #60864

Closed
Tracked by #64603
stephentoub opened this issue Oct 26, 2021 · 1 comment · Fixed by #63817
@stephentoub

When the number of values is <= 5, string.IndexOfAny just delegates to MemoryExtensions.IndexOfAny. But when there are more than 5 values, it builds a "probabilistic map" (basically a Bloom filter) and consults it while walking each element of the input to determine whether that element is likely in the set, doing the actual comparison against the set only for the likely ones. In contrast, with more than 5 values, MemoryExtensions.IndexOfAny walks each character in the input and, for each one, walks each of the values to see if they match.

There's little reason these should be different. We should pick the better strategy (or an even better one if such a thing exists), use it in MemoryExtensions.IndexOfAny, and then have string.IndexOfAny unconditionally delegate to MemoryExtensions.IndexOfAny.
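
For readers who want to see the two strategies side by side, here is a rough sketch in Python rather than C#. It is not the .NET implementation: the names `index_of_any_linear`, `index_of_any_filtered`, and `ProbabilisticMap` are made up for illustration, and the 256-bit filter keyed on the low and high byte of each character is an assumed layout chosen to show the idea of a filter that can give false positives but never false negatives.

```python
def index_of_any_linear(text: str, values: str) -> int:
    """Nested scan: for each input character, compare against every value."""
    for i, ch in enumerate(text):
        for v in values:
            if ch == v:
                return i
    return -1


class ProbabilisticMap:
    """Hypothetical 256-bit filter; an assumed layout, not the runtime's."""

    def __init__(self, values: str) -> None:
        self.bits = 0
        for v in values:
            code = ord(v)
            # Mark both the low byte and the next byte of the character code.
            self.bits |= 1 << (code & 0xFF)
            self.bits |= 1 << ((code >> 8) & 0xFF)

    def might_contain(self, ch: str) -> bool:
        code = ord(ch)
        low_hit = (self.bits >> (code & 0xFF)) & 1
        high_hit = (self.bits >> ((code >> 8) & 0xFF)) & 1
        # Both bits set means "possibly in the set"; otherwise definitely not.
        return bool(low_hit and high_hit)


def index_of_any_filtered(text: str, values: str) -> int:
    """Filtered scan: do the exact comparison only for likely candidates."""
    pmap = ProbabilisticMap(values)
    for i, ch in enumerate(text):
        if pmap.might_contain(ch) and ch in values:
            return i
    return -1
```

Both functions return the same index for any input. The filtered version pays a setup cost to build the map and then skips the inner loop for most non-matching characters, which is the trade-off this issue asks to evaluate.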

cc: @GrabYourPitchforks

stephentoub added this to the 7.0.0 milestone Oct 26, 2021
dotnet-issue-labeler (bot) added the "untriaged" label (New issue has not been triaged by the area owner) Oct 26, 2021
@ghost
ghost commented Oct 26, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels:

area-System.Runtime, tenet-performance

Milestone: 7.0.0

jeffschwMSFT removed the "untriaged" label (New issue has not been triaged by the area owner) Oct 28, 2021
stephentoub self-assigned this Jan 14, 2022
ghost added the "in-pr" label (There is an active PR which will close this issue when it is merged) Jan 14, 2022
ghost removed the "in-pr" label (There is an active PR which will close this issue when it is merged) Jan 16, 2022
ghost locked as resolved and limited conversation to collaborators Feb 15, 2022