Enhanced Patterns #2899

Open
deeprobin opened this issue Apr 5, 2020 · 11 comments

@deeprobin

A std::str::pattern::Pattern can be a plain string such as "hello".
Example:

let s = "hello world";
s.contains("hello"); // ==> true

But checking for several patterns in a single pass is faster than calling contains several times.

Bad practice

s.contains('a') || s.contains('b') // about 51 ns, because the string is checked twice.

I wrote my own Pattern implementation, which takes about 19 ns and does just one check. But I think there should be a dedicated pattern syntax.
Something like:

s.contains('a' || 'b')
@Ixrec
Contributor

Ixrec commented Apr 5, 2020

I suspect this falls under rust-lang/rust#56345 / RFC #2500 "Needle API" somehow, though I've never fully understood that RFC

@kennytm
Member

kennytm commented Apr 6, 2020

You don't even need #2500 for this.

s.contains(&['a', 'b'][..])
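
A minimal sketch of how this looks in practice, assuming only the stable standard library. The function names are hypothetical and exist only for illustration; the closure form is a second existing way to express "a or b" as a single pattern.

```rust
/// Hypothetical helper: true if `s` contains 'a' or 'b'.
/// A slice of chars is itself a Pattern that matches any of its elements.
fn contains_a_or_b_slice(s: &str) -> bool {
    s.contains(&['a', 'b'][..])
}

/// Hypothetical helper: the same check written as a closure pattern.
/// Any `FnMut(char) -> bool` closure can be used as a Pattern.
fn contains_a_or_b_closure(s: &str) -> bool {
    s.contains(|c: char| c == 'a' || c == 'b')
}
```

Both forms hand a single pattern to contains, so the string is walked once rather than once per character.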

@deeprobin
Author

@kennytm Yes, that's right, but that's not the only thing.

s.contains(&["abcd", "aaaa"]);

also does not work, because that Pattern is only implemented for char slices, not for slices of strings. A &str is compared char by char. But what if we analyzed the pattern at compile time and saw that the strings start with the same chars, in this case "a"? Why would we need to check these chars twice? Do you see what I mean?
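
To make the shared-prefix idea concrete, here is a rough run-time sketch using a byte trie. This is not something std provides and it is not the Aho-Corasick algorithm (there are no failure links, so the worst case is still proportional to haystack length times needle length); PrefixTrie and TrieNode are hypothetical names invented for this illustration.

```rust
use std::collections::HashMap;

/// One node of a byte trie; `is_end` marks the end of a complete needle.
#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    is_end: bool,
}

/// Hypothetical multi-needle matcher: needles that share a prefix share trie nodes.
struct PrefixTrie {
    root: TrieNode,
}

impl PrefixTrie {
    /// Builds the trie once; "abcd" and "aaaa" share the node for the leading 'a'.
    fn new(needles: &[&str]) -> Self {
        let mut root = TrieNode::default();
        for needle in needles {
            let mut node = &mut root;
            for &b in needle.as_bytes() {
                node = node.children.entry(b).or_default();
            }
            node.is_end = true;
        }
        PrefixTrie { root }
    }

    /// Returns true if any needle occurs somewhere in `haystack`.
    /// From each start position the trie is walked once for all needles together.
    fn is_match(&self, haystack: &str) -> bool {
        let bytes = haystack.as_bytes();
        for start in 0..=bytes.len() {
            let mut node = &self.root;
            if node.is_end {
                return true; // an empty needle matches everywhere
            }
            for &b in &bytes[start..] {
                match node.children.get(&b) {
                    Some(next) => {
                        node = next;
                        if node.is_end {
                            return true;
                        }
                    }
                    None => break, // no needle continues with this byte
                }
            }
        }
        false
    }
}
```

With PrefixTrie::new(&["abcd", "aaaa"]), the leading "a" is compared once per start position instead of once per needle.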

@kennytm
Member

kennytm commented Apr 7, 2020

@deeprobin The standard library is not capable of "analyzing that pattern to see that strings both start with a" at run time, let alone at compile time.

You'd better use aho-corasick if you need to efficiently search for "abcd" || "aaaa".
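
For comparison, a minimal std-only fallback might look like the sketch below. It is the naive approach: it calls contains once per needle, so the haystack may be scanned several times. contains_any is a hypothetical helper name, not a std function.

```rust
/// Hypothetical helper: true if `haystack` contains at least one of `needles`.
/// Each needle triggers its own search, stopping at the first needle that matches.
fn contains_any(haystack: &str, needles: &[&str]) -> bool {
    needles.iter().any(|needle| haystack.contains(*needle))
}
```

This is fine for a handful of short needles; a dedicated multi-pattern searcher pays off as the needle set or the haystack grows.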

@deeprobin
Author

@kennytm Exactly, and that's why I created this issue: so that this gets implemented at some point.

@kennytm
Member

kennytm commented Apr 7, 2020

You'll need to explain

  1. why we need to essentially move aho-corasick into std to support searching multiple strings efficiently — is this feature so essential that crates.io is insufficient, and must be provided by the standard library? (and at this point why not just move regex into std)
  2. is that a || b syntax needed?

@deeprobin
Author

  1. I think one should optimize what can be optimized. That means std should at least support simple multi-patterns, as aho-corasick does. regex supports more complicated patterns, so I can understand why that is not in std.

  2. The a || b syntax is of course not absolutely necessary but would make the code a bit clearer.

@pickfire
Contributor

pickfire commented Apr 11, 2020

s.contains(&['a', 'b'][..])

Should we add that to standard library documentation? And also mention the use of aho-corasick or regex if they need additional stuff.

@deeprobin
Author

s.contains(&['a', 'b'][..])

Should we add that to standard library documentation? And also mention the use of aho-corasick or regex if they need additional stuff.

As long as this is not yet implemented in std, this would be very helpful.

@shepmaster
Member

Should we add that to standard library documentation

All implementors of Pattern are automatically documented

@pickfire
Contributor

@shepmaster Yes, but there are no examples there. Also, the descriptions are inconsistent: some end with a period and some don't.
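
As a rough idea of the kind of example the documentation could carry, here is a sketch that uses a char-slice pattern with methods other than contains. The helper names are hypothetical; find and split are existing str methods that accept any Pattern.

```rust
/// Hypothetical helper: byte index of the first ',' or ';' in `s`, if any.
fn first_separator(s: &str) -> Option<usize> {
    s.find(&[',', ';'][..])
}

/// Hypothetical helper: splits `s` at every ',' or ';'.
fn split_fields(s: &str) -> Vec<&str> {
    s.split(&[',', ';'][..]).collect()
}
```

For instance, split_fields("a,b;c") yields the three fields "a", "b" and "c".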

5 participants