-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: support searching an Iterator<u8> #386
Comments
Nothing has changed since that reddit discussion. It's not really a matter of will, but a matter of resources. A feature like this needs significant implementation work and API design work. I do have vague plans to work on it, but it is not something you should expect to happen any time soon. (The right scale of measurement here is probably many months or years.) Someone else could champion this effort, but said person would need to get their hands very dirty. In the past, I have implemented a somewhat naive byte based finite state machine using similar concepts in the fst-regex crate. I doubt you could use it as is, but it might serve as an example of how to hack something together. You might consider investigating what the Xi editor does. They are using fancy-regex IIRC, which is built on top of the regex crate. Finally, you could find other means of achieving your goals. For example, by sacrificing multiline matching and searching a line at a time. (I am aware of the downsides of such a path and you have probably also considered it, but i figured it was still worth mentioning.) |
Thanks, those are very useful references. My understanding of fancy-regex is that it gives you all the exotic features of PCRE, whereas rust-lang/regex is a RE2 style regex implementation with strong performance guarantees. Looking at the elisp regex syntax I don't believe there are any features that aren't present in this regex library. There's some extra syntax-based features (but we could look up the syntax characters and pass them through), and 'back references' (but here that just refers to replacing a matched group). That's all I'm aware of. Elisp regexps have the useful property that Regarding line-at-a-time matching, given a gap buffer: struct GapBuffer {
text: Vec<u8>,
gap_start: ptrdiff_t,
gap_end: ptrdiff_t,
} it would be possible to search three strings (Rust-ish pseudocode):
In the event that a user gave us a multiline regex, we'd have to copy the data so the buffer is contiguous. Have I understood you correctly regarding line-at-a-time? |
I believe so. As one more additional reference that might help, ripgrep forces line-at-a-time matching by forcefully removing anything in a regex's AST that could possibly match a new line: https://github.com/BurntSushi/ripgrep/blob/master/grep/src/nonl.rs --- You probably don't want to do that specific task, but that routine might serve as a good starting point for detecting whether a regex could match across multiple lines or not.
Right. I didn't mean to focus on the particular features of the regex engine so much as say that Xi is using this regex engine in a text editor, so there might be lessons in Xi that you could learn from. @raphlinus is the guy to talk to about that! |
Thanks again for your help and all those references. I've written up an RFC for using Rust regex crates in Remacs if you're interested. How would you like to proceed with this GitHub issue? Should we leave it open, or would you prefer it closed as |
@Wilfred Ah awesome, I'll be sure to check out that RFC! I think we can leave this issue open. I wish I could give you a better estimate of when I think I'll get to this, but I have a lot on my plate! I have many places I want to take the regex crate. :-) |
Following up on this reddit discussion, it would be very useful for the Remacs project if this crate supported searching non-contiguous blocks of memory. Currently, it's not possible to search a gap buffer with this crate. The reddit discussion suggests it would be useful for spidermonkey too.
Would you be willing to add this feature?
The text was updated successfully, but these errors were encountered: