feature: add ReplacingReader #153
base: master
Should there be infallible variants of the `try_to_replacing_reader` and `try_to_replacing_reader_with` methods?
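For comparison, here is a minimal std-only sketch of the usual fallible/infallible pairing, in which the infallible method behaves identically but panics wherever its `try_` counterpart would return an error (the same relationship aho-corasick's `find` has to `try_find`). `Searcher`, `Wrapped`, `UnsupportedConfig`, `try_wrap`, `wrap` and the `supports_streaming` flag are all hypothetical names for illustration, not the PR's API.

```rust
use std::fmt;
use std::io::Read;

/// Hypothetical error type, standing in for the crate's real error type.
#[derive(Debug)]
struct UnsupportedConfig;

impl fmt::Display for UnsupportedConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("configuration does not support stream replacement")
    }
}

impl std::error::Error for UnsupportedConfig {}

/// Hypothetical searcher; `supports_streaming` stands in for whatever
/// configuration check makes the `try_` constructor fallible.
struct Searcher {
    supports_streaming: bool,
}

/// Hypothetical reader wrapper returned by both constructors.
struct Wrapped<R> {
    inner: R,
}

impl<R: Read> Wrapped<R> {
    fn into_inner(self) -> R {
        self.inner
    }
}

impl Searcher {
    /// Fallible constructor: an unsupported configuration becomes an `Err`.
    fn try_wrap<R: Read>(&self, rdr: R) -> Result<Wrapped<R>, UnsupportedConfig> {
        if !self.supports_streaming {
            return Err(UnsupportedConfig);
        }
        Ok(Wrapped { inner: rdr })
    }

    /// Infallible variant: same behavior, but panics where `try_wrap`
    /// would return an error.
    fn wrap<R: Read>(&self, rdr: R) -> Wrapped<R> {
        match self.try_wrap(rdr) {
            Ok(wrapped) => wrapped,
            Err(err) => panic!("{err}"),
        }
    }
}

fn main() {
    let searcher = Searcher { supports_streaming: true };
    let mut s = String::new();
    searcher.wrap(&b"abc"[..]).into_inner().read_to_string(&mut s).unwrap();
    assert_eq!(s, "abc");

    let unsupported = Searcher { supports_streaming: false };
    assert!(unsupported.try_wrap(&b"abc"[..]).is_err());
    // `unsupported.wrap(...)` would panic here instead of returning an error.
}
```

The infallible variant costs almost nothing to add, since it is a thin wrapper; the trade-off is mainly API surface.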
Force-pushed from e7f300c to c350c9f.
Thanks for this! It's okay to just submit PRs, but I usually encourage opening an issue first to avoid wasted work. However, in this case, I do actually think this is a pretty useful feature. It offloads a lot of complexity that callers would otherwise have to handle to achieve the same task, so I like the trade-off here. With that said, the type signatures here are quite gnarly. And the fact that an […] The […] I'd also suggest looking at the […]
I had already written the reader state machine while experimenting with naive search-and-replace approaches, so it wouldn't have been the end of the world if this PR had turned out to be wasted effort. But I'm definitely glad you're on board with it 😄!
A bit easier said than done. I went with the […] We could return […] If you've got some other idea, I'm all ears!
Fair point, agreed.
Oops! Will definitely look into it!
Force-pushed from 29e7bf8 to d622967.
I hereby declare myself bamboozled. I took back my "casting between trait objects isn't really a thing" claim, because I knew I'd had some luck with that before; I just didn't remember exactly how, since it's been a while and quick searches didn't turn up anything useful. Nevertheless, I slept on it and remembered that it had to do with declaring a conversion method in the trait itself. So I tried extending `AcAutomaton` with such a method:

```rust
impl<A> AcAutomaton for A
where
    A: Automaton + Debug + Send + Sync + UnwindSafe + RefUnwindSafe + 'static,
{
    fn as_dyn_automaton(self: Arc<Self>) -> Arc<dyn Automaton> {
        self
    }
}
```

However, the compiler reports "method cannot be invoked on a trait object" when doing something like:

```rust
self.aut
    .as_dyn_automaton()
    .try_to_replacing_reader_with(rdr, replace_with)
```

But I don't get why it works with […]
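One general way this kind of error arises, and a common workaround, can be reproduced with a small std-only sketch. A generic trait method is not dyn-compatible, so it has to be excluded from the vtable with `where Self: Sized`; it then cannot be called on `dyn Trait` directly. A forwarding impl for a pointer type such as `Arc<M>` (which is always `Sized`) makes the method callable again through that pointer. `Matcher`, `ByteMatcher` and `replace_with` are hypothetical stand-ins; this illustrates the technique in general and is not necessarily how aho-corasick's internals resolve it.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an automaton trait.
trait Matcher {
    /// Dyn-compatible core method: does a match start at `at`?
    fn is_match_at(&self, haystack: &[u8], at: usize) -> bool;

    /// Generic helper. Generic methods cannot live in a vtable, so this one
    /// is opted out of trait objects with `where Self: Sized`.
    fn replace_with<F: FnMut(u8) -> u8>(&self, haystack: &[u8], mut f: F) -> Vec<u8>
    where
        Self: Sized,
    {
        haystack
            .iter()
            .enumerate()
            .map(|(i, &b)| if self.is_match_at(haystack, i) { f(b) } else { b })
            .collect()
    }
}

/// A trivial matcher that matches one byte value.
struct ByteMatcher(u8);

impl Matcher for ByteMatcher {
    fn is_match_at(&self, haystack: &[u8], at: usize) -> bool {
        haystack.get(at) == Some(&self.0)
    }
}

/// Forwarding impl: `Arc<M>` is `Sized` even when `M` is `dyn Matcher`,
/// so the `Self: Sized` helper becomes callable on `Arc<dyn Matcher>`.
impl<M: Matcher + ?Sized> Matcher for Arc<M> {
    fn is_match_at(&self, haystack: &[u8], at: usize) -> bool {
        (**self).is_match_at(haystack, at)
    }
}

fn main() {
    let m: Arc<dyn Matcher> = Arc::new(ByteMatcher(b'a'));
    // Method lookup autorefs to `&Arc<dyn Matcher>` and finds
    // `<Arc<dyn Matcher> as Matcher>::replace_with`.
    let out = m.replace_with(b"banana", |b| b.to_ascii_uppercase());
    assert_eq!(out, b"bAnAnA".to_vec());
    // Without the forwarding impl, lookup would fall through to
    // `dyn Matcher`, and the call would be rejected because
    // `dyn Matcher` is not `Sized`.
    println!("{}", String::from_utf8_lossy(&out));
}
```

The design choice here is to keep the trait itself dyn-compatible and push the generic, `Sized`-only conveniences onto concrete or pointer types, rather than trying to cast between trait objects.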
Just a side note: I also tried other variations of this, like returning a […]
I haven't looked too closely here, but I was thinking it could be achievable with either more code or another layer of indirection (e.g., an internal trait). But I'd have to look more closely, and I'm not sure when I'll have the time for that.
No worries, I'm still looking into it. I'll do some more digging and get back to you.
Okay, so I thought outside the box and realized that instead of doing trait gymnastics, we can do something else to avoid the […] While experimenting with these approaches, I noticed that this is already being done for […] I'll look into the docs matter next.
This example shows how to execute a search and replace on a stream while piping
data to a writer without loading the entire stream into memory first. This is
advantageous over combining [`AhoCorasick::try_to_replacing_reader_with`] and
something like [`std::io::copy`] because it avoids double buffering.
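To illustrate the writer-side approach, here is a minimal std-only sketch: input is read in fixed-size chunks, replaced output is written straight to the writer, and at most `needle.len() - 1` trailing bytes (which could begin a match that straddles a chunk boundary) are held back between reads. It assumes a single literal needle and uses a naive scan rather than Aho-Corasick; `stream_replace_all` here is a hypothetical free function, not the crate's method.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: replaces every non-overlapping occurrence of `needle`
/// with `replacement` while copying `rdr` to `wtr` chunk by chunk.
/// Naive single-pattern scan, not Aho-Corasick.
fn stream_replace_all<R: Read, W: Write>(
    mut rdr: R,
    mut wtr: W,
    needle: &[u8],
    replacement: &[u8],
) -> io::Result<()> {
    assert!(!needle.is_empty(), "needle must not be empty");
    let mut chunk = [0u8; 8192];
    // Unwritten input; never exceeds one chunk plus `needle.len() - 1` bytes.
    let mut buf: Vec<u8> = Vec::new();
    loop {
        let n = match rdr.read(&mut chunk) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let eof = n == 0;
        buf.extend_from_slice(&chunk[..n]);

        // `start` is the first byte not yet written; `i` is the scan position.
        let (mut start, mut i) = (0, 0);
        while i + needle.len() <= buf.len() {
            if &buf[i..i + needle.len()] == needle {
                wtr.write_all(&buf[start..i])?;
                wtr.write_all(replacement)?;
                i += needle.len();
                start = i;
            } else {
                i += 1;
            }
        }

        if eof {
            // Nothing can follow, so the tail cannot start a match.
            wtr.write_all(&buf[start..])?;
            return wtr.flush();
        }
        // Bytes from `i` on might begin a match that continues in the next
        // chunk: write everything before them and keep them for later.
        wtr.write_all(&buf[start..i])?;
        buf.drain(..i);
    }
}

fn main() -> io::Result<()> {
    // `chain` returns "fo" and "o!" from separate reads, so the match
    // "foo" straddles a chunk boundary.
    let rdr = (&b"fo"[..]).chain(&b"o!"[..]);
    let mut out = Vec::new();
    stream_replace_all(rdr, &mut out, b"foo", b"baz")?;
    assert_eq!(out, b"baz!".to_vec());
    println!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```

Wrapping the input in a replacing reader and then calling `std::io::copy` would instead pass every output byte through the reader's internal buffer and then through `copy`'s own buffer, which is presumably the extra copy the doc comment has in mind.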
I extended the comment here just to make it clearer that `AhoCorasick::try_stream_replace_all` is to be preferred when all one wants to do is pipe data between a reader and a writer.
FWIW, I now know why this was the behavior. There's an […] While casting […] But if that worked, then an […]
Hello 👋,

I hope it's fine that I directly created a PR without opening an issue. This PR adds a `ReplacingReader` and methods to construct one, which allow replacing matches from an underlying reader while streaming. The mechanism is similar to the `AhoCorasick::try_stream_replace_all` and `AhoCorasick::try_stream_replace_all_with` methods, but instead of piping data to a given writer in one go, they return a reader that streams the data with the matched parts replaced. This can be very handy in some situations, such as when you want to use `serde` to deserialize data directly from a reader.

That's in fact the use case that drove me to create this PR. I initially wanted to create a standalone library for this and was looking into string searching algorithms, and `aho-corasick` seemed particularly suitable. Lo and behold, the library was already almost doing what I wanted! So I figured it might be better to create a PR here, which allows reusing internal bits.
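The reader-returning approach described above can be sketched with std alone as a `Read` adapter that pulls chunks from the inner reader, rewrites them, and hands the result out on demand. This is a simplified illustration under stated assumptions: a single literal needle and a naive scan instead of Aho-Corasick. The struct, its fields and `new` are hypothetical and do not mirror the PR's actual `ReplacingReader`.

```rust
use std::io::{self, Read};

/// Hypothetical `Read` adapter replacing one literal `needle` on the fly.
struct ReplacingReader<R> {
    inner: R,
    needle: Vec<u8>,
    replacement: Vec<u8>,
    pending: Vec<u8>, // raw input that might still be part of a match
    out: Vec<u8>,     // rewritten output not yet returned to the caller
    out_pos: usize,
    eof: bool,
}

impl<R: Read> ReplacingReader<R> {
    fn new(inner: R, needle: &[u8], replacement: &[u8]) -> Self {
        assert!(!needle.is_empty(), "needle must not be empty");
        ReplacingReader {
            inner,
            needle: needle.to_vec(),
            replacement: replacement.to_vec(),
            pending: Vec::new(),
            out: Vec::new(),
            out_pos: 0,
            eof: false,
        }
    }

    /// Reads one chunk and moves every byte that can no longer start a
    /// match from `pending` into `out`, applying replacements.
    fn fill(&mut self) -> io::Result<()> {
        let mut chunk = [0u8; 8192];
        let n = loop {
            match self.inner.read(&mut chunk) {
                Ok(n) => break n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        };
        self.eof = n == 0;
        self.pending.extend_from_slice(&chunk[..n]);

        let len = self.needle.len();
        let (mut start, mut i) = (0, 0);
        while i + len <= self.pending.len() {
            if self.pending[i..i + len] == self.needle[..] {
                self.out.extend_from_slice(&self.pending[start..i]);
                self.out.extend_from_slice(&self.replacement);
                i += len;
                start = i;
            } else {
                i += 1;
            }
        }
        // At EOF flush the whole tail; otherwise hold back the bytes from
        // `i` on, which may begin a match that continues in the next chunk.
        let end = if self.eof { self.pending.len() } else { i };
        self.out.extend_from_slice(&self.pending[start..end]);
        self.pending.drain(..end);
        Ok(())
    }
}

impl<R: Read> Read for ReplacingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Refill until there is output to hand out or the input is exhausted.
        while self.out_pos == self.out.len() && !self.eof {
            self.out.clear();
            self.out_pos = 0;
            self.fill()?;
        }
        let available = &self.out[self.out_pos..];
        let n = available.len().min(buf.len());
        buf[..n].copy_from_slice(&available[..n]);
        self.out_pos += n;
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let input = &b"Hello, world! Goodbye, world!"[..];
    let mut rdr = ReplacingReader::new(input, b"world", b"Rust");
    let mut s = String::new();
    rdr.read_to_string(&mut s)?;
    assert_eq!(s, "Hello, Rust! Goodbye, Rust!");
    println!("{s}");
    Ok(())
}
```

Because the adapter implements `std::io::Read`, it can be passed to anything that consumes a reader, for instance `serde_json::from_reader`, which matches the deserialization use case mentioned above.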