-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement String::remove_matches #71780
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Mark-Simulacrum (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
src/liballoc/string.rs
Outdated
// the Searcher docs | ||
unsafe { | ||
for (start, end) in matches { | ||
ptr::copy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't use copy_nonoverlapping
here right? That was my intuition, but I just wanted to check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, I thought it does not overlap? The memory for deletion should not overlap to begin with. So I think it is good to use copy_nonoverlapping
.
By the way, I think it may be better to keep a reference to current cursor rather than keeping shrunk_by
which we can directly work on. It could probably reduce one add
operation here.
285c753
to
b9bf5f4
Compare
Your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem. Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Let's try r? @kennytm who I believe is fairly familiar with the searcher API and such |
src/liballoc/string.rs
Outdated
|
||
let matches = { | ||
let mut searcher = pat.into_searcher(self); | ||
let mut matches = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I don't think using a Vec here counts as "does not use any allocation" (the original promise in #50206).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I couldn't think of a way to do this without allocating a Vec to hold the match start/ends, but I think I have an idea now that may work. I'll try it when I get off work today in about 6 hours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and tested my idea and unfortunately it doesn't work. The way I can see this you either have to:
a. Store the match start/end pairs somewhere, then start modifying the string, as I've done here
or
b. Modify the string during searching.
The problem with b is that the searcher will end up missing matches as parts it would've matched on are potentially moved behind where it's looking.
Here's the code I tried, for reference:
use core::str::pattern::Searcher;
let vec_ptr = self.vec.as_mut_ptr();
let len = self.len();
let mut shrunk_by = 0;
let mut searcher = pat.into_searcher(self);
// SAFETY: start and end will be on utf8 byte boundaries per
// the Searcher docs
unsafe {
while let Some((start, end)) = searcher.next_match() {
ptr::copy(
vec_ptr.add(end - shrunk_by),
vec_ptr.add(start - shrunk_by),
len - end,
);
shrunk_by += end - start;
}
drop(searcher);
self.vec.set_len(len - shrunk_by);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see value in this new method, even if it does allocate. i'm not on the libs team so don't weight my opinion too heavily, but my vote would be to merge a working implementation, and then open up a follow up issue for removing the allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think this method is better, I don't see why do we need another Vec
for this. The only good reason I see doing a two-pass method like here is used in SIMD JSON which the first pass identify the points which makes deserialization faster.
b9bf5f4
to
9306787
Compare
By the way since we've gotten this far should I open a tracking issue and update the |
9306787
to
b19afdc
Compare
src/liballoc/string.rs
Outdated
#[unstable(feature = "string_remove_matches", reason = "new API", issue = "none")] | ||
pub fn remove_matches<'a, P>(&'a mut self, pat: P) | ||
where | ||
P: for<'x> Pattern<'x>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bound is pretty unusual. Does something block this from being implementable with only a P: Pattern<'a>
bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't compile without the higher-rank trait bound. I talked to someone on the community Discord about this and they suggested the HRTB and it made it compile.
I think it had something to do with not having the HRTB meant Rust thought pat
was going to live as long as self
was, when in fact the lifetime of pat
had to be shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change P: for<'x> Pattern<'x>
to P: Patten<'a>
compiling fails with:
error[E0502]: cannot borrow `self.vec` as mutable because it is also borrowed as immutable
--> src/liballoc/string.rs:1274:21
|
1249 | pub fn remove_matches<'a, P>(&'a mut self, pat: P)
| -- lifetime `'a` defined here
...
1256 | let mut searcher = pat.into_searcher(self);
| -----------------------
| | |
| | immutable borrow occurs here
| argument requires that `*self` is borrowed for `'a`
...
1274 | self.vec.as_mut_ptr().add(end - shrunk_by),
| ^^^^^^^^ mutable borrow occurs here
error[E0502]: cannot borrow `self.vec` as mutable because it is also borrowed as immutable
--> src/liballoc/string.rs:1275:21
|
1249 | pub fn remove_matches<'a, P>(&'a mut self, pat: P)
| -- lifetime `'a` defined here
...
1256 | let mut searcher = pat.into_searcher(self);
| -----------------------
| | |
| | immutable borrow occurs here
| argument requires that `*self` is borrowed for `'a`
...
1275 | self.vec.as_mut_ptr().add(start - shrunk_by),
| ^^^^^^^^ mutable borrow occurs here
error[E0502]: cannot borrow `self.vec` as mutable because it is also borrowed as immutable
--> src/liballoc/string.rs:1280:13
|
1249 | pub fn remove_matches<'a, P>(&'a mut self, pat: P)
| -- lifetime `'a` defined here
...
1256 | let mut searcher = pat.into_searcher(self);
| -----------------------
| | |
| | immutable borrow occurs here
| argument requires that `*self` is borrowed for `'a`
...
1280 | self.vec.set_len(len - shrunk_by);
| ^^^^^^^^ mutable borrow occurs here
error: aborting due to 3 previous errors
For more information about this error, try `rustc --explain E0502`.
error: could not compile `alloc`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtolnay does this answer your question about why I used this trait bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not -- it just says that the current implementation relies on the too strict bound. In #71780 (review) I wrote that we'd want to fix the implementation so that a less strict bound is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example the following code also requires an unnecessarily strict bound, and does not compile if you remove 'a: 'static
, but it's easy to fix the implementation to not require that bound.
pub fn first<'a, T>(input: &'a [T]) -> Option<&'a T>
where
'a: 'static,
{
let slice: &'static [T] = input;
slice.get(0)
}
I am on board with landing this unstable (with an appropriate tracking issue). But I haven't reviewed the implementation at all other than the signature. |
b19afdc
to
7e8c710
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding #71780 (review): @jcotton42 it would be good to understand the error better. Can you put together a reproduction in the playground by stubbing out a Pattern trait and copying the method body from this PR?
Pattern<'a> is supposed to mean that it can search inside of data with lifetime 'a. Here we have incoming data with lifetime 'a that gets searched inside of, so any stricter trait bound should be fixable by fixing the implementation.
7e8c710
to
8636f8c
Compare
77ff362
to
58640a9
Compare
Sorry about the delay on testing the lifetime stuff, I've been having difficulties getting Rust to build on Windows, so I've switched to WSL in the meantime and am currently waiting for it to build so I can reproduce the lifetime error |
Looks good to me. @bors r+ |
📌 Commit a2571cf has been approved by |
…joshtriplett Implement String::remove_matches Closes rust-lang#50206. I lifted the function help from `@frewsxcv's` original PR (rust-lang#50015), hope they don't mind. I'm also wondering whether it would be useful for `remove_matches` to collect up the removed substrings into a `Vec` and return them, right now they're just overwritten by the copy and lost.
…joshtriplett Implement String::remove_matches Closes rust-lang#50206. I lifted the function help from ``@frewsxcv's`` original PR (rust-lang#50015), hope they don't mind. I'm also wondering whether it would be useful for `remove_matches` to collect up the removed substrings into a `Vec` and return them, right now they're just overwritten by the copy and lost.
⌛ Testing commit a2571cf with merge b1000219aa1d66a36070a3a334f793692a4c0316... |
…joshtriplett Implement String::remove_matches Closes rust-lang#50206. I lifted the function help from `@frewsxcv's` original PR (rust-lang#50015), hope they don't mind. I'm also wondering whether it would be useful for `remove_matches` to collect up the removed substrings into a `Vec` and return them, right now they're just overwritten by the copy and lost.
@bors retry yield (included in rollup) |
☀️ Test successful - checks-actions |
I know this comment couldnt come at a worse time, but should this method be
on &mut str instead of String?
…On Thu, Mar 18, 2021, 11:29 PM bors ***@***.***> wrote:
Merged #71780 <#71780> into master.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#71780 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADFWP4W3U6LQI3JQ5SZ2KTTELAIJANCNFSM4MXNXYUA>
.
|
It's still unstable, so it's not too late to make changes. However, this method needs to be on String, because it mutates the string. An analogous method on str would need to make a copy. |
…string, r=Mark-Simulacrum Improve test case for experimental API remove_matches ## Add Test Cases for `remove_matches` Function ### Motivation After reading the discussion in [this GitHub thread](rust-lang#71780), I'm trying to redesign the current API to use less memory when working with `String` and to make it simpler. I've discovered that some test cases are very helpful in ensuring that the new API behaves as intended. I'm still in the process of redesigning the current API, and these test cases have proven to be very useful. ### Testing The current test has been tested with the command `./x test --stage 0 library/alloc`. ### Overview This pull request adds several new test cases for the `remove_matches` function to make sure it works correctly in different situations. The `remove_matches` function is used to get rid of all instances of a specific pattern from a given text. These test cases thoroughly check how the function behaves in various scenarios. ### Test Cases 1. **Single Pattern Occurrence** (`test_single_pattern_occurrence`): - Description: Tests the removal of a single pattern occurrence from the text. - Input: Text: "abc", Pattern: 'b' - Expected Output: "ac" 2. **Repeat Test Single Pattern Occurrence** (`repeat_test_single_pattern_occurrence`): - Description: Repeats the previous test case to ensure consecutive removal of the same pattern. - Input: Text: "ac", Pattern: 'b' - Expected Output: "ac" 3. **Single Character Pattern** (`test_single_character_pattern`): - Description: Tests the removal of a single character pattern. - Input: Text: "abcb", Pattern: 'b' - Expected Output: "ac" 4. **Pattern with Special Characters** (`test_pattern_with_special_characters`): - Description: Tests the removal of a pattern containing special characters. - Input: Text: "ศไทย中华Việt Nam; foobarศ", Pattern: 'ศ' - Expected Output: "ไทย中华Việt Nam; foobar" 5. **Pattern Empty Text and Pattern** (`test_pattern_empty_text_and_pattern`): - Description: Tests the removal of an empty pattern from an empty text. - Input: Text: "", Pattern: "" - Expected Output: "" 6. **Pattern Empty Text** (`test_pattern_empty_text`): - Description: Tests the removal of a pattern from an empty text. - Input: Text: "", Pattern: "something" - Expected Output: "" 7. **Empty Pattern** (`test_empty_pattern`): - Description: Tests the behavior of removing an empty pattern from the text. - Input: Text: "Testing with empty pattern.", Pattern: "" - Expected Output: "Testing with empty pattern." 8. **Multiple Consecutive Patterns 1** (`test_multiple_consecutive_patterns_1`): - Description: Tests the removal of multiple consecutive occurrences of a pattern. - Input: Text: "aaaaa", Pattern: 'a' - Expected Output: "" 9. **Multiple Consecutive Patterns 2** (`test_multiple_consecutive_patterns_2`): - Description: Tests the removal of a longer pattern that occurs consecutively. - Input: Text: "Hello **world****today!**", Pattern: "**" - Expected Output: "Hello worldtoday!" 10. **Case Insensitive Pattern** (`test_case_insensitive_pattern`): - Description: Tests the removal of a case-insensitive pattern from the text. - Input: Text: "CASE ** SeNsItIvE ** PaTtErN.", Pattern: "sEnSiTiVe" - Expected Output: "CASE ** SeNsItIvE ** PaTtErN."
…=Mark-Simulacrum Improve test case for experimental API remove_matches ## Add Test Cases for `remove_matches` Function ### Motivation After reading the discussion in [this GitHub thread](rust-lang/rust#71780), I'm trying to redesign the current API to use less memory when working with `String` and to make it simpler. I've discovered that some test cases are very helpful in ensuring that the new API behaves as intended. I'm still in the process of redesigning the current API, and these test cases have proven to be very useful. ### Testing The current test has been tested with the command `./x test --stage 0 library/alloc`. ### Overview This pull request adds several new test cases for the `remove_matches` function to make sure it works correctly in different situations. The `remove_matches` function is used to get rid of all instances of a specific pattern from a given text. These test cases thoroughly check how the function behaves in various scenarios. ### Test Cases 1. **Single Pattern Occurrence** (`test_single_pattern_occurrence`): - Description: Tests the removal of a single pattern occurrence from the text. - Input: Text: "abc", Pattern: 'b' - Expected Output: "ac" 2. **Repeat Test Single Pattern Occurrence** (`repeat_test_single_pattern_occurrence`): - Description: Repeats the previous test case to ensure consecutive removal of the same pattern. - Input: Text: "ac", Pattern: 'b' - Expected Output: "ac" 3. **Single Character Pattern** (`test_single_character_pattern`): - Description: Tests the removal of a single character pattern. - Input: Text: "abcb", Pattern: 'b' - Expected Output: "ac" 4. **Pattern with Special Characters** (`test_pattern_with_special_characters`): - Description: Tests the removal of a pattern containing special characters. - Input: Text: "ศไทย中华Việt Nam; foobarศ", Pattern: 'ศ' - Expected Output: "ไทย中华Việt Nam; foobar" 5. **Pattern Empty Text and Pattern** (`test_pattern_empty_text_and_pattern`): - Description: Tests the removal of an empty pattern from an empty text. - Input: Text: "", Pattern: "" - Expected Output: "" 6. **Pattern Empty Text** (`test_pattern_empty_text`): - Description: Tests the removal of a pattern from an empty text. - Input: Text: "", Pattern: "something" - Expected Output: "" 7. **Empty Pattern** (`test_empty_pattern`): - Description: Tests the behavior of removing an empty pattern from the text. - Input: Text: "Testing with empty pattern.", Pattern: "" - Expected Output: "Testing with empty pattern." 8. **Multiple Consecutive Patterns 1** (`test_multiple_consecutive_patterns_1`): - Description: Tests the removal of multiple consecutive occurrences of a pattern. - Input: Text: "aaaaa", Pattern: 'a' - Expected Output: "" 9. **Multiple Consecutive Patterns 2** (`test_multiple_consecutive_patterns_2`): - Description: Tests the removal of a longer pattern that occurs consecutively. - Input: Text: "Hello **world****today!**", Pattern: "**" - Expected Output: "Hello worldtoday!" 10. **Case Insensitive Pattern** (`test_case_insensitive_pattern`): - Description: Tests the removal of a case-insensitive pattern from the text. - Input: Text: "CASE ** SeNsItIvE ** PaTtErN.", Pattern: "sEnSiTiVe" - Expected Output: "CASE ** SeNsItIvE ** PaTtErN."
…=Mark-Simulacrum Improve test case for experimental API remove_matches ## Add Test Cases for `remove_matches` Function ### Motivation After reading the discussion in [this GitHub thread](rust-lang/rust#71780), I'm trying to redesign the current API to use less memory when working with `String` and to make it simpler. I've discovered that some test cases are very helpful in ensuring that the new API behaves as intended. I'm still in the process of redesigning the current API, and these test cases have proven to be very useful. ### Testing The current test has been tested with the command `./x test --stage 0 library/alloc`. ### Overview This pull request adds several new test cases for the `remove_matches` function to make sure it works correctly in different situations. The `remove_matches` function is used to get rid of all instances of a specific pattern from a given text. These test cases thoroughly check how the function behaves in various scenarios. ### Test Cases 1. **Single Pattern Occurrence** (`test_single_pattern_occurrence`): - Description: Tests the removal of a single pattern occurrence from the text. - Input: Text: "abc", Pattern: 'b' - Expected Output: "ac" 2. **Repeat Test Single Pattern Occurrence** (`repeat_test_single_pattern_occurrence`): - Description: Repeats the previous test case to ensure consecutive removal of the same pattern. - Input: Text: "ac", Pattern: 'b' - Expected Output: "ac" 3. **Single Character Pattern** (`test_single_character_pattern`): - Description: Tests the removal of a single character pattern. - Input: Text: "abcb", Pattern: 'b' - Expected Output: "ac" 4. **Pattern with Special Characters** (`test_pattern_with_special_characters`): - Description: Tests the removal of a pattern containing special characters. - Input: Text: "ศไทย中华Việt Nam; foobarศ", Pattern: 'ศ' - Expected Output: "ไทย中华Việt Nam; foobar" 5. **Pattern Empty Text and Pattern** (`test_pattern_empty_text_and_pattern`): - Description: Tests the removal of an empty pattern from an empty text. - Input: Text: "", Pattern: "" - Expected Output: "" 6. **Pattern Empty Text** (`test_pattern_empty_text`): - Description: Tests the removal of a pattern from an empty text. - Input: Text: "", Pattern: "something" - Expected Output: "" 7. **Empty Pattern** (`test_empty_pattern`): - Description: Tests the behavior of removing an empty pattern from the text. - Input: Text: "Testing with empty pattern.", Pattern: "" - Expected Output: "Testing with empty pattern." 8. **Multiple Consecutive Patterns 1** (`test_multiple_consecutive_patterns_1`): - Description: Tests the removal of multiple consecutive occurrences of a pattern. - Input: Text: "aaaaa", Pattern: 'a' - Expected Output: "" 9. **Multiple Consecutive Patterns 2** (`test_multiple_consecutive_patterns_2`): - Description: Tests the removal of a longer pattern that occurs consecutively. - Input: Text: "Hello **world****today!**", Pattern: "**" - Expected Output: "Hello worldtoday!" 10. **Case Insensitive Pattern** (`test_case_insensitive_pattern`): - Description: Tests the removal of a case-insensitive pattern from the text. - Input: Text: "CASE ** SeNsItIvE ** PaTtErN.", Pattern: "sEnSiTiVe" - Expected Output: "CASE ** SeNsItIvE ** PaTtErN."
Closes #50206.
I lifted the function help from @frewsxcv's original PR (#50015), hope they don't mind.
I'm also wondering whether it would be useful for
remove_matches
to collect up the removed substrings into aVec
and return them, right now they're just overwritten by the copy and lost.