Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Vec::splice and String::splice #32355

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 1 addition & 5 deletions src/libcollections/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -978,13 +978,9 @@ impl<T: Clone> ToOwned for [T] {
self.to_vec()
}

// HACK(japaric): with cfg(test) the inherent `[T]::to_vec`, which is required for this method
// definition, is not available. Since we don't require this method for testing purposes, I'll
// just stub it
// NB see the slice::hack module in slice.rs for more information
#[cfg(test)]
fn to_owned(&self) -> Vec<T> {
panic!("not available with cfg(test)")
hack::to_vec(self)
}
}

Expand Down
130 changes: 129 additions & 1 deletion src/libcollections/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1228,7 +1228,7 @@ impl String {
self.vec.clear()
}

/// Create a draining iterator that removes the specified range in the string
/// Creates a draining iterator that removes the specified range in the string
/// and yields the removed chars.
///
/// Note: The element range is removed even if the iterator is not
Expand Down Expand Up @@ -1286,6 +1286,63 @@ impl String {
}
}

/// Creates a splicing iterator that removes the specified range in the string,
/// replaces with the given string, and yields the removed chars.
/// The given string doesn’t need to be the same length as the range.
///
/// Note: The element range is removed even if the iterator is not
/// consumed until the end.
///
/// # Panics
///
/// Panics if the starting point or end point do not lie on a [`char`]
/// boundary, or if they're out of bounds.
///
/// [`char`]: ../../std/primitive.char.html
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// #![feature(splice)]
/// let mut s = String::from("α is alpha, β is beta");
/// let beta_offset = s.find('β').unwrap_or(s.len());
///
/// // Replace the range up until the β from the string
/// let t: String = s.splice(..beta_offset, "Α is capital alpha; ").collect();
/// assert_eq!(t, "α is alpha, ");
/// assert_eq!(s, "Α is capital alpha; β is beta");
/// ```
#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
pub fn splice<'a, 'b, R>(&'a mut self, range: R, replace_with: &'b str) -> Splice<'a, 'b>
where R: RangeArgument<usize>
{
// Memory safety
//
// The String version of Splice does not have the memory safety issues
// of the vector version. The data is just plain bytes.
// Because the range removal happens in Drop, if the Splice iterator is leaked,
// the removal will not happen.
let len = self.len();
let start = *range.start().unwrap_or(&0);
let end = *range.end().unwrap_or(&len);

// Take out two simultaneous borrows. The &mut String won't be accessed
// until iteration is over, in Drop.
let self_ptr = self as *mut _;
// slicing does the appropriate bounds checks
let chars_iter = self[start..end].chars();

Splice {
start: start,
end: end,
iter: chars_iter,
string: self_ptr,
replace_with: replace_with
}
}

/// Converts this `String` into a `Box<str>`.
///
/// This will drop any excess capacity.
Expand Down Expand Up @@ -1884,3 +1941,74 @@ impl<'a> DoubleEndedIterator for Drain<'a> {
self.iter.next_back()
}
}

/// A splicing iterator for `String`.
///
/// This struct is created by the [`splice()`] method on [`String`]. See its
/// documentation for more.
///
/// [`splice()`]: struct.String.html#method.splice
/// [`String`]: struct.String.html
#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
pub struct Splice<'a, 'b> {
/// Will be used as &'a mut String in the destructor
string: *mut String,
/// Start of part to remove
start: usize,
/// End of part to remove
end: usize,
/// Current remaining range to remove
iter: Chars<'a>,
replace_with: &'b str,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a PhantomData here with &'a mut String as it's essentially what string is a reference to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, but would it affect anything? If it’s necessary, isn’t it also necessary in struct Drain?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yes it might be, I believe it has implications on the variance of the lifetime parameters of the struct (e.g. invariant, not covariant or whichever it is)


#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
unsafe impl<'a, 'b> Sync for Splice<'a, 'b> {}
#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
unsafe impl<'a, 'b> Send for Splice<'a, 'b> {}

#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
impl<'a, 'b> Drop for Splice<'a, 'b> {
fn drop(&mut self) {
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty similar to the Vec implementation, is there any way those two could share an implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s what I wanted to do at first, but I saw that String::drain is not based on Vec::drain and has this comment:

        // Memory safety
        //
        // The String version of Drain does not have the memory safety issues
        // of the vector version. The data is just plain bytes.
        // Because the range removal happens in Drop, if the Drain iterator is leaked,
        // the removal will not happen.

So I kept the same pattern.

u8 being Copy and str::len being certain (compared to Iterator::size_hint) make this algorithm simpler than the fully generic case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok, I just found this pretty gnarly to read and it seems like something we may not want to jump to specializing just yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another detail is that String::splice returns an Iterator<Item=char>, which is not easy to implement on top of Iterator<Item=u8> returned by splice on the underlying bytes vector.

But that iterator is not as useful as on Vec<T> where T might not be Copy. You can always access string[range] before calling string.splice(range, replacement). (And maybe even skip the UTF-8 round-trip if you end up using .to_owned() on that slice instead of .collect() on a char iterator.) So I wouldn’t mind having String::splice return nothing, in which case it is trivial to implement with Vec::splice. This even is how I wrote the RFC since I’d assumed (and didn’t check) that String::drain also returns nothing.

let vec = (*self.string).as_mut_vec();
let range_len = self.end - self.start;
let replacement_len = self.replace_with.len();
let tail_len = vec.len() - self.end;
if replacement_len > range_len {
vec.reserve(replacement_len - range_len);
}
if replacement_len != range_len {
let src = vec.as_ptr().offset(self.end as isize);
let dst = vec.as_mut_ptr().offset((self.start + replacement_len) as isize);
ptr::copy(src, dst, tail_len);
}
let src = self.replace_with.as_ptr();
let dst = vec.as_mut_ptr().offset(self.start as isize);
ptr::copy(src, dst, replacement_len);
vec.set_len(self.start + replacement_len + tail_len);
}
}
}

#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
impl<'a, 'b> Iterator for Splice<'a, 'b> {
type Item = char;

#[inline]
fn next(&mut self) -> Option<char> {
self.iter.next()
}

fn size_hint(&self) -> (usize, Option<usize>) {
self.iter.size_hint()
}
}

#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
impl<'a, 'b> DoubleEndedIterator for Splice<'a, 'b> {
#[inline]
fn next_back(&mut self) -> Option<char> {
self.iter.next_back()
}
}
166 changes: 163 additions & 3 deletions src/libcollections/vec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -764,7 +764,7 @@ impl<T> Vec<T> {
}
}

/// Create a draining iterator that removes the specified range in the vector
/// Creates a draining iterator that removes the specified range in the vector
/// and yields the removed items.
///
/// Note 1: The element range is removed even if the iterator is not
Expand Down Expand Up @@ -826,6 +826,53 @@ impl<T> Vec<T> {
}
}

/// Creates a splicing iterator that replaces the specified range in the vector
/// with the given `replace_with` iterator and yields the removed items.
/// `replace_with` does not need to be the same length as `range`.
///
/// Note 1: The element range is removed even if the iterator is not
/// consumed until the end.
///
/// Note 2: It is unspecified how many elements are removed from the vector,
/// if the `Splice` value is leaked.
///
/// Note 3: The input iterator `replace_with` is only consumed
/// when the `Splice` value is dropped.
///
/// Note 4: This is optimal if:
///
/// * The tail (elements in the vector after `range`) is empty,
/// * or `replace_with` yields fewer elements than `range`’s length
/// * or the lower bound of its `size_hint()` is exact.
///
/// Otherwise, a temporary vector is allocated and the tail is moved twice.
///
/// # Panics
///
/// Panics if the starting point is greater than the end point or if
/// the end point is greater than the length of the vector.
///
/// # Examples
///
/// ```
/// #![feature(splice)]
/// let mut v = vec![1, 2, 3];
/// let new = [7, 8];
/// let u: Vec<_> = v.splice(..2, new.iter().cloned()).collect();
/// assert_eq!(v, &[7, 8, 3]);
/// assert_eq!(u, &[1, 2]);
/// ```
#[inline]
#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
pub fn splice<R, I>(&mut self, range: R, replace_with: I) -> Splice<I::IntoIter>
where R: RangeArgument<usize>, I: IntoIterator<Item=T>
{
Splice {
drain: self.drain(range),
replace_with: replace_with.into_iter(),
}
}

/// Clears the vector, removing all values.
///
/// # Examples
Expand Down Expand Up @@ -1696,7 +1743,7 @@ impl<T> Drop for IntoIter<T> {
}
}

/// A draining iterator for `Vec<T>`.
/// A draining iterator for `Vec<T>`. See the [`Vec::drain`](struct.Vec.html#method.drain) method.
#[stable(feature = "drain", since = "1.6.0")]
pub struct Drain<'a, T: 'a> {
/// Index of tail to preserve
Expand Down Expand Up @@ -1756,6 +1803,119 @@ impl<'a, T> Drop for Drain<'a, T> {
}
}


#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, T> ExactSizeIterator for Drain<'a, T> {}

/// A splicing iterator for `Vec<T>`. See the [`Vec::splice`](struct.Vec.html#method.splice) method.
#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
pub struct Splice<'a, I: Iterator + 'a> {
drain: Drain<'a, I::Item>,
replace_with: I,
}

#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
impl<'a, I: Iterator> Iterator for Splice<'a, I> {
type Item = I::Item;

fn next(&mut self) -> Option<Self::Item> {
self.drain.next()
}

fn size_hint(&self) -> (usize, Option<usize>) {
self.drain.size_hint()
}
}

#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
impl<'a, I: Iterator> DoubleEndedIterator for Splice<'a, I> {
fn next_back(&mut self) -> Option<Self::Item> {
self.drain.next_back()
}
}

#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
impl<'a, I: Iterator> ExactSizeIterator for Splice<'a, I> {}


#[unstable(feature = "splice", reason = "recently added", issue = "32310")]
impl<'a, I: Iterator> Drop for Splice<'a, I> {
fn drop(&mut self) {
// exhaust drain first
while let Some(_) = self.drain.next() {}


unsafe {
if self.drain.tail_len == 0 {
let vec = &mut *self.drain.vec;
vec.extend(self.replace_with.by_ref());
return
}

// First fill the range left by drain().
if !self.drain.fill(&mut self.replace_with) {
return
}

// There may be more elements. Use the lower bound as an estimate.
// FIXME: Is the upper bound a better guess? Or something else?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d like some other opinions here. Getting this guess wrong either way causes the tail to be moved twice.

  • If we over-estimate (which it the moment only happens if size_hint is incorrect) we reserve more space than necessary in the vector. But that space can still be used later (with further calls to Vec::push and friends). Vec::reserve often over-allocates anyway.
  • If we under-estimate, we allocate a temporary vector and soon drop it.

Maybe over-estimating is better, and this should use the upper bound when it’s Some? (And fall back to the lower bound otherwise.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if the optimistic route is only used if Some(lower) == upper, i.e the iterator claims to know its exact size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Some(lower) == upper we already use that. If they’re accurate the Vec we then collect is zero-length. The question is what to do when the two bounds are not equal?

let (lower_bound, _upper_bound) = self.replace_with.size_hint();
if lower_bound > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition lower_bound > 0 could be replaced by lower_bound > 0 && Some(lower_bound) == _upper_bound. Just to clarify what I mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. In that case we’d avoid moving "tail" elements twice, but we’d allocate a temporary Vec in more cases.

self.drain.move_tail(lower_bound);
if !self.drain.fill(&mut self.replace_with) {
return
}
}

// Collect any remaining elements.
// This is a zero-length vector which does not allocate if `lower_bound` was exact.
let mut collected = self.replace_with.by_ref().collect::<Vec<I::Item>>().into_iter();
// Now we have an exact count.
if collected.len() > 0 {
self.drain.move_tail(collected.len());
let filled = self.drain.fill(&mut collected);
debug_assert!(filled);
debug_assert_eq!(collected.len(), 0);
}
}
// Let `Drain::drop` move the tail back if necessary and restore `vec.len`.
}
}

/// Private helper methods for `Splice::drop`
impl<'a, T> Drain<'a, T> {
/// The range from `self.vec.len` to `self.tail_start` contains elements
/// that have been moved out.
/// Fill that range as much as possible with new elements from the `replace_with` iterator.
/// Return whether we filled the entire range. (`replace_with.next()` didn’t return `None`.)
unsafe fn fill<I: Iterator<Item=T>>(&mut self, replace_with: &mut I) -> bool {
let vec = &mut *self.vec;
let range_start = vec.len;
let range_end = self.tail_start;
let range_slice = slice::from_raw_parts_mut(
vec.as_mut_ptr().offset(range_start as isize),
range_end - range_start);

for place in range_slice {
if let Some(new_item) = replace_with.next() {
ptr::write(place, new_item);
vec.len += 1;
} else {
return false
}
}
true
}

/// Make room for inserting more elements before the tail.
unsafe fn move_tail(&mut self, extra_capacity: usize) {
let vec = &mut *self.vec;
let used_capacity = self.tail_start + self.tail_len;
vec.buf.reserve(used_capacity, extra_capacity);

let new_tail_start = self.tail_start + extra_capacity;
let src = vec.as_ptr().offset(self.tail_start as isize);
let dst = vec.as_mut_ptr().offset(new_tail_start as isize);
ptr::copy(src, dst, self.tail_len);
self.tail_start = new_tail_start;
}
}
1 change: 1 addition & 0 deletions src/libcollectionstest/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#![feature(pattern)]
#![feature(rand)]
#![feature(set_recovery)]
#![feature(splice)]
#![feature(step_by)]
#![feature(str_char)]
#![feature(str_escape)]
Expand Down
8 changes: 8 additions & 0 deletions src/libcollectionstest/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -373,6 +373,14 @@ fn test_drain() {
assert_eq!(t, "");
}

#[test]
fn test_splice() {
let mut s = "Hello, world!".to_owned();
let t: String = s.splice(7..12, "世界").collect();
assert_eq!(s, "Hello, 世界!");
assert_eq!(t, "world");
}

#[test]
fn test_extend_ref() {
let mut a = "foo".to_string();
Expand Down
10 changes: 10 additions & 0 deletions src/libcollectionstest/vec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -438,6 +438,16 @@ fn test_drain_range() {
assert_eq!(v, &[(), ()]);
}

#[test]
fn splice() {
let mut v = vec![1, 2, 3, 4, 5];
let a = [10, 11, 12];
v.splice(2..4, a.iter().cloned());
assert_eq!(v, &[1, 2, 10, 11, 12, 5]);
v.splice(1..3, Some(20));
assert_eq!(v, &[1, 20, 11, 12, 5]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are some pretty small tests for a pretty meaty unsafe implementation, can you add some more "flavorful" tests along the lines of:

  • Returning some surprising size_hint values from the "to replace" iterator
  • Forgetting the splice iterator
  • Panicking during iteration of the splice iterator
  • Panicking while draining elements

etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I’ll add some later.


#[test]
fn test_into_boxed_slice() {
let xs = vec![1, 2, 3];
Expand Down