Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add impl From<CompactString> for String #118

Merged
merged 4 commits into from
Jun 21, 2022

Conversation

neoeinstein
Copy link
Contributor

Tests added and run under miri. Ensures that no reallocation/copy of already-boxed strings happens in a round-trip, with the address, length, and capacity remaining stable.

Closes #117

Copy link
Owner

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, looking real good! Just some small changes necessary to handle an edge case with 32-bit arch's

// * `length` is less than or equal to capacity, due to internal invaraints.
// * `capacity` is correctly maintained internally.
// * `BoxString` only ever contains valid UTF-8.
unsafe { String::from_raw_parts(this.ptr.as_ptr(), this.len, this.capacity()) }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A BoxedString can store the capacity on either the stack or the heap, we store the capacity on the heap when there isn't enough room on the stack, see this doc comment for an explanation.

tl;dr I think you'll have to change this to something like:

match self.cap.as_usize() {
  // capacity is on the stack, form a string from these parts
  Ok(capacity) => {
    let this = ManuallyDrop::new(self);
    unsafe { String::from_raw_parts(this.ptr.as_ptr(), this.len, capacity) }
  },
  // capacity is on the heap! we need to form a new String
  Err(_) => {
    // ... drop the existing string and form a new one
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do something cute here to avoid reallocating by just copying down over the top of the boxed capacity, but it didn't do what I intended under test. My guess is that the layouts aren't compatible. I've included the code snippet below in case you know a bit more than me and want to iterate on it. For now, I've just set it to copy out a new string.

let capacity = this.capacity();
let cap_bytes = std::mem::size_of::<usize>();
// SAFETY: `this.ptr` should be a valid value if the capacity is on the heap,
// as should the address `size_of::<usize>()` bytes below that address.
let init = unsafe { this.ptr.as_ptr().sub(cap_bytes) };
// SAFETY: moving the bytes within the existing allocation
// * `init` is below `ptr`, and `len` bytes are already within the existing allocation.
unsafe {
    std::ptr::copy(this.ptr.as_ptr(), init, this.len);
}
// SAFETY:
// * The memory in `init` was previously allocated by the same allocator the standard library
//   uses, with a required alignment of exactly 1.
// * `length` is less than or equal to capacity, due to internal invaraints.
// * `capacity` is correctly maintained internally, and we recovered `cap_bytes` in the copy.
// * `BoxString` only ever contains valid UTF-8.
unsafe { String::from_raw_parts(init, this.len, capacity + cap_bytes) }

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to re-use the existing buffer. I think this is something we can iterate on, for now I filed issue #120 to track this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the layouts aren't compatible

Yeah, you'll run into an issue as String's buffer is u8 aligned but the inline length is usize aligned. At a performance cost, we could store the length unaligned to make the allocated alignment 1.

The unstable Allocator trait allows requesting (though an implementation can theoretically refuse to fulfill) a change in alignment of an allocation. The stable GlobalAlloc does not.

My vote is to just copy/realloc, and put a note tl if/when Allocator is stabilized optimize it to opportunistically use a layout-adjusting realloc so the underlying allocator can avoid the copy.

@@ -501,6 +537,12 @@ impl<'a> MutStrongRepr<'a> {
}
}

#[derive(Debug)]
enum StrongIntoRepr {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe StrongOwnedRepr? makes it a little more general in case we use this for more than just into_string() in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -695,3 +705,131 @@ fn test_to_compact_string() {
format_compact!("0{}67890123456789{}", "12345", 999999)
);
}

#[test]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love the existing test coverage, thank you!

Could you add one more test for the 16MB edge case that exists for 32-bit machines? see here for an example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a relevant test with a boxed capacity.

By using `String::from_utf8_unchecked` we significantly speed up miri test
runs. This is because miri doesn't need to spend a ton of time verifying
that every character in the ~16MiB vector is UTF-8, when we already know
that it is, since it's always just ASCII `A`s.
Comment on lines +616 to +618
// SAFETY: We've just created the vector above, and it contains only ASCII characters.
// This speeds up `miri` runs significantly.
let string = unsafe { String::from_utf8_unchecked(word_buf) };
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic!

Copy link
Owner

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Thanks for the PR and speeding up some of our tests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

impl From<CompactString> for String
3 participants