Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add impl From<CompactString> for String #118

Merged
merged 4 commits into from
Jun 21, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions compact_str/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -610,6 +610,12 @@ impl From<Box<str>> for CompactString {
}
}

impl From<CompactString> for String {
fn from(s: CompactString) -> Self {
s.repr.into_string()
}
}

impl FromStr for CompactString {
type Err = core::convert::Infallible;
fn from_str(s: &str) -> Result<CompactString, Self::Err> {
Expand Down
14 changes: 14 additions & 0 deletions compact_str/src/repr/boxed/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,20 @@ impl BoxString {
}
}

#[inline]
pub fn into_string(self) -> String {
// Ensure that we don't free the data we are transferring to `String`
let this = ManuallyDrop::new(self);

// SAFETY:
// * The memory in `ptr` was previously allocated by the same allocator the standard library
// uses, with a required alignment of exactly 1.
// * `length` is less than or equal to capacity, due to internal invaraints.
// * `capacity` is correctly maintained internally.
// * `BoxString` only ever contains valid UTF-8.
unsafe { String::from_raw_parts(this.ptr.as_ptr(), this.len, this.capacity()) }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A BoxedString can store the capacity on either the stack or the heap, we store the capacity on the heap when there isn't enough room on the stack, see this doc comment for an explanation.

tl;dr I think you'll have to change this to something like:

match self.cap.as_usize() {
  // capacity is on the stack, form a string from these parts
  Ok(capacity) => {
    let this = ManuallyDrop::new(self);
    unsafe { String::from_raw_parts(this.ptr.as_ptr(), this.len, capacity) }
  },
  // capacity is on the heap! we need to form a new String
  Err(_) => {
    // ... drop the existing string and form a new one
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do something cute here to avoid reallocating by just copying down over the top of the boxed capacity, but it didn't do what I intended under test. My guess is that the layouts aren't compatible. I've included the code snippet below in case you know a bit more than me and want to iterate on it. For now, I've just set it to copy out a new string.

let capacity = this.capacity();
let cap_bytes = std::mem::size_of::<usize>();
// SAFETY: `this.ptr` should be a valid value if the capacity is on the heap,
// as should the address `size_of::<usize>()` bytes below that address.
let init = unsafe { this.ptr.as_ptr().sub(cap_bytes) };
// SAFETY: moving the bytes within the existing allocation
// * `init` is below `ptr`, and `len` bytes are already within the existing allocation.
unsafe {
    std::ptr::copy(this.ptr.as_ptr(), init, this.len);
}
// SAFETY:
// * The memory in `init` was previously allocated by the same allocator the standard library
//   uses, with a required alignment of exactly 1.
// * `length` is less than or equal to capacity, due to internal invaraints.
// * `capacity` is correctly maintained internally, and we recovered `cap_bytes` in the copy.
// * `BoxString` only ever contains valid UTF-8.
unsafe { String::from_raw_parts(init, this.len, capacity + cap_bytes) }

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to re-use the existing buffer. I think this is something we can iterate on, for now I filed issue #120 to track this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the layouts aren't compatible

Yeah, you'll run into an issue as String's buffer is u8 aligned but the inline length is usize aligned. At a performance cost, we could store the length unaligned to make the allocated alignment 1.

The unstable Allocator trait allows requesting (though an implementation can theoretically refuse to fulfill) a change in alignment of an allocation. The stable GlobalAlloc does not.

My vote is to just copy/realloc, and put a note tl if/when Allocator is stabilized optimize it to opportunistically use a layout-adjusting realloc so the underlying allocator can avoid the copy.

}

#[inline]
pub fn from_box_str(b: Box<str>) -> Self {
match Capacity::new(b.len()) {
Expand Down
5 changes: 5 additions & 0 deletions compact_str/src/repr/heap.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,11 @@ impl HeapString {
HeapString { string }
}

#[inline]
pub fn into_string(self) -> String {
self.string.into_string()
}

#[inline]
pub fn from_box_str(b: Box<str>) -> Self {
let string = BoxString::from_box_str(b);
Expand Down
42 changes: 42 additions & 0 deletions compact_str/src/repr/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,22 @@ impl Repr {
}
}

#[inline]
pub fn into_string(self) -> String {
if self.capacity() == 0 {
String::new()
} else {
match self.cast_into() {
StrongIntoRepr::Inline(inline) => String::from(inline.as_str()),
StrongIntoRepr::Heap(heap) => {
// `HeapString::into_string()` takes ownership and
// is responsible for avoiding a double-free.
ManuallyDrop::into_inner(heap).into_string()
}
}
}
}

#[inline]
pub fn from_box_str(b: Box<str>) -> Self {
if b.len() == 0 {
Expand Down Expand Up @@ -272,6 +288,20 @@ impl Repr {
}
}

#[inline(always)]
fn cast_into(self) -> StrongIntoRepr {
match self.discriminant() {
Discriminant::Heap => {
// SAFETY: We checked the discriminant to make sure the union is `heap`
StrongIntoRepr::Heap(unsafe { self.into_union().heap })
}
Discriminant::Inline => {
// SAFETY: We checked the discriminant to make sure the union is `inline`
StrongIntoRepr::Inline(unsafe { self.into_union().inline })
}
}
}

#[inline(always)]
const fn from_inline(repr: InlineString) -> Self {
// SAFETY: An `InlineString` and `Repr` have the same size
Expand All @@ -295,6 +325,12 @@ impl Repr {
// SAFETY: An `ReprUnion` and `Repr` have the same size
unsafe { &mut *(self as *mut _ as *mut _) }
}

#[inline(always)]
fn into_union(self) -> ReprUnion {
// SAFETY: An `ReprUnion` and `Repr` have the same size
unsafe { std::mem::transmute(self) }
}
}

impl Clone for Repr {
Expand Down Expand Up @@ -501,6 +537,12 @@ impl<'a> MutStrongRepr<'a> {
}
}

#[derive(Debug)]
enum StrongIntoRepr {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe StrongOwnedRepr? makes it a little more general in case we use this for more than just into_string() in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Inline(InlineString),
Heap(ManuallyDrop<HeapString>),
}

crate::asserts::assert_size_eq!(ReprUnion, Repr, Option<Repr>, String, Option<String>);

#[cfg(target_pointer_width = "64")]
Expand Down
138 changes: 138 additions & 0 deletions compact_str/src/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,16 @@ fn proptest_reserve_and_write_bytes_allocated_properly(#[strategy(rand_unicode()
prop_assert_eq!(compact.is_heap_allocated(), word.len() > MAX_SIZE);
}

#[proptest]
#[cfg_attr(miri, ignore)]
fn proptest_arbitrary_compact_string_converts_to_string(#[strategy(rand_unicode())] word: String) {
let compact = CompactString::new(&word);
let result = String::from(compact);

prop_assert_eq!(result.len(), word.len());
prop_assert_eq!(result, word);
}

#[proptest]
#[cfg_attr(miri, ignore)]
fn proptest_extend_chars_allocated_properly(
Expand Down Expand Up @@ -695,3 +705,131 @@ fn test_to_compact_string() {
format_compact!("0{}67890123456789{}", "12345", 999999)
);
}

#[test]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love the existing test coverage, thank you!

Could you add one more test for the 16MB edge case that exists for 32-bit machines? see here for an example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a relevant test with a boxed capacity.

fn test_into_string_large_string_with_excess_capacity() {
let mut string = String::with_capacity(128);
string.push_str("abcdefghijklmnopqrstuvwxyz");
let str_addr = string.as_ptr();
let str_len = string.len();
let str_cap = string.capacity();

let compact = CompactString::from(string);
let new_string = String::from(compact);
let new_str_addr = new_string.as_ptr();
let new_str_len = new_string.len();
let new_str_cap = new_string.capacity();

assert_eq!(str_addr, new_str_addr);
assert_eq!(str_len, new_str_len);
assert_eq!(str_cap, new_str_cap);
}

#[test]
fn test_into_string_small_string_with_excess_capacity() {
let mut string = String::with_capacity(128);
string.push_str("abcdef");
let str_addr = string.as_ptr();
let str_len = string.len();
let str_cap = string.capacity();

let compact = CompactString::from(string);
let new_string = String::from(compact);
let new_str_addr = new_string.as_ptr();
let new_str_len = new_string.len();
let new_str_cap = new_string.capacity();

// If small boxed strings are eagerly compacted, the address and capacity assertions won't hold.
// Compaction is not eager, so these should hold.
assert_eq!(str_addr, new_str_addr);
assert_eq!(str_len, new_str_len);
assert_eq!(str_cap, new_str_cap);
}

#[test]
fn test_into_string_small_string_with_no_excess_capacity() {
let string = String::from("abcdef");
let str_addr = string.as_ptr();
let str_len = string.len();
let str_cap = string.capacity();

let compact = CompactString::from(string);
let new_string = String::from(compact);
let new_str_addr = new_string.as_ptr();
let new_str_len = new_string.len();
let new_str_cap = new_string.capacity();

// If small boxed strings are eagerly compacted, the address assertion won't hold.
// Compaction is not eager, so these should hold.
assert_eq!(str_addr, new_str_addr);
assert_eq!(str_len, new_str_len);
assert_eq!(str_cap, new_str_cap);
}

#[test]
fn test_into_string_empty_string() {
let string = String::new();
let str_addr = string.as_ptr();
let str_len = string.len();
let str_cap = string.capacity();

let compact = CompactString::from(string);
let new_string = String::from(compact);
let new_str_addr = new_string.as_ptr();
let new_str_len = new_string.len();
let new_str_cap = new_string.capacity();

assert_eq!(str_addr, new_str_addr);
assert_eq!(str_len, new_str_len);
assert_eq!(str_cap, new_str_cap);
}

#[test]
fn test_into_string_small_str() {
let data = "abcdef";
let str_addr = data.as_ptr();
let str_len = data.len();

let compact = CompactString::from(data);
let new_string = String::from(compact);
let new_str_addr = new_string.as_ptr();
let new_str_len = new_string.len();
let new_str_cap = new_string.capacity();

assert_ne!(str_addr, new_str_addr);
assert_eq!(str_len, new_str_len);
assert_eq!(str_len, new_str_cap);
}

#[test]
fn test_into_string_long_str() {
let data = "abcdefghijklmnopqrstuvwxyz";
let str_addr = data.as_ptr();
let str_len = data.len();

let compact = CompactString::from(data);
let new_string = String::from(compact);
let new_str_addr = new_string.as_ptr();
let new_str_len = new_string.len();
let new_str_cap = new_string.capacity();

assert_ne!(str_addr, new_str_addr);
assert_eq!(str_len, new_str_len);
assert_eq!(str_len, new_str_cap);
}

#[test]
fn test_into_string_empty_str() {
let data = "";
let str_len = data.len();

let compact = CompactString::from(data);
let new_string = String::from(compact);
let new_str_addr = new_string.as_ptr();
let new_str_len = new_string.len();
let new_str_cap = new_string.capacity();

assert_eq!(String::new().as_ptr(), new_str_addr);
assert_eq!(str_len, new_str_len);
assert_eq!(str_len, new_str_cap);
}