Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: Update From<String> and From<Box<str>> to eagerly inline #256

Merged
merged 3 commits into from
Feb 18, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 5 additions & 3 deletions compact_str/src/features/proptest.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,8 @@ mod test {

use crate::CompactString;

const MAX_SIZE: usize = std::mem::size_of::<String>();

proptest! {
#[test]
#[cfg_attr(miri, ignore)]
Expand All @@ -38,12 +40,12 @@ mod test {
}

/// We rely on [`proptest`]'s `String` strategy for generating a `CompactString`. When
/// converting from a `String` into a `CompactString`, our O(1) converstion kicks in and we
/// reuse the buffer, unless empty, and thus all non-empty strings will be heap allocated
/// converting from a `String` into a `CompactString`, if it's short enough we should
/// eagerly inline strings
#[test]
#[cfg_attr(miri, ignore)]
fn proptest_does_not_inline_strings(compact: CompactString) {
if compact.is_empty() {
if compact.len() <= MAX_SIZE {
assert!(!compact.is_heap_allocated());
} else {
assert!(compact.is_heap_allocated());
Expand Down
131 changes: 129 additions & 2 deletions compact_str/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,55 @@ mod tests;
/// assert_eq!(CompactString::new("chicago"), "chicago");
/// assert_eq!(CompactString::new("houston"), String::from("houston"));
/// ```
///
/// # Converting from a `String`
/// It's important that a `CompactString` interops well with `String`, so you can easily use both in
/// your code base.
///
/// `CompactString` implements `From<String>` and operates in the following manner:
/// - Eagerly inlines the string, possibly dropping excess capacity
/// - Otherwise re-uses the same underlying buffer from `String`
///
/// ```
/// use compact_str::CompactString;
///
/// // eagerly inlining
/// let short = String::from("hello world");
/// let short_c = CompactString::from(short);
/// assert!(!short_c.is_heap_allocated());
///
/// // dropping excess capacity
/// let mut excess = String::with_capacity(256);
/// excess.push_str("abc");
///
/// let excess_c = CompactString::from(excess);
/// assert!(!excess_c.is_heap_allocated());
/// assert!(excess_c.capacity() < 256);
///
/// // re-using the same buffer
/// let long = String::from("this is a longer string that will be heap allocated");
///
/// let long_ptr = long.as_ptr();
/// let long_len = long.len();
/// let long_cap = long.capacity();
///
/// let mut long_c = CompactString::from(long);
/// assert!(long_c.is_heap_allocated());
///
/// let cpt_ptr = long_c.as_ptr();
/// let cpt_len = long_c.len();
/// let cpt_cap = long_c.capacity();
///
/// // the original String and the CompactString point to the same place in memory, buffer re-use!
/// assert_eq!(cpt_ptr, long_ptr);
/// assert_eq!(cpt_len, long_len);
/// assert_eq!(cpt_cap, long_cap);
/// ```
///
/// ### Prevent Eagerly Inlining
/// A consequence of eagerly inlining is you then need to de-allocate the existing buffer, which
/// might not always be desirable if you're converting a very large amount of `String`s. If your
/// code is very sensitive to allocations, consider the [`CompactString::from_string_buffer`] API.
#[derive(Clone)]
#[repr(transparent)]
pub struct CompactString(Repr);
Expand Down Expand Up @@ -1379,6 +1428,82 @@ impl CompactString {
pub fn into_string(self) -> String {
self.0.into_string()
}

/// Convert a [`String`] into a [`CompactString`] _without inlining_.
///
/// Note: You probably don't need to use this method, instead you should use `From<String>`
/// which is implemented for [`CompactString`].
///
/// This method exists incase your code is very sensitive to memory allocations. Normally when
/// converting a [`String`] to a [`CompactString`] we'll inline short strings onto the stack.
/// But this results in [`Drop`]-ing the original [`String`], which causes memory it owned on
/// the heap to be deallocated. Instead when using this method, we always reuse the buffer that
/// was previously owned by the [`String`], so no trips to the allocator are needed.
///
/// # Examples
///
/// ### Short Strings
/// ```
/// use compact_str::CompactString;
///
/// let short = "hello world".to_string();
/// let c_heap = CompactString::from_string_buffer(short);
///
/// // using CompactString::from_string_buffer, we'll re-use the String's underlying buffer
/// assert!(c_heap.is_heap_allocated());
///
/// // note: when Clone-ing a short heap allocated string, we'll eagerly inline at that point
/// let c_inline = c_heap.clone();
/// assert!(!c_inline.is_heap_allocated());
///
/// assert_eq!(c_heap, c_inline);
/// ```
///
/// ### Longer Strings
/// ```
/// use compact_str::CompactString;
///
/// let x = "longer string that will be on the heap".to_string();
/// let c1 = CompactString::from(x);
///
/// let y = "longer string that will be on the heap".to_string();
/// let c2 = CompactString::from_string_buffer(y);
///
/// // for longer strings, we re-use the underlying String's buffer in both cases
/// assert!(c1.is_heap_allocated());
/// assert!(c2.is_heap_allocated());
/// ```
///
/// ### Buffer Re-use
/// ```
/// use compact_str::CompactString;
///
/// let og = "hello world".to_string();
/// let og_addr = og.as_ptr();
///
/// let mut c = CompactString::from_string_buffer(og);
/// let ex_addr = c.as_ptr();
///
/// // When converting to/from String and CompactString with from_string_buffer we always re-use
/// // the same underlying allocated memory/buffer
/// assert_eq!(og_addr, ex_addr);
///
/// let long = "this is a long string that will be on the heap".to_string();
/// let long_addr = long.as_ptr();
///
/// let mut long_c = CompactString::from(long);
/// let long_ex_addr = long_c.as_ptr();
///
/// // When converting to/from String and CompactString with From<String>, we'll also re-use the
/// // underlying buffer, if the string is long, otherwise when converting to CompactString we
/// // eagerly inline
/// assert_eq!(long_addr, long_ex_addr);
/// ```
#[inline]
pub fn from_string_buffer(s: String) -> Self {
let repr = Repr::from_string(s, false);
CompactString(repr)
}
}

impl Default for CompactString {
Expand Down Expand Up @@ -1492,7 +1617,7 @@ impl<'a> From<&'a str> for CompactString {

impl From<String> for CompactString {
fn from(s: String) -> Self {
let repr = Repr::from_string(s);
let repr = Repr::from_string(s, true);
CompactString(repr)
}
}
Expand All @@ -1507,14 +1632,16 @@ impl<'a> From<Cow<'a, str>> for CompactString {
fn from(cow: Cow<'a, str>) -> Self {
match cow {
Cow::Borrowed(s) => s.into(),
// we separate these two so we can re-use the underlying buffer in the owned case
Cow::Owned(s) => s.into(),
}
}
}

impl From<Box<str>> for CompactString {
fn from(b: Box<str>) -> Self {
let repr = Repr::from_box_str(b);
let s = b.into_string();
let repr = Repr::from_string(s, true);
CompactString(repr)
}
}
Expand Down
6 changes: 3 additions & 3 deletions compact_str/src/repr/iter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ impl FromIterator<char> for Repr {
// If the size hint indicates we can't store this inline, then create a heap string
let (size_hint, _) = iter.size_hint();
if size_hint > MAX_SIZE {
return Repr::from_string(iter.collect());
return Repr::from_string(iter.collect(), true);
}

// Otherwise, continuously pull chars from the iterator
Expand All @@ -41,7 +41,7 @@ impl FromIterator<char> for Repr {
// extend heap with remaining characters
string.extend(iter);

return Repr::from_string(string);
return Repr::from_string(string, true);
}

// write the current char into a slice of the unoccupied space
Expand Down Expand Up @@ -91,7 +91,7 @@ where
// extend heap with remaining strings
string.extend(iter);

return Repr::from_string(string);
return Repr::from_string(string, true);
}

// write the current string into a slice of the unoccupied space
Expand Down
139 changes: 55 additions & 84 deletions compact_str/src/repr/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -140,12 +140,13 @@ impl Repr {
repr
}

/// Create a [`Repr`] from a [`String`], in `O(1)` time.
/// Create a [`Repr`] from a [`String`], in `O(1)` time. We'll attempt to inline the string
/// if `should_inline` is `true`
///
/// Note: If the provided [`String`] is >16 MB and we're on a 32-bit arch, we'll copy the
/// `String`.
#[inline]
pub fn from_string(s: String) -> Self {
pub fn from_string(s: String, should_inline: bool) -> Self {
let og_cap = s.capacity();
let cap = Capacity::new(og_cap);

Expand All @@ -168,6 +169,11 @@ impl Repr {
} else if og_cap == 0 {
// We don't expect converting from an empty String often, so we make this code path cold
empty()
} else if should_inline && s.len() <= MAX_SIZE {
// SAFETY: Checked to make sure the string would fit inline
let inline = unsafe { InlineBuffer::new(s.as_str()) };
// SAFETY: `InlineBuffer` and `Repr` are the same size
unsafe { mem::transmute(inline) }
} else {
let mut s = mem::ManuallyDrop::new(s.into_bytes());
let len = s.len();
Expand Down Expand Up @@ -231,47 +237,6 @@ impl Repr {
}
}

#[inline]
pub fn from_box_str(s: Box<str>) -> Self {
let og_cap = s.len();
let cap = Capacity::new(og_cap);

#[cold]
fn capacity_on_heap(s: Box<str>) -> Repr {
let heap = HeapBuffer::new(&s);
// SAFETY: `BoxString` and `Repr` are the same size
unsafe { mem::transmute(heap) }
}

#[cold]
fn empty() -> Repr {
EMPTY
}

if cap.is_heap() {
// We only hit this case if the provided String is > 16MB and we're on a 32-bit arch. We
// expect it to be unlikely, thus we hint that to the compiler
capacity_on_heap(s)
} else if og_cap == 0 {
// We don't expect converting from an empty String often, so we make this code path cold
empty()
} else {
// Don't drop the box here
let raw_ptr = Box::into_raw(s).cast::<u8>();
let ptr = ptr::NonNull::new(raw_ptr).expect("string with capacity has null ptr?");

// create a new BoxString with our parts!
let heap = HeapBuffer {
ptr,
len: og_cap,
cap,
};

// SAFETY: `BoxString` and `Repr` are the same size
unsafe { mem::transmute(heap) }
}
}

/// Reserves at least `additional` bytes. If there is already enough capacity to store
/// `additional` bytes this is a no-op
#[inline]
Expand Down Expand Up @@ -687,21 +652,59 @@ mod tests {
}
}

#[test_case(String::new(); "empty")]
#[test_case(String::from("nyc 🗽"); "short")]
#[test_case(String::from("this is a really long string, which is intended"); "long")]
fn test_from_string(s: String) {
let r = Repr::from_string(s.clone());
assert_eq!(r.len(), s.len());
assert_eq!(r.as_str(), s.as_str());
#[test_case(String::new(), true; "empty should inline")]
#[test_case(String::new(), false; "empty not inline")]
#[test_case(String::with_capacity(10), true ; "empty with small capacity inline")]
#[test_case(String::with_capacity(10), false ; "empty with small capacity not inline")]
#[test_case(String::with_capacity(128), true ; "empty with large capacity inline")]
#[test_case(String::with_capacity(128), false ; "empty with large capacity not inline")]
#[test_case(String::from("nyc 🗽"), true; "short should inline")]
#[test_case(String::from("nyc 🗽"), false ; "short not inline")]
#[test_case(String::from("this is a really long string, which is intended"), true; "long")]
#[test_case(String::from("this is a really long string, which is intended"), false; "long not inline")]
#[test_case(EIGHTEEN_MB_STR.to_string(), true ; "huge should inline")]
#[test_case(EIGHTEEN_MB_STR.to_string(), false ; "huge not inline")]
fn test_from_string(s: String, try_to_inline: bool) {
// note: when cloning a String it truncates capacity, which is why we measure these values
// before cloning the string
let s_len = s.len();
let s_cap = s.capacity();
let s_str = s.clone();

let r = Repr::from_string(s, try_to_inline);

assert_eq!(r.len(), s_len);
assert_eq!(r.as_str(), s_str.as_str());

if s_cap == 0 {
// we should always inline the string, if the length of the source string is 0
assert!(!r.is_heap_allocated());
} else if try_to_inline && s_len <= MAX_SIZE {
// we should inline the string, if we were asked to, and the length of the string would
// fit inline, meaning we would truncate capacity
assert!(!r.is_heap_allocated());
} else {
assert!(r.is_heap_allocated());
}
}

#[quickcheck]
#[cfg_attr(miri, ignore)]
fn quickcheck_from_string(s: String) {
let r = Repr::from_string(s.clone());
fn quickcheck_from_string(s: String, try_to_inline: bool) {
let r = Repr::from_string(s.clone(), try_to_inline);

assert_eq!(r.len(), s.len());
assert_eq!(r.as_str(), s.as_str());

if s.capacity() == 0 {
// we should always inline the string, if the length of the source string is 0
assert!(!r.is_heap_allocated());
} else if s.capacity() <= MAX_SIZE {
// we should inline the string, if we were asked to
assert_eq!(!r.is_heap_allocated(), try_to_inline);
} else {
assert!(r.is_heap_allocated());
}
}

#[test_case(""; "empty")]
Expand Down Expand Up @@ -832,36 +835,4 @@ mod tests {
assert_eq!(r_a.capacity(), r_b.capacity());
assert_eq!(r_a.is_heap_allocated(), r_b.is_heap_allocated());
}

#[test_case("q"; "single")]
#[test_case("abc"; "short")]
#[test_case("this is (another) long string that will be heap allocated"; "long")]
#[test_case(EIGHTEEN_MB_STR; "huge")]
fn test_from_box_str(initial: &'static str) {
let box_str = String::from(initial).into_boxed_str();

let r = Repr::from_box_str(box_str);

assert_eq!(r.as_str(), initial);
assert_eq!(r.len(), initial.len());
assert_eq!(r.capacity(), initial.len());

// when converting from a Box<str> we do not automatically inline the string
assert!(r.is_heap_allocated());
}

#[test]
fn test_from_box_str_empty() {
let box_str = String::from("").into_boxed_str();

let r = Repr::from_box_str(box_str);

assert_eq!(r.as_str(), "");
assert_eq!(r.len(), 0);

// when converting from a Box<str> we do not automatically inline the string, unless the
// Box<str> is empty, then we return an empty inlined string
assert_eq!(r.capacity(), MAX_SIZE);
assert!(!r.is_heap_allocated());
}
}
Loading