-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add impl From<CompactString> for String #118
Conversation
cebf1ef
to
831c2a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, looking real good! Just some small changes necessary to handle an edge case with 32-bit arch's
compact_str/src/repr/boxed/mod.rs
Outdated
// * `length` is less than or equal to capacity, due to internal invaraints. | ||
// * `capacity` is correctly maintained internally. | ||
// * `BoxString` only ever contains valid UTF-8. | ||
unsafe { String::from_raw_parts(this.ptr.as_ptr(), this.len, this.capacity()) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A BoxedString
can store the capacity on either the stack or the heap, we store the capacity on the heap when there isn't enough room on the stack, see this doc comment for an explanation.
tl;dr I think you'll have to change this to something like:
match self.cap.as_usize() {
// capacity is on the stack, form a string from these parts
Ok(capacity) => {
let this = ManuallyDrop::new(self);
unsafe { String::from_raw_parts(this.ptr.as_ptr(), this.len, capacity) }
},
// capacity is on the heap! we need to form a new String
Err(_) => {
// ... drop the existing string and form a new one
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do something cute here to avoid reallocating by just copying down over the top of the boxed capacity, but it didn't do what I intended under test. My guess is that the layouts aren't compatible. I've included the code snippet below in case you know a bit more than me and want to iterate on it. For now, I've just set it to copy out a new string.
let capacity = this.capacity();
let cap_bytes = std::mem::size_of::<usize>();
// SAFETY: `this.ptr` should be a valid value if the capacity is on the heap,
// as should the address `size_of::<usize>()` bytes below that address.
let init = unsafe { this.ptr.as_ptr().sub(cap_bytes) };
// SAFETY: moving the bytes within the existing allocation
// * `init` is below `ptr`, and `len` bytes are already within the existing allocation.
unsafe {
std::ptr::copy(this.ptr.as_ptr(), init, this.len);
}
// SAFETY:
// * The memory in `init` was previously allocated by the same allocator the standard library
// uses, with a required alignment of exactly 1.
// * `length` is less than or equal to capacity, due to internal invaraints.
// * `capacity` is correctly maintained internally, and we recovered `cap_bytes` in the copy.
// * `BoxString` only ever contains valid UTF-8.
unsafe { String::from_raw_parts(init, this.len, capacity + cap_bytes) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to re-use the existing buffer. I think this is something we can iterate on, for now I filed issue #120 to track this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the layouts aren't compatible
Yeah, you'll run into an issue as String
's buffer is u8
aligned but the inline length is usize
aligned. At a performance cost, we could store the length unaligned to make the allocated alignment 1.
The unstable Allocator
trait allows requesting (though an implementation can theoretically refuse to fulfill) a change in alignment of an allocation. The stable GlobalAlloc
does not.
My vote is to just copy/realloc, and put a note tl if/when Allocator
is stabilized optimize it to opportunistically use a layout-adjusting realloc so the underlying allocator can avoid the copy.
compact_str/src/repr/mod.rs
Outdated
@@ -501,6 +537,12 @@ impl<'a> MutStrongRepr<'a> { | |||
} | |||
} | |||
|
|||
#[derive(Debug)] | |||
enum StrongIntoRepr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe StrongOwnedRepr
? makes it a little more general in case we use this for more than just into_string()
in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -695,3 +705,131 @@ fn test_to_compact_string() { | |||
format_compact!("0{}67890123456789{}", "12345", 999999) | |||
); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love the existing test coverage, thank you!
Could you add one more test for the 16MB edge case that exists for 32-bit machines? see here for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a relevant test with a boxed capacity.
44c6b7d
to
4c2e026
Compare
By using `String::from_utf8_unchecked` we significantly speed up miri test runs. This is because miri doesn't need to spend a ton of time verifying that every character in the ~16MiB vector is UTF-8, when we already know that it is, since it's always just ASCII `A`s.
// SAFETY: We've just created the vector above, and it contains only ASCII characters. | ||
// This speeds up `miri` runs significantly. | ||
let string = unsafe { String::from_utf8_unchecked(word_buf) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Thanks for the PR and speeding up some of our tests!
Tests added and run under
miri
. Ensures that no reallocation/copy of already-boxed strings happens in a round-trip, with the address, length, and capacity remaining stable.Closes #117