-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement and consolidate VarZeroVec mutation operations #1127
Conversation
This consolidates all bit-shifting mutation operations into a single function, since there's significant shared logic. Fixes unicode-org#1080.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks really good! I mostly would like to see some more comments around the unsafe stuff before merging, but otherwise this is close to ready.
Note on our process: We try to push new commits each time so that it's easy to review, and everything gets squashed at the end when we merge, so just make commits as you make progress instead of amending or squashing.
@@ -90,6 +97,51 @@ impl<T: AsVarULE> VarZeroVecOwned<T> { | |||
self.get_components().get(idx) | |||
} | |||
|
|||
/// Get the position of a specific element in the data segment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: document the safety invariant here in a ## Safety
section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(ditto on the other functions)
in particular, there are probably valid and invalid states for entire_slice to be in here, because e.g. this relies on entire_slice being the appropriate length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for most of these functions entire_slice
has to be entirely valid, except for index_data()
where only the indices array needs to be in the right position (but may have incorrect indices) and set_len()
where nothing needs to be valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, not sure how strictly worded the safety section needs to be here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can include something about how entire_slice()
needs to be well-formed. Because most of the time we're operating on transitional states of entire_slice()
, these methods can't be called then.
For example, in shift_indices()
it's actually valid to call that on a buffer with a malformed index segment, but it must have the correct length set (and that's when we call it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much better I can do here without precisely explaining all invariants of the slice where the helper functions are used.
} else { | ||
let idx_offset = 4 + idx * 4; | ||
let idx_slice = &self.entire_slice[idx_offset..idx_offset + 4]; | ||
u32::from_unaligned(&PlainOldULE::<4>::from_byte_slice_unchecked(idx_slice)[0]) as usize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use index_data()
in this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, though it requires making both this and element_range_unchecked &mut self.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, I made index_data
and index_data_mut
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, never mind then. Making two functions seems okay but the current code is also okay, mostly wanted to reduce duplication if possible
if shift_type == ShiftType::Remove { | ||
// Move the data before the element back by 4 to remove the index. | ||
debug_assert!(prev_element_p.start >= index_range.end); | ||
ptr::copy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: might be worth having a shift_bytes(Range<*const u8>, *mut u8)
helper given that all of the uses of ptr::copy
do the same kind of thing here (ptr::copy(x, y, z.offset_from(x))
==> shift_bytes(x..z, y)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it!
slice_range.end.offset_from(prev_element_p.end) as usize, | ||
); | ||
|
||
match shift_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: The actual variable this gets set to is further off, I wonder if this would help?
match shift_type { | |
/* let first_affected_index = */ match shift_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I instead moved the users of the variable into the block. I think it ends up cleaner.
// Invalidate and resize the data at an index, optionally inserting or removing the index. | ||
// Also updates affected indices and the length. | ||
// Returns a slice to the new element data - it doesn't contain uninitialized data but its value is indeterminate. | ||
unsafe fn shift(&mut self, index: usize, new_size: u32, shift_type: ShiftType) -> &mut [u8] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: this is a really good refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! It was a fun puzzle :)
Adds a new test, clears the slice on last element removal, and adds an invariant testing function.
/// | ||
/// Note: an index is valid if it doesn't point to data past the end of the slice and is | ||
/// less than or equal to all future indices. The length of the index segment is not part of each index. | ||
fn verify_integrity(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debating if this is too overboard or if it should just depend on SliceComponents::parse_byte_slice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one small bit of improvements on the safety doc comments and we should be good to go!
@@ -90,6 +97,51 @@ impl<T: AsVarULE> VarZeroVecOwned<T> { | |||
self.get_components().get(idx) | |||
} | |||
|
|||
/// Get the position of a specific element in the data segment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can include something about how entire_slice()
needs to be well-formed. Because most of the time we're operating on transitional states of entire_slice()
, these methods can't be called then.
For example, in shift_indices()
it's actually valid to call that on a buffer with a malformed index segment, but it must have the correct length set (and that's when we call it).
Ah, make sure to test with |
I feel like it was passing CI earlier, though. |
Ah, a simpler repro would be constructing a VarZeroVec by pushing in strings that are three or fewer characters in length: We call |
Pushed up a fix |
Ahh good catch, thanks for fixing |
Thanks for working on this! |
…#1127) * Implement and consolidate VarZeroVec mutation operations This consolidates all bit-shifting mutation operations into a single function, since there's significant shared logic. Fixes unicode-org#1080. Co-authored-by: Manish Goregaokar <manishsmail@gmail.com>
This consolidates all bit-shifting mutation operations into a single
function, since there's significant shared logic.
Fixes #1080.