Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement and consolidate VarZeroVec mutation operations #1127

Merged
merged 7 commits into from
Oct 2, 2021

Conversation

kupiakos
Copy link
Contributor

This consolidates all bit-shifting mutation operations into a single
function, since there's significant shared logic.

Fixes #1080.

This consolidates all bit-shifting mutation operations into a single
function, since there's significant shared logic.

Fixes unicode-org#1080.
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • utils/zerovec/src/varzerovec/owned.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Manishearth
Manishearth previously approved these changes Sep 30, 2021
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks really good! I mostly would like to see some more comments around the unsafe stuff before merging, but otherwise this is close to ready.

Note on our process: We try to push new commits each time so that it's easy to review, and everything gets squashed at the end when we merge, so just make commits as you make progress instead of amending or squashing.

@@ -90,6 +97,51 @@ impl<T: AsVarULE> VarZeroVecOwned<T> {
self.get_components().get(idx)
}

/// Get the position of a specific element in the data segment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: document the safety invariant here in a ## Safety section

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ditto on the other functions)

in particular, there are probably valid and invalid states for entire_slice to be in here, because e.g. this relies on entire_slice being the appropriate length

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for most of these functions entire_slice has to be entirely valid, except for index_data() where only the indices array needs to be in the right position (but may have incorrect indices) and set_len() where nothing needs to be valid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, not sure how strictly worded the safety section needs to be here

Copy link
Member

@Manishearth Manishearth Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can include something about how entire_slice() needs to be well-formed. Because most of the time we're operating on transitional states of entire_slice(), these methods can't be called then.

For example, in shift_indices() it's actually valid to call that on a buffer with a malformed index segment, but it must have the correct length set (and that's when we call it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much better I can do here without precisely explaining all invariants of the slice where the helper functions are used.

utils/zerovec/src/varzerovec/owned.rs Show resolved Hide resolved
} else {
let idx_offset = 4 + idx * 4;
let idx_slice = &self.entire_slice[idx_offset..idx_offset + 4];
u32::from_unaligned(&PlainOldULE::<4>::from_byte_slice_unchecked(idx_slice)[0]) as usize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use index_data() in this branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, though it requires making both this and element_range_unchecked &mut self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I made index_data and index_data_mut.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, never mind then. Making two functions seems okay but the current code is also okay, mostly wanted to reduce duplication if possible

utils/zerovec/src/varzerovec/owned.rs Outdated Show resolved Hide resolved
utils/zerovec/src/varzerovec/owned.rs Show resolved Hide resolved
if shift_type == ShiftType::Remove {
// Move the data before the element back by 4 to remove the index.
debug_assert!(prev_element_p.start >= index_range.end);
ptr::copy(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: might be worth having a shift_bytes(Range<*const u8>, *mut u8) helper given that all of the uses of ptr::copy do the same kind of thing here (ptr::copy(x, y, z.offset_from(x)) ==> shift_bytes(x..z, y))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

slice_range.end.offset_from(prev_element_p.end) as usize,
);

match shift_type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: The actual variable this gets set to is further off, I wonder if this would help?

Suggested change
match shift_type {
/* let first_affected_index = */ match shift_type {

Copy link
Contributor Author

@kupiakos kupiakos Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I instead moved the users of the variable into the block. I think it ends up cleaner.

utils/zerovec/src/varzerovec/owned.rs Show resolved Hide resolved
utils/zerovec/src/varzerovec/owned.rs Show resolved Hide resolved
// Invalidate and resize the data at an index, optionally inserting or removing the index.
// Also updates affected indices and the length.
// Returns a slice to the new element data - it doesn't contain uninitialized data but its value is indeterminate.
unsafe fn shift(&mut self, index: usize, new_size: u32, shift_type: ShiftType) -> &mut [u8] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: this is a really good refactoring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It was a fun puzzle :)

Adds a new test, clears the slice on last element removal, and adds an
invariant testing function.
///
/// Note: an index is valid if it doesn't point to data past the end of the slice and is
/// less than or equal to all future indices. The length of the index segment is not part of each index.
fn verify_integrity(&self) -> bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debating if this is too overboard or if it should just depend on SliceComponents::parse_byte_slice

Manishearth
Manishearth previously approved these changes Oct 1, 2021
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small bit of improvements on the safety doc comments and we should be good to go!

@@ -90,6 +97,51 @@ impl<T: AsVarULE> VarZeroVecOwned<T> {
self.get_components().get(idx)
}

/// Get the position of a specific element in the data segment.
Copy link
Member

@Manishearth Manishearth Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can include something about how entire_slice() needs to be well-formed. Because most of the time we're operating on transitional states of entire_slice(), these methods can't be called then.

For example, in shift_indices() it's actually valid to call that on a buffer with a malformed index segment, but it must have the correct length set (and that's when we call it).

utils/zerovec/src/varzerovec/owned.rs Show resolved Hide resolved
Manishearth
Manishearth previously approved these changes Oct 2, 2021
@Manishearth
Copy link
Member

Ah, make sure to test with cargo test --all-features, it's failing a map test.

@Manishearth
Copy link
Member

I feel like it was passing CI earlier, though.

@Manishearth
Copy link
Member

Ah, a simpler repro would be constructing a VarZeroVec by pushing in strings that are three or fewer characters in length: We call self.index_data_mut() while it's invalid to call since it expects the entire_slice array to actually be the right size, which it isn't yet.

Manishearth
Manishearth previously approved these changes Oct 2, 2021
@Manishearth
Copy link
Member

Pushed up a fix

@kupiakos
Copy link
Contributor Author

kupiakos commented Oct 2, 2021

We call self.index_data_mut() while it's invalid to call since it expects the entire_slice array to actually be the right size, which it isn't yet.

Ahh good catch, thanks for fixing

@Manishearth Manishearth merged commit e6245c2 into unicode-org:main Oct 2, 2021
@Manishearth
Copy link
Member

Thanks for working on this!

@Manishearth Manishearth mentioned this pull request Oct 3, 2021
robertbastian pushed a commit to robertbastian/icu4x that referenced this pull request Oct 4, 2021
…#1127)

* Implement and consolidate VarZeroVec mutation operations

This consolidates all bit-shifting mutation operations into a single
function, since there's significant shared logic.

Fixes unicode-org#1080.

Co-authored-by: Manish Goregaokar <manishsmail@gmail.com>
@sffc sffc removed their request for review October 18, 2021 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve mutation operations on VarZeroVecOwned
2 participants