Fix MultiFieldsULE #3642

Merged
merged 3 commits into from
Jul 5, 2023
27 changes: 27 additions & 0 deletions utils/zerovec/derive/examples/make_var.rs
@@ -59,6 +59,20 @@ struct MultiFieldStruct<'a> {
     f: char,
 }

+#[make_varule(MultiFieldConsecutiveStructULE)]
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, serde::Serialize, serde::Deserialize)]
+#[zerovec::derive(Serialize, Deserialize, Debug)]
+struct MultiFieldConsecutiveStruct<'a> {
+    #[serde(borrow)]
+    a: Cow<'a, str>,
+    #[serde(borrow)]
+    b: Cow<'a, str>,
+    #[serde(borrow)]
+    c: Cow<'a, str>,
+    #[serde(borrow)]
+    d: Cow<'a, str>,
+}
+
 #[make_varule(CustomVarFieldULE)]
 #[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, serde::Serialize, serde::Deserialize)]
 #[zerovec::derive(Serialize, Deserialize, Debug)]
@@ -139,6 +153,11 @@ fn main() {
         assert_eq!(stack, &MultiFieldStruct::zero_from(zero))
     });

+    assert_zerovec::<MultiFieldConsecutiveStructULE, MultiFieldConsecutiveStruct, _>(
+        TEST_MULTICONSECUTIVE,
+        |stack, zero| assert_eq!(stack, &MultiFieldConsecutiveStruct::zero_from(zero)),
+    );
+
     let vartuples = &[
         VarTupleStruct(101, 'ø', TEST_STRINGS1.into()),
         VarTupleStruct(9499, '⸘', TEST_STRINGS2.into()),
@@ -197,3 +216,11 @@ const TEST_MULTIFIELD: &[MultiFieldStruct<'static>] = &[
         f: 'ə',
     },
 ];
+
+const TEST_MULTICONSECUTIVE: &[MultiFieldConsecutiveStruct<'static>] =
+    &[MultiFieldConsecutiveStruct {
+        a: Cow::Borrowed("one"),
+        b: Cow::Borrowed("2"),
+        c: Cow::Borrowed("three"),
+        d: Cow::Borrowed("four"),
+    }];
8 changes: 5 additions & 3 deletions utils/zerovec/src/ule/multi.rs
@@ -44,7 +44,7 @@ impl MultiFieldsULE {
             lengths, output,
         );
         debug_assert!(
-            <VarZeroSlice<[u8]>>::validate_byte_slice(output).is_ok(),
+            <VarZeroSlice<[u8], Index32>>::validate_byte_slice(output).is_ok(),
Member

Good sleuthing!

However, I think we should prefer Index16. No reason to use the extra bytes.

Member Author

Member

Yeah, this is intended to default to Index16. In the long run we can introduce a derive config flag for this.

Also, changing the format here is a data-breaking change, though I don't think ICU4X uses this yet (it will once casemap stabilizes).

Member

@Manishearth, Jul 5, 2023

Wait, no, never mind: it should default to Index32, since we expect the lengths to be large. This may be at the root of something sufficiently nested.

For users who want Index16, we should add a default type parameter and a zerovec attribute.
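
As a rough illustration of the default-type-parameter idea, here is a minimal sketch using only the standard library. The `IndexFormat` trait, the `MultiFields` container, and the local `Index16`/`Index32` marker types are hypothetical stand-ins defined for this example; they are not zerovec's actual types or API.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for an index-width format (not zerovec's trait).
trait IndexFormat {
    /// Number of bytes used to store each index.
    const INDEX_WIDTH: usize;
}

/// Local marker types for this sketch only.
struct Index16;
struct Index32;

impl IndexFormat for Index16 {
    const INDEX_WIDTH: usize = 2;
}

impl IndexFormat for Index32 {
    const INDEX_WIDTH: usize = 4;
}

/// Hypothetical container whose index format defaults to 32-bit indices.
/// Callers who want the smaller format opt in with `MultiFields<Index16>`.
struct MultiFields<F: IndexFormat = Index32> {
    bytes: Vec<u8>,
    _format: PhantomData<F>,
}

impl<F: IndexFormat> MultiFields<F> {
    fn new(bytes: Vec<u8>) -> Self {
        MultiFields { bytes, _format: PhantomData }
    }

    /// Width in bytes of each stored index for this format.
    fn index_width(&self) -> usize {
        F::INDEX_WIDTH
    }

    fn byte_len(&self) -> usize {
        self.bytes.len()
    }
}

fn main() {
    // Naming the type without a parameter picks the default (Index32).
    let wide: MultiFields = MultiFields::new(vec![0; 8]);
    let narrow: MultiFields<Index16> = MultiFields::new(vec![0; 8]);
    assert_eq!(wide.index_width(), 4);
    assert_eq!(narrow.index_width(), 2);
    assert_eq!(wide.byte_len(), narrow.byte_len());
}
```

With this shape, existing users keep the 32-bit behavior without changing any code, and the narrower format becomes an explicit choice at the type level.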

Member Author

That's the issue this PR addresses: it is Index32, but not consistently. Switching to Index16 would be a breaking change, as I understand it.

Member Author

(VarZeroVec defaults to Index16, but MFULE overrides that to Index32)
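
To make the consequence of an inconsistent index width concrete, here is a minimal sketch using a deliberately simplified layout: an element count, then one start offset per element, then the concatenated data, with every index stored in `width` bytes. This layout and the `encode`/`decode` helpers are assumptions made for illustration; they are not zerovec's actual wire format or API. The sketch shows that bytes written with 4-byte indices do not round-trip when read with 2-byte indices.

```rust
/// Read a little-endian index of `width` bytes starting at `pos`.
fn read_index(bytes: &[u8], pos: usize, width: usize) -> Option<usize> {
    let raw = bytes.get(pos..pos.checked_add(width)?)?;
    let mut value = 0usize;
    for (i, b) in raw.iter().enumerate() {
        value |= (*b as usize) << (8 * i);
    }
    Some(value)
}

/// Append `value` as a little-endian index of `width` bytes (2 or 4).
fn push_index(out: &mut Vec<u8>, value: usize, width: usize) {
    out.extend_from_slice(&(value as u32).to_le_bytes()[..width]);
}

/// Simplified layout: [count][start offset per field][field data].
fn encode(fields: &[&str], width: usize) -> Vec<u8> {
    let mut out = Vec::new();
    push_index(&mut out, fields.len(), width);
    let mut offset = 0;
    for field in fields {
        push_index(&mut out, offset, width);
        offset += field.len();
    }
    for field in fields {
        out.extend_from_slice(field.as_bytes());
    }
    out
}

/// Decode the simplified layout, returning `None` if it is malformed.
fn decode(bytes: &[u8], width: usize) -> Option<Vec<String>> {
    let count = read_index(bytes, 0, width)?;
    let data_start = width.checked_mul(count.checked_add(1)?)?;
    let data = bytes.get(data_start..)?;
    let mut starts = Vec::new();
    for i in 0..count {
        starts.push(read_index(bytes, width * (i + 1), width)?);
    }
    let mut fields = Vec::new();
    for i in 0..count {
        let end = if i + 1 < count { starts[i + 1] } else { data.len() };
        let field = data.get(starts[i]..end)?;
        fields.push(String::from_utf8(field.to_vec()).ok()?);
    }
    Some(fields)
}

fn main() {
    let fields = ["one", "2", "three", "four"];
    let expected: Vec<String> = fields.iter().map(|s| s.to_string()).collect();

    // Writer and reader agree on 4-byte indices: the data round-trips.
    let wide = encode(&fields, 4);
    assert_eq!(decode(&wide, 4), Some(expected.clone()));

    // Reader assumes 2-byte indices: the original fields are not recovered.
    assert_ne!(decode(&wide, 2), Some(expected));
}
```

In this toy layout the mismatched reader may either reject the buffer or quietly return different fields, which is why the writer, the validator, and the unchecked constructor all need to name the same index format.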

             "Encoded slice must be valid VarZeroSlice"
         );
         // Safe since write_serializable_bytes produces a valid VarZeroSlice buffer
@@ -141,12 +141,14 @@ unsafe impl VarULE for MultiFieldsULE {
     /// This impl exists so that EncodeAsVarULE can work.
     #[inline]
     fn validate_byte_slice(slice: &[u8]) -> Result<(), ZeroVecError> {
-        <VarZeroSlice<[u8]>>::validate_byte_slice(slice)
+        <VarZeroSlice<[u8], Index32>>::validate_byte_slice(slice)
     }

     #[inline]
     unsafe fn from_byte_slice_unchecked(bytes: &[u8]) -> &Self {
         // &Self is transparent over &VZS<..>
-        mem::transmute(<VarZeroSlice<[u8]>>::from_byte_slice_unchecked(bytes))
+        mem::transmute(<VarZeroSlice<[u8], Index32>>::from_byte_slice_unchecked(
+            bytes,
+        ))
     }
 }
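
The validate-then-reinterpret pattern in this hunk can be sketched in isolation. The example below is a minimal, standard-library-only illustration, assuming a hypothetical `Payload` newtype over `[u8]` with a trivial validity rule; it is not the `MultiFieldsULE` implementation. It shows why the checked and unchecked paths must describe the same format: the unchecked cast is only sound for bytes that the matching validator has accepted.

```rust
/// Hypothetical newtype. `#[repr(transparent)]` guarantees that `Payload`
/// has the same layout as its single field, `[u8]`.
#[repr(transparent)]
struct Payload([u8]);

impl Payload {
    /// Toy validity rule for this sketch: the buffer must be non-empty.
    fn validate(bytes: &[u8]) -> Result<(), &'static str> {
        if bytes.is_empty() {
            Err("payload must not be empty")
        } else {
            Ok(())
        }
    }

    /// Reinterpret bytes as `&Payload` without checking them.
    ///
    /// # Safety
    /// `bytes` must already satisfy `Payload::validate`.
    unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Self {
        // SAFETY: `Payload` is `repr(transparent)` over `[u8]`, so the
        // pointer cast preserves layout and slice length metadata.
        unsafe { &*(bytes as *const [u8] as *const Payload) }
    }

    /// Checked constructor: validate first, then reuse the unchecked cast.
    fn parse(bytes: &[u8]) -> Result<&Self, &'static str> {
        Self::validate(bytes)?;
        // SAFETY: `validate` just accepted these exact bytes.
        Ok(unsafe { Self::from_bytes_unchecked(bytes) })
    }

    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

fn main() {
    let payload = Payload::parse(b"abc").expect("non-empty input is valid");
    assert_eq!(payload.as_bytes(), &b"abc"[..]);
    assert!(Payload::parse(b"").is_err());
}
```

If the validator and the unchecked constructor disagreed about the format, `parse` could hand out a reference whose invariants were never actually checked.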