Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MultiFieldsULE #3642

Merged
merged 3 commits into from
Jul 5, 2023
Merged

Conversation

skius
Copy link
Member

@skius skius commented Jul 5, 2023

MultiFieldsULE uses Index32, but this wasn't passed explicitly everywhere, so some places incorrectly defaulted to Index16.

The existing test suite did not discover this error, but I'm not sure why. Added a new struct with four internal VZVs, which triggers the bug. A struct with three internal VZVs also does not trigger the bug.

@skius skius requested a review from robertbastian July 5, 2023 16:19
@@ -44,7 +44,7 @@ impl MultiFieldsULE {
lengths, output,
);
debug_assert!(
<VarZeroSlice<[u8]>>::validate_byte_slice(output).is_ok(),
<VarZeroSlice<[u8], Index32>>::validate_byte_slice(output).is_ok(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good sleuthing!

However, I think we should prefer Index16. No reason to use the extra bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is intended to default to Index16. In the long run we can introduce a derive config flag for this.

Also it's a data breaking change to change the format here, though I don't think ICU4X uses this yet (it will once casemap stabilizes)

Copy link
Member

@Manishearth Manishearth Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, no, nvm, it should default to 32 since we expect the lengths to be large: this may be at the root of something sufficiently nested

for users who want 16 we should add a default type parameter and a zerovec attribute

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the issue of this PR, it is Index32 but not consistently. Switching to Index16 would be a breaking change as I understand

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(VarZeroVec defaults to Index16, but MFULE overrides that to Index32)

@Manishearth Manishearth merged commit 0ff9854 into unicode-org:main Jul 5, 2023
22 checks passed
@Manishearth Manishearth mentioned this pull request Sep 21, 2023
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants