Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create test that property open enums stay in sync with ICU #4283

Closed
sffc opened this issue Nov 14, 2023 · 5 comments · Fixed by #4645
Closed

Create test that property open enums stay in sync with ICU #4283

sffc opened this issue Nov 14, 2023 · 5 comments · Fixed by #4645
Assignees
Labels
C-unicode Component: Props, sets, tries good first issue Good for newcomers S-small Size: One afternoon (small bug fix or enhancement) T-docs-tests Type: Code change outside core library

Comments

@sffc
Copy link
Member

sffc commented Nov 14, 2023

Split from #4132

We have property open enums in components/properties/src/props.rs. We should make sure these stay up-to-date with ICU for better ergonomics for ICU4X users. Since we use open enums, nothing will break, but users won't be able to use these properties using the nice names in the API.

To test this, I would suggest walking a lookup table such as propnames/from/lb@1 (see provider/datagen/tests/data/json/propnames/from/lb@1/und.json) and ensuring that the largest known open enum value is at least as much as the largest value in the propnames table.

@sffc sffc added T-docs-tests Type: Code change outside core library good first issue Good for newcomers C-unicode Component: Props, sets, tries S-small Size: One afternoon (small bug fix or enhancement) labels Nov 14, 2023
@kartva
Copy link
Member

kartva commented Feb 26, 2024

I've observed the following:

  • Types like PropertyValueNameToEnumMapV1 (or their higher level types) contain loaded data from the lookup tables.
  • They provide a ZeroMap that contains the open enum values, which can (presumably) be walked.

We want to find the highest value in the consts defined on the public interface (eg. BidiClass::LeftToRight), and compare that with the highest value in the open enum values in the loaded data.

We can define a macro to go through the defined constants and find the highest value in them.

Am I on the right track?

@ashu26jha
Copy link
Contributor

ashu26jha commented Feb 26, 2024

Hello @sffc, if I'm getting this right:

The test needs to throw an error, in case und.json has a field for which we don't have an enum and also make sure that the largest known open enum value is at least as much as the largest value in the propnames table?

Furthermore for:

  1. WordBreak we could look in provider/datagen/tests/data/json/propnames/from/wb@1/und.json
  2. Script we could look in provider/datagen/tests/data/json/propnames/from/sc@1/und.json

Also should I create a new test file? props.rs has already over 2500 lines of code.

Thanks,

PS:
I just saw @DesmondWillowbrook's comment after writing my own, this is actually a good insight, however, it's worth noting that ensuring the highest values match does not guarantee that all intermediate values are correctly mapped or that there are no gaps in the enum definitions or they are in sync.

@kartva
Copy link
Member

kartva commented Feb 26, 2024

@ashu26jha
Thanks for the note. I've started working on this issue already. To prevent duplication of work, would you like to take up another issue to contribute to? Alternatively, if you feel that you're particularly well-suited to tackle this issue, you can definitely take over. Let me know how you feel.

@ashu26jha
Copy link
Contributor

ashu26jha commented Feb 26, 2024

Surely! I won't mind if you take on this issue. Since I do have put a little bit of time I would happily review your PR if you don't mind :P

Cheers!

@sffc
Copy link
Member Author

sffc commented Feb 26, 2024

Hi @DesmondWillowbrook @ashu26jha!

Loading the data via PropertyValueNameToEnumMapV1 is likely to be the cleanest approach. Then you can walk the ZeroMap.

If you need to use a macro, try to keep as little code as possible inside the macro and use helper functions outside of the macro.

Most properties are contiguous with no gaps. Canonical_Combining_Class is an exception. Note that properties like Script are contiguous but not in alphabetical order.

Suggestion for where to put the test: components/properties/tests/icu_coverage.rs

@sffc sffc added this to the Priority Backlog ⟨P3⟩ milestone Feb 29, 2024
sffc pushed a commit that referenced this issue Mar 10, 2024
fixes #4283

Description of changes:
- Introduced `create_const_array` macro that produces an array of tuples
of the values and names of the constants defined in an impl block.
- Use this constant to iterate over underlying data and report
differences.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-unicode Component: Props, sets, tries good first issue Good for newcomers S-small Size: One afternoon (small bug fix or enhancement) T-docs-tests Type: Code change outside core library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants