Grapheme text segmentation and test suite #3
@lukewilliamboswell Just checking: should I hold off on review until the tests are passing? (I saw in the description you mentioned the TODOs, but I wanted to check!)
Thank you for clarifying. I think those changes will be better suited to another PR. I suspect it is going to be a challenge; at the very least I need to learn a lot more about emoji before then, and we may need to change the approach/algorithm to do it. If you have feedback on these changes, that would be most appreciated, thank you.
Update on this PR; I've re-written the script for generating the test suite, currently called I've also started on a new implementation of the algorithm for text segmentation currently called
# allocated extra space for the extra bytes as some CPs expand into
# multiple U8s, so this minimises extra allocations
capacity = List.withCapacity (50 + List.len cps)
No need to change, but it might be worth scaling the extra bytes based on the length of the list? Maybe something like (List.len cps // 10) + List.len cps.
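A minimal sketch of that suggestion, assuming the same `cps` list as in the line under review. `scaledCapacity` is a hypothetical helper name, not something in the PR; only `List.len` and `//` are taken from the code and comment above.

```roc
# Hypothetical helper (not part of the PR): input length plus roughly 10% headroom,
# instead of a fixed 50 extra bytes.
scaledCapacity = \cps ->
    len = List.len cps
    len + (len // 10)

# Usage, mirroring the line under review:
# capacity = List.withCapacity (scaledCapacity cps)
```

With this heuristic, 1,000 code points reserve 1,100 bytes, while inputs shorter than 10 code points get no headroom at all because `len // 10` is 0, so a small fixed minimum may still be worth keeping.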
This looks fantastic! 🤩 🤩 🤩 🤩 🤩
Really great stuff! I'm so happy to see this working! 💯
This PR;
roc check, roc test, roc build, and roc docs
NOTE: implementation of Extended Grapheme Cluster requires the implementation of rules GB9a, GB9b, and GB9c, which are left for a future PR.
Run Generation Scripts
To re-generate the generated files you can use
bash rebuild.sh
Tests
To run the Grapheme test suite, use
roc test package/GraphemeTest.roc
Examples
I tried to include an additional example that used Grapheme.split, but there are significant compiler bugs that prevented me from including it with this PR.
Here is a demo from the tests showing the function in use.