Restore boundary_property when iterator reaches EOF and matches no rule #3404

Merged 1 commit on May 9, 2023

@aethanyc (Contributor) commented May 3, 2023

Fixed #3392.

In the test case `one.`, when the iterator reaches `e` followed by `.`, it must look ahead one more character to determine whether the pair matches WB6 [1]. However, `.` followed by EOF matches no rule, so `e` and `.` fall through to WB999 [2] (a break) instead. We should restore `boundary_property` in this scenario.

[1] https://www.unicode.org/reports/tr29/#WB6
[2] https://www.unicode.org/reports/tr29/#WB999
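The look-ahead-and-restore pattern described above can be sketched with a toy segmenter. This is not the ICU4X implementation: `ToyWordBreaker`, `Prop`, `prop`, and `is_word_like` are hypothetical names, the character classes are heavily simplified, and only WB5, WB6/WB7, and WB999 are modelled. It assumes `boundary_property` holds the property of the character just before the most recently returned boundary.

```rust
// Toy sketch only; names and classification are assumptions, not ICU4X code.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Prop {
    Letter,
    MidLetter, // simplified stand-in for MidLetter / MidNumLetQ
    Other,
}

fn prop(c: char) -> Prop {
    if c.is_alphabetic() {
        Prop::Letter
    } else if c == '.' || c == '\'' {
        Prop::MidLetter
    } else {
        Prop::Other
    }
}

struct ToyWordBreaker {
    chars: Vec<char>,
    pos: usize,
    // Property of the character just before the last returned boundary.
    boundary_property: Prop,
}

impl ToyWordBreaker {
    fn new(s: &str) -> Self {
        Self { chars: s.chars().collect(), pos: 0, boundary_property: Prop::Other }
    }

    /// Returns the next boundary as a char index, or None once the input is exhausted.
    fn next_boundary(&mut self) -> Option<usize> {
        if self.pos >= self.chars.len() {
            return None;
        }
        let mut left = prop(self.chars[self.pos]);
        self.boundary_property = left;
        self.pos += 1;
        while self.pos < self.chars.len() {
            let right = prop(self.chars[self.pos]);
            match (left, right) {
                // WB5: no break between letters.
                (Prop::Letter, Prop::Letter) => {}
                // WB6/WB7: need one more character to decide.
                (Prop::Letter, Prop::MidLetter) => {
                    let saved = self.boundary_property;
                    self.boundary_property = right; // tentative update
                    let next = self.chars.get(self.pos + 1).map(|&c| prop(c));
                    if next != Some(Prop::Letter) {
                        // Look-ahead failed (including EOF): break before the
                        // MidLetter and restore the saved property.
                        self.boundary_property = saved;
                        return Some(self.pos);
                    }
                    // Look-ahead succeeded: consume the MidLetter and the letter.
                    self.pos += 2;
                    left = Prop::Letter;
                    self.boundary_property = Prop::Letter;
                    continue;
                }
                // WB999: otherwise, break.
                _ => return Some(self.pos),
            }
            left = right;
            self.boundary_property = right;
            self.pos += 1;
        }
        Some(self.pos)
    }

    /// Whether the segment that just ended looks like a word in this toy model.
    fn is_word_like(&self) -> bool {
        self.boundary_property == Prop::Letter
    }
}
```

In this sketch, omitting the `self.boundary_property = saved` line would leave the property of `.` in place after the boundary between `e` and `.`, so the segment `one` would be reported as not word-like.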

@aethanyc requested review from makotokato and sffc as code owners May 3, 2023 05:42
@sffc (Member) left a comment

Thanks! Seems plausible but I'd like @makotokato to review.

@makotokato (Member) left a comment

Thanks, good.

@aethanyc merged commit 1286699 into unicode-org:main May 9, 2023
@aethanyc deleted the restore-boundary branch May 9, 2023 16:44
Linked issue (may be closed by merging this pull request): WordBreakIterator::word_type() returns wrong type