Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Breaking] Feature: include '々' iteration character in kanji, exclude from JA punctuation #163

Merged
merged 6 commits into from
Sep 30, 2023

Conversation

DJTB
Copy link
Collaborator

@DJTB DJTB commented Sep 28, 2023

Closes #113

Currently, Wanakana considers to be punctuation (due to where it lies in unicode ranges, alongside other pure punctuation symbols), though in practice it functions like a kanji character. This causes minor issues such as tokenize() splitting up words like 人々 incorrectly and isKanji() reporting false for 人々.

This PR now considers to be a kanji character, and excludes it from punctuation checks.
Please see modified tests to see which functions are affected and #113 for prior discussion, as well as DJTB/react-furi#6.

This doesn't really affect the core usage of Wanakana (as an IME) but peripheral utils that we provide. Whenever 々 comes up currently it is behaving incorrectly, so despite considering this a fix I don't want consumers updating if they see a patch bump (5.1.1). Since it changes existing behaviour perhaps it should be a major release?

@mimshwright @scottnicolson @vietqhoang do you have any concerns merging this?
Do you prefer a major 6.0.0 or minor 5.2.0 with breaking changes listed in release/changelog?

@DJTB DJTB force-pushed the fix/kanji-iteration-character branch from 3456a61 to dda897b Compare September 28, 2023 08:21
@DJTB
Copy link
Collaborator Author

DJTB commented Sep 28, 2023

Also merged tokenize docs update since this modifies them as well.
Fixes #156 and closes #157

@vietqhoang
Copy link
Member

Did a first pass review. Nothing really stood out to me as far as the changes go. My first thoughts is I do agree the repeater should be considered more of a kanji than a punctuation.

As for the preference on how to increment the version, to me this is more of a minor version than a major version. The changes don't really add anything significant nor does it break the API.

@DJTB
Copy link
Collaborator Author

DJTB commented Sep 29, 2023

As for the preference on how to increment the version, to me this is more of a minor version than a major version. The changes don't really add anything significant nor does it break the API.

You're right that a major bump doesn't make much sense, I was being overly cautious since it changes some behaviour 😅(albeit for the better).

@DJTB DJTB merged commit 5a74bb5 into master Sep 30, 2023
@DJTB DJTB deleted the fix/kanji-iteration-character branch September 30, 2023 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants