[Breaking] Feature: include '々' iteration character in kanji, exclude from JA punctuation #163
Closes #113
Currently, Wanakana considers `々` to be punctuation (due to where it lies in the Unicode ranges, alongside other pure punctuation symbols), though in practice it functions like a kanji character. This causes minor issues such as `tokenize()` splitting up words like `人々` incorrectly and `isKanji()` reporting false for `人々`.

This PR now considers `々` to be a kanji character, and excludes it from punctuation checks.

Please see the modified tests to see which functions are affected and #113 for prior discussion, as well as DJTB/react-furi#6.
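For reviewers who want the gist without reading the diff, here is a rough, self-contained sketch of the intended behaviour. It is not Wanakana's actual source: the function names (`isCharKanjiSketch`, `isKanjiSketch`, `isCharJaPunctuationSketch`) and the constants are hypothetical stand-ins, and the ranges are simplified to the CJK Unified Ideographs block (U+4E00 to U+9FFF) and the CJK Symbols and Punctuation block (U+3000 to U+303F), which is where `々` (U+3005) lives.

```javascript
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT Wanakana's real internals.
const KANJI_START = 0x4e00; // start of the CJK Unified Ideographs block
const KANJI_END = 0x9fff; // end of the CJK Unified Ideographs block
const CJK_PUNCTUATION_START = 0x3000; // start of CJK Symbols and Punctuation
const CJK_PUNCTUATION_END = 0x303f; // end of CJK Symbols and Punctuation
const ITERATION_MARK = 0x3005; // 々

// Treat 々 as kanji even though it sits in the punctuation block.
function isCharKanjiSketch(char) {
  const code = char.codePointAt(0);
  return code === ITERATION_MARK || (code >= KANJI_START && code <= KANJI_END);
}

// A string counts as kanji only if it is non-empty and every character passes.
function isKanjiSketch(input) {
  return input.length > 0 && [...input].every(isCharKanjiSketch);
}

// Punctuation check that explicitly excludes 々.
function isCharJaPunctuationSketch(char) {
  const code = char.codePointAt(0);
  return (
    code !== ITERATION_MARK &&
    code >= CJK_PUNCTUATION_START &&
    code <= CJK_PUNCTUATION_END
  );
}
```

With this sketch, `isKanjiSketch('人々')` is true and `isCharJaPunctuationSketch('々')` is false, while `isCharJaPunctuationSketch('。')` stays true. The special case is handled by an explicit code point check rather than by reshuffling ranges, so other symbols in the punctuation block are unaffected.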
This doesn't really affect the core usage of Wanakana (as an IME), only the peripheral utils that we provide. Whenever `々` comes up it currently behaves incorrectly, so despite considering this a `fix` I don't want consumers updating if they see a patch bump (`5.1.1`). Since it changes existing behaviour perhaps it should be a major release?

@mimshwright @scottnicolson @vietqhoang do you have any concerns merging this? Do you prefer a major `6.0.0` or minor `5.2.0` with breaking changes listed in release/changelog?