[Breaking] Feature: include '々' iteration character in kanji, exclude from JA punctuation #163

DJTB · 2023-09-28T02:46:39Z

Closes #113

Currently, Wanakana treats 々 as punctuation because its code point sits in a Unicode range alongside pure punctuation symbols, though in practice it functions like a kanji character. This causes minor issues, such as tokenize() splitting words like 人々 incorrectly and isKanji() returning false for 人々.

This PR now considers 々 to be a kanji character, and excludes it from punctuation checks.
See the modified tests for which functions are affected, #113 for prior discussion, and DJTB/react-furi#6.
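
To make the behaviour change concrete, here is a minimal sketch of the classification before and after. The helper names and the exact code point ranges (CJK Unified Ideographs U+4E00–U+9FFF, CJK Symbols and Punctuation U+3000–U+303F) are assumptions for illustration and are not taken from Wanakana's source; only the position of 々 (U+3005) inside the punctuation block is the point being shown.

```javascript
// Hypothetical helpers for illustration only; not Wanakana's actual source.
const ITERATION_MARK = 0x3005; // 々

// Assumed ranges: kanji = U+4E00..U+9FFF, JA punctuation = U+3000..U+303F.
const inRange = (cp, start, end) => cp >= start && cp <= end;

// Before: only the ideograph range counts as kanji, so 々 is rejected.
function isKanjiCharBefore(char) {
  return inRange(char.codePointAt(0), 0x4e00, 0x9fff);
}

// After: 々 is explicitly accepted as kanji...
function isKanjiCharAfter(char) {
  const cp = char.codePointAt(0);
  return cp === ITERATION_MARK || inRange(cp, 0x4e00, 0x9fff);
}

// ...and explicitly excluded from the punctuation check.
function isPunctuationCharAfter(char) {
  const cp = char.codePointAt(0);
  return cp !== ITERATION_MARK && inRange(cp, 0x3000, 0x303f);
}

// True only if the text is non-empty and every character passes the predicate.
const everyChar = (text, predicate) =>
  text.length > 0 && [...text].every(predicate);

console.log(everyChar('人々', isKanjiCharBefore)); // false
console.log(everyChar('人々', isKanjiCharAfter)); // true
console.log(isPunctuationCharAfter('々')); // false
console.log(isPunctuationCharAfter('。')); // true
```

Ordinary punctuation such as 。 is unaffected in this sketch; only the single code point U+3005 changes category.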

This doesn't really affect the core usage of Wanakana (as an IME), only the peripheral utils that we provide. Whenever 々 comes up currently, it behaves incorrectly, so I consider this a fix. Even so, I don't want consumers updating just because they see a patch bump (5.1.1). Since it changes existing behaviour, perhaps it should be a major release?

@mimshwright @scottnicolson @vietqhoang do you have any concerns about merging this?
Do you prefer a major 6.0.0 or minor 5.2.0 with breaking changes listed in release/changelog?

DJTB · 2023-09-28T08:23:01Z

Also merged the tokenize docs update, since this PR modifies them as well.
Fixes #156 and closes #157
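
For readers following the tokenize() side of this, the toy tokenizer below shows why 人々 was split before the change. It is a sketch under assumptions: `toyTokenize`, `charType`, the `iterationMarkIsKanji` option, and the code point ranges are all hypothetical and are not Wanakana's tokenize() implementation. It only assumes that tokens are formed by grouping consecutive characters of the same type.

```javascript
// Toy tokenizer for illustration only; not Wanakana's tokenize() implementation.
// The option name and the code point ranges below are assumptions.
function charType(char, iterationMarkIsKanji) {
  const cp = char.codePointAt(0);
  if (cp === 0x3005) {
    // 々: the only character whose category this PR changes.
    return iterationMarkIsKanji ? 'kanji' : 'japanesePunctuation';
  }
  if (cp >= 0x4e00 && cp <= 0x9fff) return 'kanji';
  if (cp >= 0x3040 && cp <= 0x309f) return 'hiragana';
  if (cp >= 0x30a0 && cp <= 0x30ff) return 'katakana';
  if (cp >= 0x3000 && cp <= 0x303f) return 'japanesePunctuation';
  return 'other';
}

// Groups consecutive characters of the same type into one token.
function toyTokenize(text, { iterationMarkIsKanji }) {
  const tokens = [];
  let previousType = null;
  for (const char of text) {
    const type = charType(char, iterationMarkIsKanji);
    if (type === previousType) {
      tokens[tokens.length - 1] += char;
    } else {
      tokens.push(char);
    }
    previousType = type;
  }
  return tokens;
}

console.log(toyTokenize('人々が', { iterationMarkIsKanji: false })); // [ '人', '々', 'が' ]
console.log(toyTokenize('人々が', { iterationMarkIsKanji: true })); // [ '人々', 'が' ]
```

With 々 typed as punctuation, a type boundary falls in the middle of the word; typing it as kanji keeps 人々 in one token.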

vietqhoang · 2023-09-28T19:59:30Z

Did a first-pass review. Nothing really stood out to me as far as the changes go. My first thought is that I agree the repeater should be considered more of a kanji than a punctuation mark.

As for the preference on how to increment the version, to me this is more of a minor version than a major version. The changes don't really add anything significant, nor do they break the API.

DJTB · 2023-09-29T00:44:19Z

> As for the preference on how to increment the version, to me this is more of a minor version than a major version. The changes don't really add anything significant, nor do they break the API.

You're right that a major bump doesn't make much sense; I was being overly cautious since it changes some behaviour 😅 (albeit for the better).

DJTB added 4 commits June 21, 2023 15:12

docs: fix incorrect tokenize example

7e9d999

build: update prettier config to match eslint / existing single quote style

6d55217

fix: exclude iteration kanji symbol from JA punctuation

ca7e70a

test: add iteration test to stripOkurigana

e6e203b

DJTB requested review from mimshwright, scottnicolson and vietqhoang September 28, 2023 02:46

DJTB mentioned this pull request Sep 28, 2023

build: fix odoriji in combineFuri() DJTB/react-furi#6

Merged

13 tasks

Merge branch 'fix/tokenize-docs' into fix/kanji-iteration-character

dda897b

DJTB force-pushed the fix/kanji-iteration-character branch from 3456a61 to dda897b Compare September 28, 2023 08:21

vietqhoang approved these changes Sep 29, 2023

docs: update changelog

ca16c78

DJTB merged commit 5a74bb5 into master Sep 30, 2023

DJTB deleted the fix/kanji-iteration-character branch September 30, 2023 05:37

DJTB mentioned this pull request Sep 30, 2023

Refactor: use wanakana odoriji patch DJTB/react-furi#7

Merged

13 tasks
