Lower right of 舞 #116

benkasminbullock · 2021-05-13T00:12:14Z

The lower right corner of 舞 seems to diverge between Chinese and Japanese.

Japanese seems to write 舞 with the lower right as four strokes:

But the 㐄 element seems to be three strokes in the same Japanese sources:

In ids.txt they are unified onto one thing, but there seems to be an actual difference.

hfhchan · 2021-05-13T00:17:59Z

It is the same component but customarily written as different forms due to an inconsistency in Japanese kanji standardization. There is no semantic difference and the distinction is unifiable for the purposes of ISO10646 standardization.

hfhchan · 2021-05-13T00:21:50Z

If you want to decompose glyphs exactly as they look in various standards, you may want to check out yi-bai/ids which decomposes characters down to the stroke level and has data indicating stroke joining behaviour.

benkasminbullock · 2021-05-13T00:24:09Z

OK, but the requirements and details of that specification aren't documented here. If the data is required to fit that spec, at the least document it.

hfhchan · 2021-05-13T00:28:55Z

Unfortunately the maintainer has not been able to update the repository :/

This repository is the main data source used for IRG IDS algorithm, though the decomposition data is also useful for other purposes. That's why the IDSs used are more vague.

benkasminbullock · 2021-05-13T00:38:40Z

> If you want to decompose glyphs exactly as they look in various standards, you may want to check out yi-bai/ids which decomposes characters down to the stroke level and has data indicating stroke joining behaviour.

This? https://github.com/yi-bai/ids

It contains some information on this particular character:

舞 ⿳𠂉卌.⿱一舛.

舛 ⿰夕㐄.(.);⿰夕㐄J(K)

I'm not sure what that (.) all means yet, and the above doesn't accord with my own findings, but thank you for the pointer.

benkasminbullock · 2021-05-13T00:41:26Z

> Unfortunately the maintainer has not been able to update the repository :/

I don't have information except that @kawabata contributed to a project in January 2021 so I assume he is in good health.

> This repository is the main data source used for IRG IDS algorithm, though the decomposition data is also useful for other purposes. That's why the IDSs used are more vague.

Was that kawabata's purpose in making the repository? It seems to be undocumented, queries to the mailing list went unanswered, and so on. If this repository is intended for your purpose, then it should at least say so. I will leave this bug report open for the time being, pending guidance "from above".

hfhchan · 2021-05-13T02:23:51Z

I believe this repository was born before it was used for IRG standardization; however, I am not sure, because I joined IRG much later than Kawabata-san.

If you refer to the IRG working documents (https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg56/IRG56.htm), you can see that this repo is listed as the official IDS equivalence database for conducting CJK unification.

The decomposition strategies used for IDS data to be used for CJK standardization purposes are specified in IRGN1183 in IRG#25, written by @kawabata himself: https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg25/IRG25.htm.

Refer to paragraph 2.5 of the decomposition strategy which would be relevant to this case:

2.5. Generousness on minor differences

Don't try to represent details of the shapes of an ideograph. Ignore minor differences. We have a set of unification rules and if the difference is important (for the unification rules), we can consider so through the eye-to-eye review after the IDS based matching. On the other hand, if the IDS is constructed under a draconian policy, two shapes to be unified may have a totally different IDS and we may fail to find them duplicate.
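As a rough illustration of why a generous IDS helps duplicate detection, the sketch below folds a detailed component onto its unified form before comparing two IDSs. Everything in it is an assumption made for illustration: the private-use character standing in for the four-stroke Japanese shape, the folding table, and the function names. None of it is taken from IRGN1183 or from the actual IRG tooling.

```python
# Hypothetical stand-in for the four-stroke form seen in Japanese sources.
# It is a private-use code point, not a real encoding of that shape.
FOUR_STROKE_FORM = "\ue000"

# Assumed folding table: detailed variant -> unified component.
FOLD = str.maketrans({FOUR_STROKE_FORM: "㐄"})


def fold_ids(ids):
    """Replace detailed variant components with their unified form."""
    return ids.translate(FOLD)


def same_after_folding(a, b):
    """True if two IDSs are identical once minor differences are ignored."""
    return fold_ids(a) == fold_ids(b)


strict_ids = "⿰夕" + FOUR_STROKE_FORM  # a "draconian" IDS recording the exact shape
generous_ids = "⿰夕㐄"                  # the unified IDS

print(strict_ids == generous_ids)                    # False
print(same_after_folding(strict_ids, generous_ids))  # True
```

Under the strict encoding the two strings never match, so an automated pass would miss the pair; after folding they match and can go on to the eye-to-eye review that the paragraph describes.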

Recently, though, maintenance of the IDS checks for IRG's standardization purposes has been passed to @yi-bai because @kawabata is busy. @yi-bai maintains a proprietary format for IDSes. You may want to consult with him to see if he wants to extend his coverage to other locales.
