臼 again. #180

Closed
fontfish opened this issue Jan 31, 2019 · 11 comments
Labels
Duplicate (a bug reported more than once); Stroke order (the order of strokes in the character)

Comments

@fontfish

Hello!

GitHub is not my forte and I don't understand how it works very well, so please forgive me if I'm making mistakes. I'm aware that I'm dredging up issues #94 and #121, but I think it would make more sense to reflect what seems to be the common stroke order for 臼. Essentially, this means swapping kanji/081fc.svg with kanji/081fc-HzFst.svg.

All the more recent and “official-seeming” documentation I can find on the topic lists the two central horizontal strokes as 3 and 5. Some examples:
https://kanji.jitenon.jp/kanjid/1978.html
漢語林 第二版
Kodansha Kanji Learner's Dictionary Revised and Expanded (post 2010 edition)

Some sources I can find for writing it with the central horizontal strokes as 4 and 5, not including those that use KanjiVG (perhaps a testament to its usefulness):
New Japanese-English Character Dictionary (over 20 years old now)
https://kakijun.jp/page/usu200.html (as an acceptable variant)
https://漢字筆順.com/c008/0365.html (as an acceptable variant)

Again, my aim is simply to bring attention to the fact that it might be best to keep things in line with what appears to be a kind of consensus on the stroke order rather than to simply dredge up old issues.

For the sake of consistency, a list of kanji containing 臼. I don't know how many of them are in KanjiVG.
https://kanji.jitenon.jp/kousei/list.php?data=81fc

As an extra note, should the “element” field in the svg reflect the components used in the writing of the kanji or should it show the radicals/tsukuri of the kanji? In short, is it worth me opening an issue about the elements of 勇 being マ and 男 rather than 甬 and 力 (its original components), or is it considered correct as-is? (I'm not entirely sure what I can make of this looking at modern Japanese sources either, to be honest, so am happy to leave it if you think that best.)

@wtn

wtn commented Apr 20, 2021

> All the more recent and “official-seeming” documentation I can find on the topic lists the two central horizontal strokes as 3 and 5.

Yes; 漢検 dictionary also agrees.

@fontfish
Author

Thanks for the comment! Unfortunately, I'm not sure how to change this myself, or even whether I should, though I do still think the default form should match that recommended by dictionaries and educational organisations in Japan.

@fontfish
Author

Having a go at making the changes in a fork!

benkasminbullock added the "Stroke order" label on Mar 25, 2022
@benkasminbullock
Member

> As an extra note, should the “element” field in the svg reflect the components used in the writing of the kanji or should it show the radicals/tsukuri of the kanji? In short, is it worth me opening an issue about the elements of 勇 being マ and 男 rather than 甬 and 力 (its original components), or is it considered correct as-is? (I'm not entirely sure what I can make of this looking at modern Japanese sources either, to be honest, so am happy to leave it if you think that best.)

This is specifically about Japan so the Japanese format should be used, and it's a graphical resource rather than an etymological resource, so there is no point adding the Chinese format, regardless of whether it is the original. If you want to check, a good place to go is the IDS repository. For this character, we have

U+52C7 勇 ⿱甬力[GTV] ⿱⿱龴田力[JK]

which means that GTV (Mainland China, Taiwan/Hong Kong, and Vietnam) use the former format, and Japan and Korea use the latter format.
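As a rough illustration of how an entry like the one above can be read programmatically, here is a minimal Python sketch. It assumes the entry is whitespace-separated (code point, character, then one or more sequences, each optionally followed by region letters in square brackets), as in the line quoted here; the function name `parse_ids_line` is hypothetical and not part of any IDS tooling.

```python
import re

def parse_ids_line(line):
    """Split one IDS entry into (codepoint, character, {region: sequence}).

    Assumes the layout seen in the quoted line: fields separated by
    whitespace, with region letters such as [GTV] appended to a sequence.
    """
    codepoint, char, *sequences = line.split()
    by_region = {}
    for seq in sequences:
        match = re.fullmatch(r"(.+?)\[([A-Z]+)\]", seq)
        if match is None:
            # No region tag: store under an empty key.
            by_region[""] = seq
            continue
        for region in match.group(2):
            by_region[region] = match.group(1)
    return codepoint, char, by_region

entry = parse_ids_line("U+52C7 勇 ⿱甬力[GTV] ⿱⿱龴田力[JK]")
japanese_form = entry[2]["J"]   # decomposition tagged for Japan
chinese_form = entry[2]["G"]    # decomposition tagged for Mainland China
```

Looking up the `J` key gives the decomposition that a Japan-specific graphical resource would follow.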

This is one of the problems caused by Han unification, which was an effort to fit all characters into 16 bits by unifying Japanese and Chinese characters together depending on their "origin". The 16-bit goal has since been abandoned by Unicode.

@benkasminbullock
Member

Sorry that got into a muddle. This should be done with #295.

@fontfish
Author

Thank you for the edits and explanation, and my apologies for my very slow reply.

Regarding 勇, what you say about this being a graphical resource makes sense. The real issue there may be how dictionaries using this information choose to present it, which is up to them to consider. Japanese dictionaries that I have checked list only 力 under the 部首 field, then list 甬 and 力 under the etymology/character explanation.

Thanks again.

benkasminbullock added the "Duplicate" label on May 4, 2022
@SlugFiller

As I've just been bitten by this one, I want to take the opportunity to ask: should this also affect 諛 (U+8ADB)? It appears to contain the same radical, suggesting strokes 10 and 11 should be swapped.

@benkasminbullock
Member

How did you get bitten? I want you to report anything as an issue if there is a problem.

As for 諛, yes, it is wrong. That seems to have been caused by an incorrect value of 𦥑 for kvg:element on the group with ID number kvg:08adb-g4, hence it was missed by the script when I did the overall change. What I'll do with that one is to fix the element value & run the script again. If that doesn't work I'll just edit it with a text editor.

Let me know if you find any more like that.
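To illustrate why a script that selects groups by their kvg:element value would skip 諛, here is a minimal Python sketch. The SVG fragment is a simplified, hypothetical stand-in for the file before the fix (real KanjiVG files carry more attributes and real path data), the namespace URI is assumed to be the one KanjiVG uses, and `groups_with_element` is an illustrative helper, not the repository's actual script.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
KVG_NS = "http://kanjivg.tagaini.net"  # assumed KanjiVG namespace URI

# Hypothetical, cut-down fragment: the group that should be tagged 臼
# carries the look-alike value 𦥑 instead.
SAMPLE = f"""<svg xmlns="{SVG_NS}" xmlns:kvg="{KVG_NS}">
  <g id="kvg:08adb" kvg:element="諛">
    <g id="kvg:08adb-g4" kvg:element="𦥑">
      <path id="kvg:08adb-s10" d="M0,0"/>
      <path id="kvg:08adb-s11" d="M0,0"/>
    </g>
  </g>
</svg>"""

def groups_with_element(root, element):
    """Return the ids of all <g> groups whose kvg:element equals `element`."""
    attr = f"{{{KVG_NS}}}element"
    return [g.get("id") for g in root.iter(f"{{{SVG_NS}}}g")
            if g.get(attr) == element]

root = ET.fromstring(SAMPLE)
matches_for_usu = groups_with_element(root, "臼")        # what a bulk change would look for
matches_for_lookalike = groups_with_element(root, "𦥑")  # what the file actually contained
```

A search for 臼 finds nothing in this fragment, so a bulk edit keyed on that value leaves the group untouched; searching for the look-alike value is one way to hunt for similar cases.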

@SlugFiller

> How did you get bitten? I want you to report anything as an issue if there is a problem.

As I've previously mentioned, I'm creating a visual indexing method for kanji that associates every stroke with an English character. I was indexing based on the "latest" release, where 臼's stroke order is indexed "qrosfc". After updating to the pre-release, I noticed the strokes no longer matched, and had to reindex it to "qrfofc", as well as going back and re-indexing a few dozen kanji containing the pattern. That's where I noticed 諛's index "ifsfkocqrosfcvj" still matches the stroke order diagram, even though it should logically need changing to "ifsfkocqrfofcvj", assuming the same pattern.
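The re-indexing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the index is taken to be a plain dictionary from kanji to stroke-letter strings, the two sample entries are the ones quoted in this comment, and `reindex` is a hypothetical helper rather than part of the actual indexing tool.

```python
OLD_PATTERN = "qrosfc"  # 臼 as indexed against the "latest" release
NEW_PATTERN = "qrfofc"  # 臼 as indexed against the pre-release

def reindex(indices):
    """Return only the entries whose index contains the old 臼 pattern,
    with that pattern replaced by the new one."""
    changed = {}
    for kanji, code in indices.items():
        if OLD_PATTERN in code:
            changed[kanji] = code.replace(OLD_PATTERN, NEW_PATTERN)
    return changed

# Sample index containing just the two entries mentioned in the comment.
indices = {"臼": "qrosfc", "諛": "ifsfkocqrosfcvj"}
updated = reindex(indices)
```

A plain substring match can also hit letter runs that merely straddle two unrelated components, so each flagged kanji would still need a check against its stroke order diagram.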

The correct pattern was already present in 嫂, 搜, and 鑿, and I was honestly wondering why there were two patterns.

I'm fundamentally using KanjiVG as an "authoritative source" on stroke orders, so any errors are a hard hit.

> Let me know if you find any more like that.

I've so far indexed ~5800 kanji of my target JIS X 0208's ~6300, so about 90% chance that's all of them. But I'll keep in mind to watch out for this pattern in the remaining kanji.

@benkasminbullock
Member

> I'm fundamentally using KanjiVG as an "authoritative source" on stroke orders, so any errors are a hard hit.

Unfortunately KanjiVG isn't an authoritative source, but hopefully with enough people reporting problems we can get it better. One thing I've tried to do is to remove some of the claims about KanjiVG giving the correct stroke order of kanji from the documentation and other pages. It's a best effort thing really. Since I started working on this repository in March, I've found some errors in Kanjidic, some errors in this repository, and so on and so forth.

> I've so far indexed ~5800 kanji of my target JIS X 0208's ~6300, so about 90% chance that's all of them. But I'll keep in mind to watch out for this pattern in the remaining kanji.

Thank you. This issue should be fixed in the repo now:

b7c9365

I'm going to be cautious about making a new release, since I'm still not sure that the one I did before was OK. You can always just copy the repo data into your distribution XML file though. I've fixed up those Python scripts too, so you can probably make your own distribution-a-like files from the repo yourself.

@SlugFiller

> Unfortunately KanjiVG isn't an authoritative source, but hopefully with enough people reporting problems we can get it better.

I don't have much in terms of an alternative. Presumably, there's some document out there giving the officially approved stroke orders for each kanji. But not something I can easily reference, search, embed, and produce cross-sections of.

If there is something decently accessible, I can always compare it to my index and report any errors I find.

> You can always just copy the repo data into your distribution XML file though. I've fixed up those Python scripts too, so you can probably make your own distribution-a-like files from the repo yourself.

I should probably rewrite my dependency on KanjiVG as a git submodule anyway. It's better for GitHub release, and would make updating to latest easier. It shouldn't be too difficult.
