
chore: add Simplified Chinese translation #89

Merged (6 commits) on Feb 27, 2023


@cxw620 (Contributor) commented Feb 26, 2023

Well, iced currently has no font fallback: even when the font is set to Font::Default, Chinese characters still cannot be displayed.

INCONSOLATA_BOLD, the font Sniffnet uses, doesn't support Chinese either.

The only option is to choose a font that supports Chinese, English, and as many other languages as possible. So I introduced a new font, LXGWWenKaiMonoLite, and it works really well with the Yeti Day and Mon Amour styles. However, since the font may be considered part of a specific style, I'm not sure whether this change is acceptable.


Related code:

pub fn get_font(style: StyleType) -> Font {
    match to_rgb_color(get_colors(style).text_body) {
        RGBColor(255, 255, 255) => Font::Default,
        // I changed INCONSOLATA_BOLD to LXGW_MONO_LITE_BOLD
        _ => LXGW_MONO_LITE_BOLD,
    }
}

Details of LXGWWenKaiMonoLite can be found here: https://github.com/lxgw/LxgwWenKai-Lite

@GyulyVGC (Owner)

Thank you very much!
Iced seems very close to supporting text shaping and font fallback (see here for more details).
However, until that is released, I think we could keep using Inconsolata for the other languages and the font you proposed (which is actually very cool) for Chinese. This would require adding a Language parameter to the get_font methods.
The radio button length problem can be fixed just by setting the font as the default (the default is set in main.rs and is currently Inconsolata), but of course this could be a problem if we use two different fonts.
The easiest solution would actually be to use the font you proposed as the default and remove Inconsolata; I think this would solve all the problems.
I haven't checked the font specifications yet, but it must be a monospaced font, otherwise everything in the bottom table would be misaligned.
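As a rough illustration of the Language-parameter idea, here is a minimal sketch of a `get_font` that also takes the UI language into account. `Language`, `StyleType`, and `FontChoice` are hypothetical stand-ins, not Sniffnet's or iced's real types, and the sketch assumes `Night` is the style with white body text, which the original code maps to the default font.

```rust
// Hypothetical stand-ins: these are NOT Sniffnet's or iced's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Language {
    EN,
    IT,
    ZH,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum StyleType {
    Night, // assumed: the style with white body text
    Day,
}

/// Stand-in for an iced `Font` value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FontChoice {
    Default,
    InconsolataBold,
    LxgwMonoLiteBold,
}

/// Choose a font from both the style and the UI language.
fn get_font(style: StyleType, language: Language) -> FontChoice {
    match (style, language) {
        // Chinese needs CJK glyphs, whatever the style.
        (_, Language::ZH) => FontChoice::LxgwMonoLiteBold,
        // White body text keeps the application's default font.
        (StyleType::Night, _) => FontChoice::Default,
        // Everything else keeps Inconsolata.
        _ => FontChoice::InconsolataBold,
    }
}

fn main() {
    for lang in [Language::EN, Language::IT, Language::ZH] {
        println!("Day/{:?} -> {:?}", lang, get_font(StyleType::Day, lang));
    }
    println!("Night/IT -> {:?}", get_font(StyleType::Night, Language::IT));
}
```

Matching on the `(style, language)` tuple keeps the Chinese case first, so it wins regardless of the style's text color.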

@cxw620 (Contributor, Author) commented Feb 26, 2023

LXGWWenKaiMonoLite is the monospaced, lite version of LXGWWenKai. It supports many languages, so I think it's fine to set LXGWWenKaiMonoLite as the default.

Here are some examples for you to preview (screenshots): English, French, Spanish, German, Italian, Polish, and Simplified Chinese.

@GyulyVGC (Owner)

It seems really good.
I'll try it and will let you know!

If you have other monospaced fonts (available both as lite and bold) to suggest, feel free to do it in the meantime.

@cxw620 (Contributor, Author) commented Feb 26, 2023

In the long run, I'd recommend adding a font setting so that users can choose their favourite font. I'm not familiar with iced, so I'm afraid I can't help much with that.

Monospaced fonts that include Chinese characters are really rare. The LXGWWenKai Mono series also supports Traditional Chinese and Japanese. I do think it's the best choice.

@GyulyVGC (Owner)

Another important concern is that fonts including all the Chinese glyphs are pretty large, but I guess this cannot be avoided since they must include thousands of characters.
This would make the Sniffnet executable significantly bigger.
I don't know whether lighter font files exist, but I doubt it.
I think waiting for iced to support font fallback is the best thing to do.

@cxw620 (Contributor, Author) commented Feb 26, 2023

My concern is that we don't know when iced will finish developing text shaping and font fallback, since it's a major breaking change for iced. I'm going to look into how to minify font files.

@GyulyVGC (Owner)

An easy first step would be to remove all the Traditional Chinese characters. I think Simplified Chinese is by far more common; can you confirm?

@GyulyVGC (Owner)

Can you please take a look at this table?
Can you tell me which groups are needed for Simplified Chinese?

@GyulyVGC (Owner)

Never mind, I think the best solution is to use something like this to keep only the glyphs actually used.

So the plan is: keep the full .ttf files, and every time a new language or new phrases are added, collect only the characters actually used and generate a copy of the font that keeps only the needed glyphs.
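A minimal sketch of the first half of that plan: collecting the distinct characters the translations actually use. The sample strings are hypothetical, and the subsetting itself is assumed to be done by an external tool; fontTools' `pyftsubset`, for example, can read such a character list via its `--text-file` option.

```rust
use std::collections::BTreeSet;

/// Collect the distinct characters used across all UI strings, sorted by
/// code point. Control characters such as '\n' are skipped.
fn used_chars(strings: &[&str]) -> BTreeSet<char> {
    strings
        .iter()
        .flat_map(|s| s.chars())
        .filter(|c| !c.is_control())
        .collect()
}

fn main() {
    // Hypothetical sample of translated strings.
    let translations = ["网络适配器", "设置", "Settings", "适配器设置"];
    let chars = used_chars(&translations);
    // One line of text that a subsetting tool could read as its glyph list.
    let glyph_list: String = chars.iter().collect();
    println!("{} distinct characters: {}", chars.len(), glyph_list);
}
```

Using a `BTreeSet` removes duplicates and gives a stable, sorted output, so regenerating the list after adding new phrases produces a predictable diff.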

@cxw620 (Contributor, Author) commented Feb 27, 2023

> Can you please take a look at this table? Can you tell me which groups are needed for Simplified Chinese?

TL;DR

CJK Unified Ideographs: U+4E00 - U+9FCB

I used English punctuation marks in the Chinese translation, so we currently don't need Chinese punctuation marks.

Ref: https://www.cnblogs.com/hookjc/p/13178791.html
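To check that a translation really stays within ASCII plus this block, a small sketch like the following could flag everything else. It uses the full Unicode block range, U+4E00 to U+9FFF; the U+9FCB bound above appears to reflect an older Unicode version, and later versions assigned characters up to U+9FFF. Full-width Chinese punctuation lives in other blocks (CJK Symbols and Punctuation, U+3000 to U+303F; Halfwidth and Fullwidth Forms, U+FF00 to U+FFEF). The helper names are hypothetical.

```rust
/// True if `c` is in the CJK Unified Ideographs block (U+4E00..=U+9FFF).
fn is_cjk_unified(c: char) -> bool {
    ('\u{4E00}'..='\u{9FFF}').contains(&c)
}

/// Characters that are neither ASCII nor CJK Unified Ideographs, i.e. glyphs
/// (such as full-width punctuation) that a font trimmed to those ranges would lack.
fn unexpected_chars(text: &str) -> Vec<char> {
    text.chars()
        .filter(|&c| !c.is_ascii() && !is_cjk_unified(c))
        .collect()
}

fn main() {
    // The first string uses English punctuation only; the second uses a
    // full-width comma (U+FF0C), which falls outside both ranges.
    for text in ["网络适配器: eth0, 已连接!", "你好，世界"] {
        println!("{:?} -> {:?}", text, unexpected_chars(text));
    }
}
```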

Some related information:

Due to the influence of Chinese culture, some East Asian languages (such as Japanese and Korean) use Chinese characters. The Unicode working group decided to merge the Chinese characters used in Chinese, Japanese, and Korean into a single set of CJK Unified Ideographs (CJK being the initials of China, Japan, and Korea).

If someone provides a Japanese translation, please refer to https://stackoverflow.com/questions/19899554/unicode-range-for-japanese

All CJK:

> An easy first step would be to remove all the Traditional Chinese characters. I think Simplified Chinese is by far more common; can you confirm?

I'm afraid it's a bit hard to separate Simplified Chinese characters from all the other Chinese characters. Let's try keeping all the characters in CJK Unified Ideographs and see how much space they take.
If it's really necessary, I'll compile a Unicode table of the most commonly used Simplified Chinese characters.

> Never mind, I think the best solution is to use something like this to keep only the glyphs actually used.
>
> So the plan is: keep the full .ttf files, and every time a new language or new phrases are added, collect only the characters actually used and generate a copy of the font that keeps only the needed glyphs.

Sounds good!

If you have any more questions about the Simplified Chinese translation, feel free to ask me. I may reply a bit late since I'm busy with my studies.

@GyulyVGC (Owner)

Thank you very much for the detailed answer.
I'll try to filter out the unused glyphs and will let you know!

@GyulyVGC GyulyVGC changed the base branch from main to chinese_font_setup February 27, 2023 23:04
@GyulyVGC GyulyVGC merged commit e53420c into GyulyVGC:chinese_font_setup Feb 27, 2023
@GyulyVGC (Owner)

A brief recap:

  • after filtering the fonts down to the characters/glyphs actually used, the size of the .ttf files dropped drastically from 10MB to 100KB, which is now acceptable
  • I merged the PR into a secondary branch since I want to do some more tests, but hopefully it will be merged into main soon; there is just one more problem:
    • when a light application theme is selected, the text is black and not very readable, since it isn't bold enough and this annoying effect appears. I also tried the non-lite version of the LXGW font but nothing changes...

@cxw620 (Contributor, Author) commented Feb 28, 2023

I tried the chinese_font_setup branch and noticed the problem. I've opened an issue and hope the LXGW maintainers will accept my suggestion to add an ExBold version of LXGWWenKaiMono. Of course, it would be great if we could find other ways to deal with the effect ourselves.

@cxw620 (Contributor, Author) commented Feb 28, 2023

Unfortunately, the LXGW maintainer declined our request outright, citing insufficient technical ability, among other reasons: lxgw/LxgwWenKai-Lite#5 (comment).

Well, should we try merging two different fonts, one of which includes the Chinese characters, into a single font? Otherwise we may have to wait until iced finishes developing font fallback.
I currently know little about iced and can't help with the technical details, but I'm ready to help whenever you need.

@GyulyVGC (Owner)

Thanks for the kind words.
I don't feel like combining two fonts in one is a very good/scalable solution.

I also asked the iced maintainer, and he told me that the current renderer has this limitation on light backgrounds and there is currently no proper solution.

The best solution for now would be to find another font as complete as LXGW but bolder, with all the characters we need for the different languages.
If you don't know of a bolder, monospaced, and complete font... I fear we'll have to wait until font fallback is supported.

@cxw620 (Contributor, Author) commented Feb 28, 2023

OK, I'll search for other fonts. I've also created an issue letting anyone who needs the Chinese translation know that they can temporarily use my precompiled executable.

@GyulyVGC (Owner)

> OK, I'll search for other fonts. I've also created an issue letting anyone who needs the Chinese translation know that they can temporarily use my precompiled executable.

Awesome, thanks

@NightFurySL2001

I would suggest using Sarasa Gothic (specifically Sarasa Mono) for CJK coding usage. Wenkai is a modulated handwritten typeface and does not fit well with slab serif monospaced fonts.

@GyulyVGC (Owner) commented Mar 1, 2023

> I would suggest using Sarasa Gothic (specifically Sarasa Mono) for CJK coding usage. Wenkai is a modulated handwritten typeface and does not fit well with slab serif monospaced fonts.

Nice, thanks for the suggestion!
I downloaded sarasa-gothic-ttf-0.40.1, and the unzipped folder contains many different files.
Which one do you suggest? Also, do you recommend the version I downloaded or the unhinted one?

@NightFurySL2001 commented Mar 1, 2023

The hinted version is probably meant for Windows ClearType. As for which variant to use, quoting the original:

Em dashes (——) are full width —— Mono
Em dashes (——) are half width —— Term
No ligature, Em dashes (——) are half width —— Fixed

Any of these three should work: in Chinese, em dashes should be full width, but depending on the use case half width is acceptable (including for coding).

Use the SC version for Simplified Chinese and CL for Traditional Chinese. TC/HC are odd and not region-agnostic, so I would advise against using them.
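Full width versus half width also matters for keeping the bottom table aligned: in a CJK monospaced font such as Sarasa Mono, an ideograph is assumed to occupy two half-width cells, while Rust's `format!("{:<10}", s)` pads by `char` count, so rows containing Chinese would drift. A minimal sketch of padding by display cells, assuming only CJK Unified Ideographs are double width; a complete version would consult the Unicode East Asian Width property (and, for the Mono variant, treat the em dash as full width too). The helper names are hypothetical.

```rust
/// Width in half-width cells, assuming CJK ideographs are exactly two cells
/// wide and every other character is one cell wide.
fn cell_width(c: char) -> usize {
    if ('\u{4E00}'..='\u{9FFF}').contains(&c) { 2 } else { 1 }
}

fn display_width(s: &str) -> usize {
    s.chars().map(cell_width).sum()
}

/// Left-align `s` in a column `width` cells wide, padding by display width
/// rather than by char count.
fn pad_to(s: &str, width: usize) -> String {
    let pad = width.saturating_sub(display_width(s));
    format!("{}{}", s, " ".repeat(pad))
}

fn main() {
    for (name, value) in [("Adapter", "eth0"), ("适配器", "eth0")] {
        println!("|{}|{}|", pad_to(name, 10), value);
    }
}
```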

@GyulyVGC (Owner) commented Mar 1, 2023

Considering only sarasa-mono, there are:

  • sarasa-mono-cl
  • sarasa-mono-hc
  • sarasa-mono-j
  • sarasa-mono-k
  • sarasa-mono-sc
  • sarasa-mono-tc
  • sarasa-mono-slab-cl
  • sarasa-mono-slab-hc
  • sarasa-mono-slab-j
  • sarasa-mono-slab-k
  • sarasa-mono-slab-sc
  • sarasa-mono-slab-tc

@NightFurySL2001

The slab version probably doesn't match Inconsolata, so I'd suggest sarasa-mono-sc for Simplified Chinese and sarasa-mono-cl for Traditional Chinese.

@GyulyVGC (Owner) commented Mar 1, 2023

Note that I would like to use just one font. Ideally, I would use sarasa-mono for all the languages.

@NightFurySL2001

It is impossible to do so for CJK (Simplified Chinese/Traditional Chinese/Japanese/Korean) because of Han unification: each CJK locale must have its own font to display the correct form of each Han character.
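Because of Han unification, the font has to be chosen per locale. A minimal sketch of that mapping, following the suggestions in this thread (SC for Simplified Chinese, CL for Traditional Chinese) and assuming the J and K families from the sarasa-mono list for Japanese and Korean; `CjkLocale` and `sarasa_mono_family` are hypothetical names, and which weight file inside each family gets embedded is left open.

```rust
/// Hypothetical locale tags for the CJK languages discussed in the thread.
#[derive(Debug, Clone, Copy)]
enum CjkLocale {
    SimplifiedChinese,
    TraditionalChinese,
    Japanese,
    Korean,
}

/// Map each locale to a Sarasa Mono family name from the release archive.
fn sarasa_mono_family(locale: CjkLocale) -> &'static str {
    match locale {
        CjkLocale::SimplifiedChinese => "sarasa-mono-sc",
        // CL rather than TC/HC, which are not region-agnostic.
        CjkLocale::TraditionalChinese => "sarasa-mono-cl",
        CjkLocale::Japanese => "sarasa-mono-j",
        CjkLocale::Korean => "sarasa-mono-k",
    }
}

fn main() {
    let locales = [
        CjkLocale::SimplifiedChinese,
        CjkLocale::TraditionalChinese,
        CjkLocale::Japanese,
        CjkLocale::Korean,
    ];
    for locale in locales {
        println!("{:?} -> {}", locale, sarasa_mono_family(locale));
    }
}
```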

@GyulyVGC (Owner) commented Mar 1, 2023

I'll try sarasa-mono-slab-sc (having looked at both, I prefer the slab version) and let you know.
I think it contains all the characters needed by Sniffnet's current languages, so it shouldn't be a problem for now.
Thanks for your help!

@GyulyVGC (Owner) commented Mar 1, 2023

I ended up choosing the non-slab version. The bold weight is clearly visible on light backgrounds.
I think I'll merge it into main in the next few days.

@NightFurySL2001

Just a reminder: if you plan to use Sarasa fonts for Latin text, the Term version will be more suitable than Mono, since the em dash is also used in Latin-script text.

@GyulyVGC (Owner) commented Mar 2, 2023

Is the em dash the only relevant difference?
If so, I think there's no problem using it for Latin characters too.

@GyulyVGC (Owner) commented Mar 3, 2023

@all-contributors please add @cxw620 for translation.

@allcontributors (Contributor)

@GyulyVGC

I've put up a pull request to add @cxw620! 🎉

@GyulyVGC GyulyVGC added the translation User interface translation label Mar 27, 2023
GyulyVGC added a commit that referenced this pull request May 7, 2023
chore: add Simplified Chinese translation