Converter misses opportunity to detect identical glyphs, stores them as separate images #120

Open
pavmick opened this issue Aug 24, 2024 · 12 comments


@pavmick

pavmick commented Aug 24, 2024

As the title says. I am converting ASCII and Cyrillic ranges. The letter A, for example, is present in both ranges and it is being stored twice. Interestingly, the stored images are slightly different. Same for other identical glyphs. It should not be too troublesome to detect duplicate glyphs and store one copy only.

@kisvegabor
Member

How can we know whether the ASCII A is the same as the Cyrillic A? By comparing the rasterized images?

@pavmick
Author

pavmick commented Aug 30, 2024

I believe font files have facilities that allow different Unicode code points to reference the same glyph. For example, you can go to https://fontdrop.info/, load arial.ttf, scroll down to Unicode 0410 (Cyrillic letter A), click on it, and observe "This composite glyph is a combination of: glyph 36". If you click on the letter A from the ASCII range (close to the top of the table), you'll see the same index 36.

@kisvegabor
Member

How many glyphs can be affected by that? I'd estimate at most 1% (but probably closer to 0.1%). What do you think?

@pavmick
Author

pavmick commented Sep 2, 2024

Let's see. For the Russian alphabet, I would say 11 uppercase and 8 lowercase letters share glyphs with ASCII. That would be 15% of the ASCII range.

@kisvegabor
Member

Okay, it's really significant in this case.

So the task is to make the duplicated glyphs point to the same bitmap, right? If so, I'm okay with this feature. However, I'm not a JS developer and can't work on the implementation.

Do you have time and interest to implement it?

cc @puzrin

@puzrin
Collaborator

puzrin commented Sep 4, 2024

Guys, before discussing any changes, it's worth providing proof that the source font has multiple character codes mapped to the same image. If source images are different, that's the intent of the font authors, not a converter issue.

The TTF format has different tables for "images" and "char codes." AFAIK, if an image has multiple references from char codes, the converter should preserve them (but I'm not sure and don't remember the details).

@puzrin
Collaborator

puzrin commented Sep 4, 2024

It's also worth using the binary format as the baseline. The "lvgl" one is less optimal and focused on a text representation of the source. The binary format is a close subset of TTF, with minor local changes that deal with raster data and compression instead of vectors.

@pavmick
Author

pavmick commented Sep 4, 2024

So I looked closer at arial.ttf using the fontdrop.info online tool. I can confirm that the Russian letters АВЕМНОРТХаенорсух share glyphs with regular ASCII letters. That's 17 glyphs. This set could vary slightly from font to font, but I don't expect major variations. I am mostly an embedded C developer with some knowledge of JS, but I'll see if I can dive into the code and suggest patches.

@puzrin
Collaborator

puzrin commented Sep 4, 2024

> So I looked closer at arial.ttf using the fontdrop.info online tool. I can confirm that the Russian letters АВЕМНОРТХаенорсух share glyphs with regular ASCII letters. That's 17 glyphs.

And did you use the same font in the converter when you found the duplicated images? And does the same problem occur in the binary format?

@pavmick
Author

pavmick commented Sep 4, 2024

> And did you use the same font in the converter when you found the duplicated images? And does the same problem occur in the binary format?

Just ran the converter on arial.ttf. Yes, the glyphs in question are duplicated. This time exact copies, to the last bit. I am not using the binary font format in my applications, so I can't confirm this behavior with it.

@puzrin
Collaborator

puzrin commented Sep 4, 2024

There is a chance we skipped deduplication to save time. But it's 100% not a restriction of the internal [binary] format (I don't remember about the lvgl one).

@kisvegabor
Member

In LVGL, a glyph can also reference any bitmap_index. For example:

 {.bitmap_index = 1307, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
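To make the point about bitmap_index concrete, here is a minimal sketch of two glyph descriptors that reference the same bitmap bytes. The `glyph_dsc_t` struct is a simplified stand-in for LVGL's real descriptor type, and the bitmap data, the table contents, and the `glyph_bitmap_ptr` helper are all made up for illustration (1 bit per pixel is assumed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for LVGL's glyph descriptor. Only the fields from the
 * descriptor line above are modelled; the plain integer types are an
 * assumption, not LVGL's actual layout. */
typedef struct {
    uint32_t bitmap_index; /* offset of the glyph's first byte in glyph_bitmap */
    uint16_t adv_w;
    uint8_t  box_w;
    uint8_t  box_h;
    int8_t   ofs_x;
    int8_t   ofs_y;
} glyph_dsc_t;

/* One shared pool of bitmap bytes (made-up 8x8 shapes at 1 bit per pixel). */
static const uint8_t glyph_bitmap[] = {
    0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00, /* offset 0: an "A" shape */
    0x7C, 0x42, 0x7C, 0x42, 0x42, 0x42, 0x7C, 0x00, /* offset 8: a "B" shape  */
};

/* Hypothetical descriptor table: entries 0 and 2 stand for Latin A and
 * Cyrillic A. Both carry bitmap_index 0, so the "A" shape is stored once. */
static const glyph_dsc_t glyph_dsc[] = {
    {.bitmap_index = 0, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
    {.bitmap_index = 8, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
    {.bitmap_index = 0, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
};

/* Return a pointer to the first bitmap byte of the glyph with this id. */
static const uint8_t *glyph_bitmap_ptr(size_t glyph_id)
{
    assert(glyph_id < sizeof(glyph_dsc) / sizeof(glyph_dsc[0]));
    return &glyph_bitmap[glyph_dsc[glyph_id].bitmap_index];
}
```

With this table, `glyph_bitmap_ptr(0)` and `glyph_bitmap_ptr(2)` return the same address, so the second "A" costs one descriptor entry and no extra bitmap bytes.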

pavmick added a commit to pavmick/lv_font_conv that referenced this issue Sep 5, 2024
In LVGL export format, avoid storing duplicate bitmap data for identical glyphs. Instead, reference existing bitmap data.
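For readers who want to see the idea in code, the deduplication described in that commit message could look roughly like the sketch below. It is written in C for illustration only; it is not the code from the referenced commit, and the converter itself is JavaScript. The `bitmap_pool_t` type, the `pool_add` function, and the capacity constants are hypothetical names. Each new bitmap is compared byte for byte against the ones already stored, and the earlier offset is reused on an exact match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_CAP    1024 /* hypothetical capacity of the bitmap pool, in bytes */
#define MAX_BITMAPS 64   /* hypothetical limit on distinct bitmaps */

typedef struct {
    uint8_t bytes[POOL_CAP];     /* concatenated bitmap data */
    size_t  used;                /* number of bytes stored so far */
    size_t  offset[MAX_BITMAPS]; /* start of each distinct bitmap */
    size_t  length[MAX_BITMAPS]; /* length of each distinct bitmap */
    size_t  count;               /* number of distinct bitmaps stored */
} bitmap_pool_t;

/* Store a glyph bitmap and return its offset in the pool. If an identical
 * byte sequence was stored earlier, return that earlier offset and store
 * nothing. Returns SIZE_MAX when the pool is full. */
static size_t pool_add(bitmap_pool_t *pool, const uint8_t *data, size_t len)
{
    assert(pool != NULL && data != NULL && len > 0);

    /* Linear scan over the bitmaps stored so far; an exact match means the
     * glyph is a duplicate "to the last bit". */
    for (size_t i = 0; i < pool->count; i++) {
        if (pool->length[i] == len &&
            memcmp(&pool->bytes[pool->offset[i]], data, len) == 0) {
            return pool->offset[i];
        }
    }

    if (pool->count == MAX_BITMAPS || len > POOL_CAP - pool->used) {
        return SIZE_MAX;
    }

    /* No match: append the bytes and remember where they start. */
    size_t off = pool->used;
    memcpy(&pool->bytes[off], data, len);
    pool->offset[pool->count] = off;
    pool->length[pool->count] = len;
    pool->count++;
    pool->used += len;
    return off;
}
```

The returned offset is what would go into a glyph's bitmap_index. The linear scan is quadratic in the number of glyphs, which is likely fine for small ranges; a hash of the bitmap bytes would be the usual replacement for larger fonts. Glyphs without a bitmap (such as a space) are assumed to be skipped by the caller.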