Store font's Unicode ranges #3

Open
RazrFalcon opened this issue Jun 22, 2020 · 18 comments
Comments

@RazrFalcon
Owner

No description provided.

@jackpot51
Contributor

Hello @RazrFalcon, is there any news on this? I am looking for a pure Rust library to use for font fallback, and identifying supported ranges will be critical. By the way, I really love rustybuzz!

@RazrFalcon
Owner Author

RazrFalcon commented Oct 7, 2022

Well, it's "simply" a matter of adding this feature to ttf-parser first. I haven't looked into it much.
See https://learn.microsoft.com/en-us/typography/opentype/spec/os2#ur

Will look tomorrow.

What API do you expect? A simple bitflags-like one?

> By the way, I really love rustybuzz!

Yeah, 8 months well spent... It's a bit dead now, though it should work just fine.

@jackpot51
Contributor

Thanks for looking into this. It looks to me like a u128 bitflag will cover all ranges. If you need help implementing this, I could take it on as well.
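
For illustration, a minimal sketch of how a u128 could hold all 128 OS/2 Unicode range bits. The type name `UnicodeRanges`, the helper `os2_bit_for_char`, and the three-entry lookup are assumptions made for this example, not ttf-parser's API; a real table would need every bit assignment from the OS/2 specification.

```rust
/// The four 32-bit `ulUnicodeRange` fields of the OS/2 table packed into one u128.
/// (Hypothetical type for this sketch.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UnicodeRanges(pub u128);

impl UnicodeRanges {
    /// `fields[0]` is ulUnicodeRange1 (bits 0-31), `fields[3]` is ulUnicodeRange4 (bits 96-127).
    pub fn from_os2_fields(fields: [u32; 4]) -> Self {
        let mut bits = 0u128;
        for (i, field) in fields.iter().enumerate() {
            bits |= (*field as u128) << (32 * i);
        }
        UnicodeRanges(bits)
    }

    /// Returns true if the given range bit is set. Bits >= 128 do not exist.
    pub fn has_bit(self, bit: u8) -> bool {
        bit < 128 && (self.0 >> bit) & 1 == 1
    }

    /// Returns true if the range that `c` belongs to is flagged as supported.
    /// Characters outside the partial table below always yield false.
    pub fn contains_char(self, c: char) -> bool {
        match os2_bit_for_char(c) {
            Some(bit) => self.has_bit(bit),
            None => false,
        }
    }
}

/// Deliberately partial excerpt of the OS/2 bit assignments (assumption:
/// only three ranges are listed to keep the sketch short).
fn os2_bit_for_char(c: char) -> Option<u8> {
    match c as u32 {
        0x0000..=0x007F => Some(0), // Basic Latin
        0x0080..=0x00FF => Some(1), // Latin-1 Supplement
        0x0370..=0x03FF => Some(7), // Greek and Coptic
        _ => None,
    }
}
```

A set bit only says the font claims to be functional for that range, so a glyph lookup would still be needed to confirm that a specific character is covered.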

@RazrFalcon
Owner Author

I'm not sure how portable u128 is. Will see.
I'll try to look into it today. Doesn't seem to be that complex.

@RazrFalcon
Owner Author

As someone who is also interested in a font fallback library, it seems to me that the OS/2 Unicode ranges property is not enough. One would have to support the meta table as well, since it supports more ranges.

See OS/2 comment:

> All available bits were exhausted as of Unicode 5.1. The bit assignments were last updated for OS/2 version 4 in OpenType 1.5. There are many additional ranges supported in the current version of Unicode that are not supported by these fields in the OS/2 table. See the 'dlng' and 'slng' tags in the 'meta' table for an alternate mechanism to declare what scripts or languages that a font can support or is designed for.

And meta uses comma-separated BCP 47 language tags.
So while I can implement contains_char for OS/2 fairly easily, doing so for meta would be way harder, mainly because one would have to use a BCP 47 library for parsing/conversion.
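
A rough sketch of the easy half of that problem, splitting a `dlng`/`slng` value into its tags. The function names are hypothetical, and the script check is a naive subtag comparison, not real BCP 47 parsing; mapping a tag to the characters it implies is the hard part and is not attempted here.

```rust
/// Splits the text value of a `dlng` or `slng` entry into individual tags.
/// Whitespace around the comma-separated tags is trimmed, empty items are dropped.
pub fn split_script_lang_tags(value: &str) -> Vec<&str> {
    value
        .split(',')
        .map(str::trim)
        .filter(|tag| !tag.is_empty())
        .collect()
}

/// Naive check (assumption: no BCP 47 validation): does any tag contain
/// the given script subtag, such as "Latn" or "Hans"?
pub fn declares_script(value: &str, script: &str) -> bool {
    split_script_lang_tags(value)
        .iter()
        .any(|tag| tag.split('-').any(|sub| sub.eq_ignore_ascii_case(script)))
}
```

For example, `declares_script("zh-Hans, ja", "Hans")` is true, but nothing here knows which code points "Hans" or "ja" actually require.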

PS: I do have a rudimentary font fallback algorithm in one of my libraries, but it simply checks if a font has a glyph for the specified character. Which is obviously a bit slow.

@RazrFalcon
Owner Author

Check out RazrFalcon/ttf-parser@2b0e0e5

Currently, the only API available is face.unicode_ranges().contains_char('A')

If you're looking for something else, let me know.

@jackpot51
Contributor

Awesome, looking good so far! I am hoping to be able to use fontdb to query for fonts that support a range, for fallback.

@RazrFalcon
Owner Author

What do you mean by range? Range<char> or a named Unicode range (like Latin-1 Supplement)?

I can add ttf_parser::UnicodeRange to fontdb::FaceInfo, but I'm worried it would blow up the memory usage. It's a u128 after all. But I don't see another way.

There are also Unicode blocks, which have 327 variants instead of 123 in OS/2.

So I'll repeat my question: what exactly do you want? The problem is not implementing it, but actually understanding what information is required.
And I assume that looping through all fonts and calling fontdb::Source::with_data is not an option for performance reasons.

@jackpot51
Contributor

Storing the u128 for every font is probably necessary to fully implement font fallback. An alternative would be to only store a single font id for the first detected font that implements that range. I don't really have opinions on the implementation, other than to say that it is nice to be able to search for a font that provides a specific Unicode codepoint for font fallback.
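
A sketch of the alternative mentioned above, keeping only the first font seen for each OS/2 range bit. `FontId` and `FirstFontPerRange` are hypothetical names, and the u128 is assumed to come from the font's OS/2 Unicode range fields.

```rust
/// Hypothetical font identifier used only in this sketch.
pub type FontId = u32;

/// One slot per OS/2 Unicode range bit, holding the first font that claimed it.
pub struct FirstFontPerRange {
    slots: [Option<FontId>; 128],
}

impl FirstFontPerRange {
    pub fn new() -> Self {
        FirstFontPerRange { slots: [None; 128] }
    }

    /// Records `id` for every range bit set in `ranges` that has no font yet.
    pub fn register(&mut self, id: FontId, ranges: u128) {
        for bit in 0..128usize {
            if (ranges >> bit) & 1 == 1 && self.slots[bit].is_none() {
                self.slots[bit] = Some(id);
            }
        }
    }

    /// Returns the first registered font for a range bit, if any.
    pub fn font_for_bit(&self, bit: u8) -> Option<FontId> {
        self.slots.get(bit as usize).copied().flatten()
    }
}
```

The table has a fixed size regardless of how many fonts are installed, at the cost of forgetting every font after the first one per range, so style-aware fallback would not be possible with it alone.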

@RazrFalcon
Owner Author

RazrFalcon commented Oct 9, 2022

The only real way to do so is to call ttf_parser::Face::glyph_index. Everything else is just a hint. Not to mention that the Unicode ranges from OS/2 are for Unicode 5, so emojis would probably fail.
So after checking the Unicode range, you would have to test for glyph ID anyway.

fontdb intentionally doesn't provide a font fallback. Mainly because this is a very complex, expensive and unspecified task. It can be implemented in way too many ways.

My suggestion would be to handle it on your side. Call fontdb::Source::with_data for each font you have. Collect Unicode ranges. Maybe even characters themselves via ttf_parser::cmap::Subtable::codepoints. Then query the character manually, caching the result. Everything else would be way too slow.
Not to mention that one would have to check font styles as well. For example, if you're looking for a fallback for a bold font, you have to skip non-bold fonts on the first pass, or altogether.
And the list goes on forever...
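
The suggestion above might look roughly like this. It is a sketch under stated assumptions: `FontId` and `FallbackIndex` are made-up names, and the per-font character sets are assumed to have been collected beforehand (for example from each font's cmap); no fontdb or ttf-parser calls are shown.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical identifier; a real program might use its font database's ID type.
pub type FontId = usize;

pub struct FallbackIndex {
    /// Per font, in priority order: ID, bold flag, and the collected code points.
    fonts: Vec<(FontId, bool, BTreeSet<char>)>,
    /// Cache of earlier lookups, including misses.
    cache: HashMap<(char, bool), Option<FontId>>,
}

impl FallbackIndex {
    pub fn new() -> Self {
        FallbackIndex { fonts: Vec::new(), cache: HashMap::new() }
    }

    /// Adds a font and drops the cache, since old answers may no longer hold.
    pub fn add_font(&mut self, id: FontId, bold: bool, chars: impl IntoIterator<Item = char>) {
        self.fonts.push((id, bold, chars.into_iter().collect()));
        self.cache.clear();
    }

    /// Finds a font for `c`: first pass with matching boldness, second pass any style.
    pub fn lookup(&mut self, c: char, want_bold: bool) -> Option<FontId> {
        if let Some(hit) = self.cache.get(&(c, want_bold)) {
            return *hit;
        }
        let found = self
            .fonts
            .iter()
            .find(|(_, bold, chars)| *bold == want_bold && chars.contains(&c))
            .or_else(|| self.fonts.iter().find(|(_, _, chars)| chars.contains(&c)))
            .map(|(id, _, _)| *id);
        self.cache.insert((c, want_bold), found);
        found
    }
}
```

Storing every code point per font trades memory for speed; keeping only the range bits and confirming with a glyph lookup on a cache miss would be the leaner variant.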

@jackpot51
Contributor

I will give that a try.

@jackpot51
Contributor

Would there be interest in fontdb being able to return multiple fonts from a query?

@dhardy
Contributor

dhardy commented Oct 10, 2022

I implemented basic font fallback in kas-text. The problems are a little more complicated than just querying supported glyphs:

  1. Assemble a list of fallback fonts per style. Fontdb is useful but didn't provide the functionality I needed, so I wrote a wrapper.
  2. Match possible font face(s) per character, attempting to avoid frequent changes (e.g. don't change back to the first font when reaching space or punctuation). My implementation is quite lazy on this aspect: use the last-used font face if possible, otherwise restart fallback from the first available face (this at least solves the most obvious issues).
  3. Make this fast, which probably involves multiple levels of caches.
  4. Break text into runs of a single font and feed into a shaper (Harfbuzz or whatever).

There is a lot more complexity, but you get the idea. If I were to start over, I would take a closer look at swash.
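
The "last-used face first" rule from step 2 and the run splitting from step 4 could be sketched as follows. This is not the kas-text implementation; `Run`, `split_into_runs`, and the `supports` callback are hypothetical names, and faces are just indices into an assumed fallback list.

```rust
/// A run of text covered by one face. `face` is an index into the fallback
/// list, or `None` when no face in the list covers the characters.
#[derive(Debug, PartialEq)]
pub struct Run {
    pub face: Option<usize>,
    pub text: String,
}

/// Splits `text` into single-face runs. `supports(face, c)` is assumed to
/// answer whether the face at that index has a glyph for `c`.
pub fn split_into_runs(
    text: &str,
    face_count: usize,
    supports: impl Fn(usize, char) -> bool,
) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    let mut last: Option<usize> = None;
    for c in text.chars() {
        // Prefer the last-used face; otherwise restart from the first face.
        let face = match last {
            Some(f) if supports(f, c) => Some(f),
            _ => (0..face_count).find(|&f| supports(f, c)),
        };
        // Extend the current run if the face is unchanged, else start a new one.
        if let Some(run) = runs.last_mut().filter(|run| run.face == face) {
            run.text.push(c);
        } else {
            runs.push(Run { face, text: c.to_string() });
        }
        if face.is_some() {
            last = face;
        }
    }
    runs
}
```

With two faces that both cover the space character, a space following text in the second face stays in the second face's run instead of switching back to the first. Each resulting run would then be handed to the shaper separately.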

@RazrFalcon
Owner Author

@jackpot51 Which query? The current one should return just one font. That's the point. Maybe you want some sort of filtering instead?

@RazrFalcon
Owner Author

@dhardy I haven't looked into how swash handles fallback yet. But it has a completely different ideology from my libraries. It is strictly a monolith, while my libraries are quite independent/modular. There are pros and cons to both designs.

As for fontdb, it's just the code I've extracted from resvg. It's minimal by design. As per readme.
I do plan on implementing a proper text layout library eventually, but I simply don't have the time.

@dhardy
Contributor

dhardy commented Oct 10, 2022

@RazrFalcon I believe he is talking about the equivalent of this query (i.e. given font family, weight etc., return a listing of all possible matches). Note though that this does not use the raw font list of all available fonts on the system, but a (badly) curated list of a few fonts in a preferred order.

I'm not sure myself if this functionality should be in fontdb. Maybe another small library.

> It is strictly a monolith, while my libraries are quite independent/modular.

👍

@RazrFalcon
Owner Author

Oh, I see. Yes, as mentioned above, the reason fontdb is so spartan is that all other features can be implemented in way too many ways.

Sure, fontdb can provide a "list fonts with the specified char" function. But as you said, one may want to use their own list of fonts. Others would want to cache it in some way. And so on.

Honestly, forking fontdb is the solution in most cases.

@jackpot51
Contributor

I have decided to use fontdb for loading a list of system fonts, but then implement my own fallback on top of that. It will require no changes to fontdb.


3 participants