Word splitting in non-Latin text, and over ligatures #3
Comments
Any ligatures would likely have to be backed out, at least for word-splitting purposes, and then put back in if the word wasn't split through a ligature. If it was, the fragment on either side might contain a shorter ligature, requiring HarfBuzz::Shaper to be called again, against both fragments. On the bright side, it's unlikely that decomposing a ligature into letters, or vice-versa, will change the length of the word sufficiently to require another pass with Text::KnuthPlass. It should be a small enough change that the ratio (affecting glue length) could just be updated.
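To make the back-out/split/re-ligate idea concrete, here is a minimal Python sketch. It is only an illustration: the ligature table is a toy one using the Unicode presentation forms (a real font defines its own substitutions, and HarfBuzz::Shaper works on glyphs, not characters), and `decompose`, `compose`, and `split_word` are hypothetical helper names, not part of any of the modules discussed here.

```python
# Toy ligature table (an assumption for illustration; real fonts define their own).
LIGATURES = {
    "ffi": "\ufb03",
    "ffl": "\ufb04",
    "ff": "\ufb00",
    "fi": "\ufb01",
    "fl": "\ufb02",
}
# Try the longest letter sequences first so "ffi" wins over "ff" or "fi".
_BY_LENGTH = sorted(LIGATURES, key=len, reverse=True)
_REVERSE = {lig: letters for letters, lig in LIGATURES.items()}


def decompose(text):
    """Back out ligatures: replace each ligature with its component letters."""
    return "".join(_REVERSE.get(ch, ch) for ch in text)


def compose(text):
    """Put ligatures back in, greedily matching the longest sequence."""
    out = []
    i = 0
    while i < len(text):
        for seq in _BY_LENGTH:
            if text.startswith(seq, i):
                out.append(LIGATURES[seq])
                i += len(seq)
                break
        else:
            # No ligature starts here; keep the plain letter.
            out.append(text[i])
            i += 1
    return "".join(out)


def split_word(shaped_word, break_at):
    """Split at a letter offset, then re-ligate each fragment separately."""
    letters = decompose(shaped_word)
    left, right = letters[:break_at], letters[break_at:]
    return compose(left), compose(right)
```

Splitting "office" (shaped with an "ffi" ligature) as "of-fice" breaks through the ligature: the left fragment comes back as plain "of", while the right fragment picks up the shorter "fi" ligature. That is the case where each fragment would need its own shaping pass.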
What if HarfBuzz::Shaper was called after Text::KnuthPlass? This might be feasible if ligatures are the only thing in play (no direction or alphabet changes, no font size changes, etc.). Presumably the substitution of ligatures (after the lines are already split) would just entail a small update to line ratios, to get back the desired alignment. This might not be the case for complex scripts such as Arabic or Indic languages, where glyph substitutions for various kinds of ligatures could entail substantial length changes.
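As a rough sketch of what "a small update to line ratios" could look like, the following Python uses the standard Knuth-Plass adjustment ratio: the difference between the target and natural line widths, divided by the line's total stretch (if the line is short) or total shrink (if it is long). The function names and the numbers are made up for illustration; how Text::KnuthPlass would actually expose or accept an updated ratio is not settled here.

```python
def adjustment_ratio(target, natural, stretch, shrink):
    """Knuth-Plass adjustment ratio for one line.

    target  -- desired line width
    natural -- sum of box widths plus natural glue widths
    stretch -- total stretchability of the glue on the line
    shrink  -- total shrinkability of the glue on the line
    """
    if natural == target:
        return 0.0
    if natural < target:
        # Line is too short: glue must stretch.
        return (target - natural) / stretch if stretch > 0 else float("inf")
    # Line is too long: glue must shrink (ratio is negative).
    return (target - natural) / shrink if shrink > 0 else float("-inf")


def glue_width(width, stretch, shrink, ratio):
    """Width of a single glue item once the line's ratio is applied."""
    return width + ratio * (stretch if ratio >= 0 else shrink)


# Hypothetical line: 300pt target, 294pt natural width, 12pt stretch, 4pt shrink.
before = adjustment_ratio(300, 294, 12, 4)
# Substituting a ligature after line breaking narrows the line by 1.5pt,
# so only the ratio is recomputed; the break points stay where they were.
after = adjustment_ratio(300, 294 - 1.5, 12, 4)
```

With these numbers the ratio moves from 0.5 to 0.625, a change small enough to absorb in the glue. For scripts where substitutions change widths substantially, the recomputed ratio could fall outside the acceptable range, which would be the signal that the breaks themselves need another pass.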
Note that #2 is concerned more with word splitting in Latin-script text for non-English languages, but still applies quite a bit to this ticket's area of interest, so be sure to look at both tickets when doing something regarding word and line splitting. Keep in mind that the only reason to worry about splitting a word is that a line needs to be split, and the best fit may be through a word (hyphenation, etc.).
The PDF::Builder package can typeset using HarfBuzz::Shaper to substitute a font's ligatures for sequences of lowercase letters. It does not currently call Text::KnuthPlass natively, but I plan to add this in the near future. Some potential problems arise when HarfBuzz::Shaper is used and decides to substitute some ligatures. Text::KnuthPlass will then have to accept not just plain text, but also the HarfBuzz arrays of processed glyphs, which could include ligatures. How this will interact with word-splitting (whose patterns and exceptions assume no ligatures) remains to be seen. We also need to think about word-splitting with connected cursive scripts such as Arabic, and highly processed complex scripts such as Devanagari or Khmer, not to mention bi-directional (RTL) scripts, and mixtures of different types.
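One way to reason about hyphenation patterns meeting shaped glyph arrays is to ask whether a candidate break offset falls inside a glyph that covers more than one source character. HarfBuzz itself reports a cluster value per glyph (the index of the first character the glyph came from), so a sketch along these lines seems plausible. The `Glyph` record below is a hypothetical layout for illustration, not the actual structure returned by HarfBuzz::Shaper, and the code assumes simple left-to-right text with increasing cluster values. RTL and complex scripts would need more care.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    name: str       # glyph name (illustrative only)
    cluster: int    # index of the first source character this glyph covers
    advance: float  # horizontal advance


def glyph_spans(glyphs, text_length):
    """Return the (start, end) character range covered by each glyph.

    Assumes left-to-right text with strictly increasing cluster values.
    """
    spans = []
    for i, glyph in enumerate(glyphs):
        end = glyphs[i + 1].cluster if i + 1 < len(glyphs) else text_length
        spans.append((glyph.cluster, end))
    return spans


def breaks_ligature(glyphs, text_length, break_at):
    """True if a break before character `break_at` lands inside one glyph."""
    return any(start < break_at < end
               for start, end in glyph_spans(glyphs, text_length))


# "office" shaped with an ffi ligature: o, f_f_i, c, e (advances are made up).
shaped = [
    Glyph("o", 0, 5.0),
    Glyph("f_f_i", 1, 8.0),
    Glyph("c", 4, 4.5),
    Glyph("e", 5, 4.5),
]
```

Here `breaks_ligature(shaped, 6, 2)` is true (the "of-fice" break cuts the ligature), while a break at offset 4 is not, so only the former would force the fragments to be reshaped.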