Provide better support for mixed writing systems #762

Open
4 of 6 tasks
jwiggins opened this issue Mar 26, 2021 · 3 comments
Labels
difficulty: advanced ETS Backlog Good issue for ETS team members to look at type: enhancement

Comments

jwiggins commented Mar 26, 2021

This is mainly a problem with the AGG backends. The Quartz and QPainter backends already behave correctly.

Basically, user code should be able to call show_text on the following string: "Kiva Graphics一番😎" and have it render correctly even if the currently selected font only supports Latin characters.

To get to this point, we need to do a few things:

  • Collect writing system information for entries in our font database
  • Build fallback lists for font families and styles
  • Find a library (or, failing that, write our own function) that breaks a string into chunks which share the same writing system (see: https://stackoverflow.com/questions/9868792/find-out-the-unicode-script-of-a-character)
  • Make low level text drawing functions return the text cursor position after drawing a run of glyphs (or work around its absence by calling get_text_extent on every chunk of a string before drawing)
  • Bring everything together in the show_text method so that mixed strings can be drawn
  • Bonus: Support bidirectional text mixing
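
To make the chunking and cursor-advance steps above concrete, here is a minimal sketch. It is not Kiva code: the range table is a tiny, deliberately incomplete stand-in for real Unicode script data, "Emoji" is treated as a pseudo-script purely for font selection, and split_runs, layout_runs and the measure callable are hypothetical names. measure stands in for whatever returns the advance width of a chunk (for example a get_text_extent call made with the fallback font selected).

```python
# Illustrative and incomplete (first, last, script) code point ranges.
# A real implementation would use full Unicode script data instead.
_SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7AF, "Hangul"),
    (0x1F300, 0x1FAFF, "Emoji"),  # pseudo-script, used only to pick a font
]


def script_of(ch):
    """Return the script label for one character, or "Common" if unknown."""
    cp = ord(ch)
    for first, last, script in _SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Common"


def split_runs(text):
    """Split text into (script, chunk) runs.

    "Common" characters (spaces, punctuation, digits) join the run before
    them; a leading "Common" run adopts the script of what follows it.
    """
    runs = []  # each entry is [script, chunk]
    for ch in text:
        script = script_of(ch)
        if not runs:
            runs.append([script, ch])
        elif script in ("Common", runs[-1][0]):
            runs[-1][1] += ch
        elif runs[-1][0] == "Common":
            runs[-1] = [script, runs[-1][1] + ch]
        else:
            runs.append([script, ch])
    return [tuple(run) for run in runs]


def layout_runs(runs, measure, x=0.0):
    """Return (x, script, chunk) for each run, advancing x by its width.

    measure(script, chunk) is a hypothetical callable returning the advance
    width of chunk when drawn with the font chosen for script.
    """
    placed = []
    for script, chunk in runs:
        placed.append((x, script, chunk))
        x += measure(script, chunk)
    return placed


runs = split_runs("Kiva Graphics一番😎")
# runs == [("Latin", "Kiva Graphics"), ("Han", "一番"), ("Emoji", "😎")]
```

If the low level drawing functions returned the cursor position directly, layout_runs would not be needed; it only models the "measure every chunk before drawing" workaround.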

This is roughly what Qt does, based on a quick skim of the code: https://code.qt.io/cgit/qt/qtbase.git/

  • QFreeTypeFontDatabase::addTTFile (qtbase.git/tree/src/gui/text/freetype/qfreetypefontdatabase.cpp):
    Scans a font for the following information: weight, style, fixed-width, supported writing systems (unicode range, codepage range), family name
  • QPlatformFontDatabase::fallbacksForFamily (qtbase.git/tree/src/gui/text/qfontdatabase.cpp):
    Takes a style and script ID and returns a list of fonts which support that script with that style (or just support the script)
  • QPainter::drawText (qtbase.git/tree/src/gui/painting/qpainter.cpp): Basically Qt's show_text.
    Uses QStackTextEngine to shape the input string and break it into QScriptItem objects, then picks a font for each item and draws it.
  • QStackTextEngine/QScriptItem/QTextItemInt (qtbase.git/tree/src/gui/text/qtextengine.cpp):
    These are the components which break up a string into chunks which can be shaped and drawn as a unit.
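
For the fallback-list step, a rough Python analogue of the fallbacksForFamily idea could look like the sketch below. This is not Qt's API and not existing Kiva code: FontEntry, fallbacks_for and the font names are all hypothetical, and it assumes the font database already records which writing systems each font covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontEntry:
    """Hypothetical font database record."""
    family: str
    style: str          # e.g. "regular" or "italic"
    scripts: frozenset  # writing systems the font claims to cover


def fallbacks_for(database, style, script):
    """Return fonts covering script, with exact style matches listed first."""
    covering = [font for font in database if script in font.scripts]
    # sorted() is stable and False sorts before True, so fonts whose style
    # matches keep their database order and come ahead of the rest.
    return sorted(covering, key=lambda font: font.style != style)


# Made-up entries, only to show the call shape.
CJK = frozenset({"Han", "Hiragana", "Katakana"})
database = [
    FontEntry("ExampleSans", "regular", frozenset({"Latin"})),
    FontEntry("ExampleCJK", "italic", CJK),
    FontEntry("ExampleCJK", "regular", CJK),
]
candidates = fallbacks_for(database, "regular", "Han")
# candidates lists the regular ExampleCJK entry first, then the italic one.
```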

jwiggins commented Apr 6, 2021

Copying from #767 so it's easier to find:

Having played with [mapping of "Han" to a CJK language] a bit more, we should only use [the locale-based guess] when it's not otherwise clear from the context. For instance if a string already contains Hiragana or Katakana, then Han should be mapped to "Japanese". If Hangul is encountered, Han maps to "Korean". Only if the Han is mixed with some non-CJK language should we fall back to this locale-based guess.
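
A minimal sketch of that rule, assuming the string has already been classified into script labels. resolve_han and the language labels are hypothetical names, not what #767 uses. When kana and Hangul both appear, this sketch picks Japanese simply because that check runs first, which is an arbitrary choice.

```python
def resolve_han(scripts, locale_guess):
    """Pick the CJK language for Han text from the other scripts present.

    scripts: iterable of script labels found in the string.
    locale_guess: language to use when the context gives no hint.
    """
    present = set(scripts)
    if present & {"Hiragana", "Katakana"}:
        return "Japanese"
    if "Hangul" in present:
        return "Korean"
    # No kana or Hangul in the string: fall back to the locale-based guess.
    return locale_guess


resolve_han(["Latin", "Han", "Hiragana"], "Chinese")  # "Japanese"
resolve_han(["Han", "Hangul"], "Chinese")             # "Korean"
resolve_han(["Latin", "Han"], "Chinese")              # "Chinese"
```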

@jwiggins

Consider libgrapheme or utf8proc for classifying graphemes in a string.
