Issues with implementation of African scripts presented at IUC44 #40
This link is giving me an "Access Denied" error. Is it geo-locked or anything? Just moved?
I moved the file to a new location. Try this link.
Perfect; thanks!
Related: slides from a presentation on supporting African alphabets using the Latin script. These slides are from a presentation I gave to Microsoft and Monotype teams in 2008, based on research I had done at the African Studies Library at Northwestern University in 1998 as part of the WRIT project. The presentation informed some of the decisions in the making of Microsoft’s Ebrima fonts for African languages. There’s no accompanying text, but the slides mostly speak for themselves. I might revise some of the production advice now, e.g. I would now be more likely to use mark positioning for cedilla diacritics.
The encoding and diacritic display recommendations look pretty good, in particular the last one regarding contextually decomposing precomposed diacritics when followed by stacked marks. This does seem to make it easier to bring in mark variants with less vertical height that work better in stacks. Regarding the last slide, there are a few other regional form preferences. Rather than the rounded-bottom vhook/Vhook, a more angular form is preferred for Toma. For Ga, the Esh and Ezh forms that resemble a Greek Sigma and a reversed Greek Sigma, respectively, are preferred. In Liberia, there is still a preference for a Bhook that looks like a Cyrillic Be.
Is it correct to say that the only way a system knows whether a font has the appropriate support for a given language is codepoint coverage? I am thinking about certain letter/diacritic combinations that are required for specific languages, e.g. e/acute/dotbelow for Yorùbá. This combination is not encoded. If a font has both the combining diacritics and the base letter, then it seems to qualify as being able to support Yorùbá. However, it would be nice to know whether the base letter actually has the anchors needed to create the combination before allowing it in a font stack. This might be too complicated or require too much overhead to implement, but it would be an interesting way to ensure proper rendering of languages that need un-encoded accented letter combinations.
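As a quick illustration of why the Yorùbá combination above has to be assembled from pieces: Python’s stdlib unicodedata shows how far NFC normalization can compose the sequence given the encoded repertoire (a sketch; codepoints are from Unicode, the rest is illustrative):

```python
import unicodedata

# Yorùbá e with acute and dot below:
# e + combining acute (U+0301) + combining dot below (U+0323)
seq = "e\u0301\u0323"

# Canonical ordering puts the dot below (combining class 220) before the
# acute (class 230), and NFC then composes e + dot-below into U+1EB9 (ẹ).
# No single codepoint encodes the full e/acute/dotbelow combination, so
# the acute remains a combining mark that the font must position itself.
nfc = unicodedata.normalize("NFC", seq)
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+1EB9', 'U+0301']
```

Both input orders (acute first or dot below first) normalize to the same two-character result, which is why proper display ultimately depends on the font’s mark handling rather than on encoding.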
Even that heuristic wouldn’t work: it is possible to support such accented letter combinations without using mark anchors.
Without mark anchors, does this alternate method still require something at the font level, or is it implemented at the rendering level?
I was thinking of ways within the font.
Alas, Adobe’s shaping engines still don’t handle this properly, so I’ve given up on that method even though it works fine with the Microsoft, Apple, and HarfBuzz engines. In recent fonts, I am propagating mark and mkmk positioning to diacritic glyphs using our vfj-propagate-anchors script.
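For readers unfamiliar with the idea, anchor propagation can be sketched roughly as follows. This is a hedged illustration using plain dictionaries and made-up glyph data, not Tiro’s vfj-propagate-anchors script or any real font API:

```python
# Sketch of anchor propagation: copy base-attachment anchors from component
# glyphs onto the composite glyphs built from them, offset by the component
# placement, so mark/mkmk positioning also applies to the composites.
# Glyph names, anchor names, and coordinates are illustrative only.

def propagate_anchors(glyphs):
    """glyphs: name -> {"anchors": {name: (x, y)}, "components": [(name, dx, dy)]}"""
    for glyph in glyphs.values():
        for comp_name, dx, dy in glyph.get("components", []):
            for anchor, (x, y) in glyphs[comp_name].get("anchors", {}).items():
                if anchor.startswith("_"):
                    continue  # skip mark-attachment anchors like "_top"
                # Later components win, so the topmost mark supplies "top".
                glyph.setdefault("anchors", {})[anchor] = (x + dx, y + dy)
    return glyphs

# Example: aacute is composed of a plus acutecomb shifted into place; the
# "top" anchor of acutecomb is propagated so further marks stack on aacute.
fonts = propagate_anchors({
    "acutecomb": {"anchors": {"_top": (0, 0), "top": (0, 180)}, "components": []},
    "a": {"anchors": {"top": (250, 520)}, "components": []},
    "aacute": {"anchors": {}, "components": [("a", 0, 0), ("acutecomb", 250, 520)]},
})
print(fonts["aacute"]["anchors"])  # {'top': (250, 700)}
```

The effect is that a precomposed diacritic glyph behaves, for mark stacking, as if it were the base-plus-mark sequence.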
I just did a quick test in InDesign with a font in which I am decomposing precomposed marks. It seems to work, but the output is interesting. I am typing with the Unicode Hex Input on my Mac. I entered the strings in two ways: once starting with just the base letter followed by combining marks, and again starting with the encoded aacute followed by additional combining marks. Here are the glyph names extracted from the strings:
/a
/a/acutecomb
/a/acutecomb/tildecomb
/a/acutecomb/tildecomb/gravecomb
Interestingly, in the second case, the aacute is not being decomposed, but the stacking still works. I don't know if this is because I am swapping out the acute with a flattened form; oddly, the font doesn't have an aacute precomposed with a flattened acute. That said, the acute in the aacute is a component of acutecomb, which does have anchors for stacking. Perhaps this effectively works the same as propagating the mark and mkmk positioning to diacritic glyphs. I am curious under what conditions you have seen this fail in Adobe apps.
I think the concern I have has less to do with the font, since when someone intentionally builds a font to properly support a language, anchors work well. I am more concerned that fonts with the requisite glyphs to support a language are considered valid for displaying that language even if, functionally, they are not. Currently, there does not seem to be a way to know that when someone defines a font stack online.
Fonts like Cambria and the current release version of the Brill types perform contextual decomposition of precomposed diacritics in the ccmp feature when the diacritic is followed by a combining mark. This never worked reliably in InDesign, something I reported as a bug more than 15 years ago. Results differed between the Adobe composers, but neither got it right. It took a while working with Adobe engineers to figure out what was going wrong, but it seemed that in at least some cases the shaping engine was recomposing the precomposed diacritic based on character mapping after ccmp processing. There have been partial improvements in the non-WRC composer recently, but it is still not perfect, and in any case our clients needed to be able to have such sequences in the same paragraph as Arabic and other complex scripts, so they had to use the WRC. It is only with HarfBuzz access that these fonts have behaved better in InDesign, but in the meantime I’ve changed my OTL model and tools to bypass the issue.
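The contextual-decomposition behaviour described above can be modelled very roughly in Python. This is only a sketch of the substitution logic; the real rule lives in the font’s GSUB, and the glyph names and decomposition table here are made up for illustration:

```python
# Sketch of ccmp-style contextual decomposition: when a precomposed diacritic
# glyph is followed by a combining mark, substitute it with its base + mark
# pieces so that flatter mark variants can be positioned in the stack.
# Glyph names and the table below are illustrative, not from any real font.

DECOMPOSE = {
    "aacute": ["a", "acutecomb"],
    "eacute": ["e", "acutecomb"],
}
COMBINING_MARKS = {"acutecomb", "gravecomb", "tildecomb", "dotbelowcomb"}

def ccmp_decompose(glyph_run):
    out = []
    for i, glyph in enumerate(glyph_run):
        following = glyph_run[i + 1] if i + 1 < len(glyph_run) else None
        if glyph in DECOMPOSE and following in COMBINING_MARKS:
            out.extend(DECOMPOSE[glyph])  # decompose only in a stacking context
        else:
            out.append(glyph)  # otherwise keep the precomposed glyph
    return out

print(ccmp_decompose(["aacute", "tildecomb"]))  # ['a', 'acutecomb', 'tildecomb']
print(ccmp_decompose(["aacute", "space"]))      # ['aacute', 'space']
```

The reported InDesign bug corresponds to the shaper undoing the first substitution afterwards by recomposing from the character mapping.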
https://block-image-uploader-prod.s3.us-west-2.amazonaws.com/f9c4abee-50a9-4f0c-8a92-757059c278d7/S8T3-Patel.pdf