This repository has been archived by the owner on Mar 7, 2023. It is now read-only.

Issues with implementation of African scripts presented at IUC44 #40

Open
NeilSureshPatel opened this issue Jan 12, 2021 · 13 comments
Labels
encoding The Unicode Standard, etc fyi shaping

Comments

@NeilSureshPatel

https://block-image-uploader-prod.s3.us-west-2.amazonaws.com/f9c4abee-50a9-4f0c-8a92-757059c278d7/S8T3-Patel.pdf


@lianghai lianghai added encoding The Unicode Standard, etc fyi shaping labels Jan 14, 2021
@n8willis

This link is giving me an "Access Denied" error. Is it geo-locked or anything? Just moved?

@NeilSureshPatel
Author

I moved the file to a new location. Try this link.

IUC44 S8T3

@n8willis

Perfect; thanks!

@tiroj

tiroj commented Jan 22, 2021

Related: slides from a presentation on supporting African alphabets using the Latin script.

These slides are from a presentation I gave to Microsoft and Monotype teams in 2008, based on research I had done at the African Studies Library at Northwestern University in 1998 as part of the WRIT project. The presentation informed some of the decisions in the making of Microsoft’s Ebrima fonts for African languages. There’s no accompanying text, but the slides mostly speak for themselves. I might revise some of the production advice now, e.g. I would be more likely to use mark positioning for cedilla diacritics.

@NeilSureshPatel
Author

The encoding and diacritic display recommendations look pretty good, in particular the last one regarding contextually decomposing precomposed diacritics when followed by stacked marks. This does seem to make it easier to bring in mark variants with less vertical height that work better in stacks. Regarding the last slide, there are a few other regional form preferences. Rather than the rounded-bottom vhook/Vhook, a more angular form is preferred for Toma. For Ga, the Esh and Ezh forms that resemble a Greek Sigma and a reversed Greek Sigma, respectively, are preferred. In Liberia, there is still a preference for a Bhook that looks like a Cyrillic Be.

@NeilSureshPatel
Author

Is it correct to say that the only way a system knows whether a font has the appropriate support for a given language is codepoint coverage? I am thinking about certain letter/diacritic combinations that are required for specific languages, e.g. e/acute/dotbelow for Yorùbá. This combination is not encoded as a single character. If a font has both combining diacritics and the base letter, then it seems to qualify as being able to support Yorùbá. However, it would be nice to know whether the base letter actually has the anchors needed to create the combination before allowing the font in a font stack. This might be too complicated or require too much overhead to implement, but it would be an interesting way to ensure proper rendering of languages that need unencoded accented letter combinations.
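To make the coverage question concrete, here is a minimal sketch of a purely codepoint-based check. The function name `covered_by_codepoints` and the coverage set `hypothetical_cmap` are invented for illustration and do not come from any real font or API; only Python's standard `unicodedata` module is used.

```python
import unicodedata

def covered_by_codepoints(text, cmap):
    """Return True if every codepoint in the NFD form of text is in cmap."""
    return all(ord(ch) in cmap for ch in unicodedata.normalize("NFD", text))

# Hypothetical coverage: a-z plus combining acute (U+0301) and dot below (U+0323).
hypothetical_cmap = set(range(0x61, 0x7B)) | {0x0301, 0x0323}

# e + dot below + acute, as used in Yorùbá.
sequence = "e\u0323\u0301"

# NFC cannot reduce the sequence to one codepoint, so no single cmap entry covers it.
is_single_codepoint = len(unicodedata.normalize("NFC", sequence)) == 1

# The coverage test still passes, although it says nothing about mark positioning.
coverage_ok = covered_by_codepoints(sequence, hypothetical_cmap)
```

Under these assumptions `coverage_ok` is true while `is_single_codepoint` is false, which is the gap described above: coverage alone cannot tell whether the marks will actually stack correctly on the base letter.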

@dscorbett

Even that heuristic wouldn’t work: it is possible to support such accented letter combinations without using mark anchors.

@NeilSureshPatel
Author

> Even that heuristic wouldn’t work: it is possible to support such accented letter combinations without using mark anchors.

Without mark anchors, does this alternate method still require something at the font level or is it something that is implemented on the rendering level?

@dscorbett

I was thinking of ways within the font.

@tiroj

tiroj commented Jan 22, 2021

> The encoding and diacritic display recommendations look pretty good. In particular the last one regarding contextually decomposing precomposed diacritics when followed by stacked marks.

Alas, Adobe’s shaping engines still don’t handle this properly, so I’ve given up on that method even though it works fine with Microsoft, Apple, and HarfBuzz engines. In recent fonts, I am propagating mark and mkmk positioning to diacritic glyphs using our vfj-propagate-anchors script.
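As a rough illustration of what propagating anchors to precomposed diacritic glyphs can mean, here is a toy sketch. It is not the vfj-propagate-anchors script: the glyph dictionary, the coordinates, the function `propagate_anchors`, and the convention that mark-attachment anchors start with an underscore are all assumptions made for this example.

```python
def propagate_anchors(glyphs, name):
    """Collect anchors for a glyph: those of its components, shifted by each
    component's offset (later components override earlier ones), then its own."""
    glyph = glyphs[name]
    anchors = {}
    for comp_name, (dx, dy) in glyph.get("components", []):
        for anchor, (x, y) in propagate_anchors(glyphs, comp_name).items():
            # Assumed convention: "_name" anchors attach a mark to a base,
            # so they are not useful on the composite and are skipped.
            if anchor.startswith("_"):
                continue
            anchors[anchor] = (x + dx, y + dy)
    anchors.update(glyph.get("anchors", {}))
    return anchors

# Hypothetical glyph data: aacute is built from a and acutecomb.
glyphs = {
    "a": {"anchors": {"top": (250, 480), "bottom": (250, 0)}},
    "acutecomb": {"anchors": {"_top": (0, 480), "top": (0, 700)}},
    "aacute": {"components": [("a", (0, 0)), ("acutecomb", (250, 0))]},
}

aacute_anchors = propagate_anchors(glyphs, "aacute")
```

In this sketch the `top` anchor of `aacute` ends up above the acute rather than above the bare letter, so a following mark can attach to the precomposed glyph without any decomposition step.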

@NeilSureshPatel
Author

> The encoding and diacritic display recommendations look pretty good. In particular the last one regarding contextually decomposing precomposed diacritics when followed by stacked marks.
>
> Alas, Adobe’s shaping engines still don’t handle this properly, so I’ve given up on that method even though it works fine with Microsoft, Apple, and HarfBuzz engines. In recent fonts, I am propagating mark and mkmk positioning to diacritic glyphs using our vfj-propagate-anchors script.

I just did a quick test in InDesign with a font in which I am decomposing precomposed marks. It seems to work, but the output is interesting. I am typing with the Unicode Hex Input on my Mac. I entered the strings in two ways: first starting with just the base letter followed by combining marks, and then starting with the encoded aacute followed by additional combining marks.

image

Here are the glyph names extracted from the strings. Interestingly, in the second case the aacute is not being decomposed, but the stacking still works. I don't know if this is because I am swapping out the acute with a flattened form. Oddly, the font doesn't have an aacute precomposed with a flattened acute. That being said, the acute in the aacute is a component of acutecomb, which does have anchors for stacking. Perhaps this effectively works the same as propagating the mark and mkmk positioning to diacritic glyphs.

/a /a/acutecomb /a/acutecomb/tildecomb /a/acutecomb/tildecomb/gravecomb
/a /aacute /aacute/tildecomb /aacute/tildecomb/gravecomb
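For reference, the longest string in each row is canonically equivalent at the character level, which a short sketch can confirm. It assumes the glyph names map to the usual codepoints (acutecomb U+0301, tildecomb U+0303, gravecomb U+0300, aacute U+00E1); the variable names are illustrative only.

```python
import unicodedata

# Row 1: base letter followed by three combining marks.
base_first = "a\u0301\u0303\u0300"
# Row 2: encoded aacute followed by two combining marks.
precomposed_first = "\u00E1\u0303\u0300"

# Both strings share one fully decomposed (NFD) form.
same_nfd = (unicodedata.normalize("NFD", base_first)
            == unicodedata.normalize("NFD", precomposed_first))

# Composing (NFC) the first string yields the second.
nfc_of_base_first = unicodedata.normalize("NFC", base_first)
```

So any difference between the two rows comes from how the shaping engine and the font treat the input, not from the text itself.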

I am curious under what conditions you have seen this to fail in Adobe apps.

@NeilSureshPatel
Author

> I was thinking of ways within the font.

My concern has less to do with the font itself: when someone intentionally builds a font to properly support a language, anchors work well. I am more concerned that fonts with the requisite glyphs for a language are considered valid for displaying that language even if functionally they are not. Currently, there does not seem to be a way to detect this when someone defines a font stack online.

@tiroj

tiroj commented Jan 22, 2021

Fonts like Cambria and the current release version of the Brill types perform contextual decomposition of precomposed diacritics in the ccmp feature when the diacritic is followed by a combining mark. This never worked reliably in InDesign, something I reported as a bug more than 15 years ago. Results differed between the Adobe composers, but neither got it right. It took a while working with Adobe engineers to figure out what was going wrong, but it seemed that in at least some cases the shaping engine was recomposing the precomposed diacritic based on character mapping after ccmp processing. There were partial improvements in the non-WRC composer recently, but it is still not perfect, and in any case our clients needed to be able to have such sequences in the same paragraph as Arabic and other complex scripts, so they had to use WRC.
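The interaction described here can be modelled at the character level with a small sketch. This is only an analogy: real ccmp lookups substitute glyphs, not characters, and nothing below reflects actual Adobe internals. The function `contextual_decompose` is a hypothetical helper written for this example.

```python
import unicodedata

def contextual_decompose(text):
    """Decompose a precomposed character only when a combining mark follows it."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        decomp = unicodedata.decomposition(ch)
        # Only canonical decompositions qualify; compatibility ones start with "<".
        canonical = bool(decomp) and not decomp.startswith("<")
        if nxt and unicodedata.combining(nxt) and canonical:
            out.append(unicodedata.normalize("NFD", ch))
        else:
            out.append(ch)
    return "".join(out)

# aacute followed by a combining tilde is split into a + acute + tilde.
decomposed = contextual_decompose("\u00E1\u0303")

# A later recomposition pass undoes the split and restores the precomposed aacute.
recomposed = unicodedata.normalize("NFC", decomposed)
```

If a recomposition step of this kind runs after the decomposition, the following mark once again sits next to a precomposed glyph, so the stack depends on that glyph carrying its own anchors.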

It is only with HarfBuzz access that these fonts are behaving better in InDesign, but in the meantime I’ve changed my OTL model and tools to bypass the issue.

5 participants