
Combining diacritics from diacritics already in fonts #17

Merged (36 commits), Jan 3, 2016


@moyogo (Contributor) commented Sep 13, 2015

No description provided.

…flexcmb, dieresiscmb, dotaccentcmb, gravecmb, hungarumlautcmb, macroncmb, ogonekcmb, ringcmb, tildecmb

These combining marks are using spacing or legacy diacritics as components. It might be better the other way around.
…s components

* Rename acute.cap, caron.cap, circumflex.cap, grave.cap to acutecmb.cap, caroncmb.cap, circumflexcmb.cap, gravecmb.cap
* Adjust their advance width to zero
* Add groups @CMB and @cmbcap and a substitution after @uppercase in 'calt'
* Rename components in other glyphs
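The 'calt' substitution from the third bullet could be written in AFDKO feature syntax roughly as follows. This is a minimal Python sketch that only generates the feature text. The helper name `calt_cap_marks` and the glyph lists are hypothetical, and the font's real groups may differ. A class-to-class substitution needs @CMB and @cmbcap to list their glyphs in matching order, so the sketch derives @cmbcap from @CMB.

```python
def calt_cap_marks(uppercase, cmb):
    """Build FEA text that swaps a combining mark for its .cap variant
    when it directly follows an uppercase letter (hypothetical helper)."""
    # Derive @cmbcap from @CMB so both classes stay in the same order,
    # which class-to-class substitution requires.
    cmbcap = [f"{name}.cap" for name in cmb]
    return "\n".join([
        f"@uppercase = [{' '.join(uppercase)}];",
        f"@CMB = [{' '.join(cmb)}];",
        f"@cmbcap = [{' '.join(cmbcap)}];",
        "",
        "feature calt {",
        "    sub @uppercase @CMB' by @cmbcap;",
        "} calt;",
    ])


if __name__ == "__main__":
    print(calt_cap_marks(["A", "E", "O"],
                         ["acutecmb", "caroncmb", "circumflexcmb", "gravecmb"]))
```

As written, the rule only affects the first mark after a capital. Stacked marks would need additional context.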
@moyogo force-pushed the latext/diacritics branch 3 times, most recently from 86aef7f to df72fc1 on September 13, 2015 13:01
@weiweihuanghuang (Owner)

Thank you Denis! By missing anchors, I assume there's a standard for which anchors each glyph requires. Is there some resource I can refer to in the future? Of course Glyphs App itself has its own GlyphData.xml, but I'd still like a source for reference.

@moyogo (Contributor Author) commented Sep 13, 2015

No there is no standard per se on what anchors each glyph requires.
Unicode says any base letter can be combined with any diacritic. For Latin, that means the letters and the “common” diacritics shared with Greek, Cyrillic and, in some cases, other scripts; but other symbols can also be used with diacritics.

I keep a list for African orthographies with the Latin alphabet in https://github.com/moyogo/anloc-data. I also have an unpublished list for other languages.
Adobe also has multiple Latin character sets: http://blog.typekit.com/2008/08/28/extended_latin/

If you venture in phonetic transcriptions or historical orthographies there are even more combinations.

The bottom line is: you might as well assume any base character can be combined with any diacritic.

Which brings me to a couple of questions about how you’d like things to go.

  1. Can I add the bottomC (cedilla) anchor and the ogonek anchor to all the base letters?
  2. Some North American languages seem to prefer a straight and centered ogonek, and some languages use the letters with cedilla (classic cedilla and comma cedilla) but should have only one shape. Is it OK if I add variants and features to access these as well?
  3. You have a specific acute for the letter with ascender lacute 013A instead of using the acute or acute.cap (acutecmb and acutecmb.cap in this branch). That means a full set of acutecmb.asc, etc. would need to be designed to go with letters with ascenders. Would you consider using acutecmb.cap directly (replacing the current acute in lacute) or as a composite in acutecmb.asc (for positioning)?
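Question 1 amounts to making sure every base letter carries the anchors that marks below attach to. Here is a minimal Python sketch of such a check. The anchor names `bottom` and `ogonek`, the glyph data, and the helper `missing_anchors` are assumptions for illustration, not this font's actual data.

```python
# Assumed anchor names for marks below; a real font may name them differently.
REQUIRED_BELOW = ("bottom", "ogonek")

def missing_anchors(glyph_anchors, required=REQUIRED_BELOW):
    """Map each glyph name to the sorted list of required anchors it lacks."""
    report = {}
    for glyph, anchors in glyph_anchors.items():
        missing = sorted(set(required) - set(anchors))
        if missing:
            report[glyph] = missing
    return report

# Made-up anchor data: {glyph: {anchor name: (x, y)}}
font = {
    "A": {"top": (300, 700), "bottom": (300, 0), "ogonek": (560, 0)},
    "m": {"top": (410, 500)},
    "o": {"top": (270, 500), "bottom": (270, 0)},
}
print(missing_anchors(font))  # {'m': ['bottom', 'ogonek'], 'o': ['ogonek']}
```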

@davelab6

Cool. This sounds like something I should be standardising in all libre fonts. Do you agree?

googlefonts/gf-docs@9b1c6bd

@moyogo (Contributor Author) commented Sep 13, 2015

@davelab6 Yes, I agree.

A good starting point is actually to use the combining diacritic characters instead of the spacing or legacy diacritics to build the precomposed accented characters (those in Unicode). That way you can easily extend the character set, and you also support sequences of a base character followed by combining diacritics.

Just to make sure there's no misunderstanding: the main point of these anchors is to end up in the 'mark' feature in GPOS, not to build precomposed glyphs for every possible combination (both precomposed characters in Unicode, e.g. e+cedillacmb, and combinations that exist in Unicode only as character sequences, e.g. q+acutecmb).
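To illustrate what ending up in the 'mark' feature means, here is a minimal Python sketch that turns anchor data into AFDKO feature syntax. The coordinates and the helper name `mark_feature` are made up. Each mark's coordinates stand for its attaching anchor (`_top` in Glyphs). With a rule like `pos base q ...`, the sequence q + acutecmb is positioned at run time without any precomposed glyph.

```python
def mark_feature(marks, bases, anchor="top"):
    """Emit markClass definitions and mark-to-base rules (FEA syntax).

    marks: {mark glyph: (x, y) of its attaching anchor, e.g. "_top"}
    bases: {base glyph: (x, y) of its anchor, e.g. "top"}
    """
    mark_class = f"@MC_{anchor}"
    lines = [f"markClass {m} <anchor {x} {y}> {mark_class};"
             for m, (x, y) in marks.items()]
    lines.append("feature mark {")
    lines += [f"    pos base {b} <anchor {x} {y}> mark {mark_class};"
              for b, (x, y) in bases.items()]
    lines.append("} mark;")
    return "\n".join(lines)


fea = mark_feature({"acutecmb": (120, 500)}, {"e": (270, 500), "q": (280, 500)})
print(fea)
```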

@davelab6 commented Sep 13, 2015 via email

@davelab6 commented Sep 13, 2015 via email

@moyogo (Contributor Author) commented Sep 13, 2015

Yes, the precomposed characters should still be there. I meant to say that not every possible combination should be a precomposed glyph. I would not include precomposed glyphs for q́, q̀, q̂, q̌, q̈, etc. in a font, but they can still be composed with the font.
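Python's standard `unicodedata` module shows the distinction. A sequence with a precomposed equivalent normalizes to one code point. By contrast, q + acute has no precomposed equivalent, so it stays a two-character sequence that the font must handle with mark positioning.

```python
import unicodedata

def nfc(text):
    """Return the canonical composed (NFC) form of text."""
    return unicodedata.normalize("NFC", text)

# e + U+0301 COMBINING ACUTE ACCENT has a precomposed form, é (U+00E9).
print(nfc("e\u0301") == "\u00e9")  # True
# q + U+0301 has no precomposed form, so NFC leaves two code points.
print(len(nfc("q\u0301")))  # 2
```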

@weiweihuanghuang (Owner)

Thanks for answering and your continued contributions!

Can I add the bottomC (cedilla) anchor and the ogonek anchor to all the base letters?

Yes.

Some North American languages seem to prefer a straight and centered ogonek, and some languages use the letters with cedilla (classic cedilla and comma cedilla) but should have only one shape. Is it OK if I add variants and features to access these as well?

If you think it's appropriate, sure. But I don't understand what you mean here:

Does “some languages use the letters with cedilla” mean that a letter such as (I'm making this up) yogonek ends up using a cedilla instead?
And does “but should have only one shape” mean they need to be consistent?
What's a straight and centered ogonek?

You have a specific acute for the letter with ascender lacute 013A instead of using the acute or acute.cap (acutecmb and acutecmb.cap in this branch). That means a full set of acutecmb.asc, etc. would need to be designed to go with letters with ascenders. Would you consider using acutecmb.cap directly (replacing the current acute in lacute) or as a composite in acutecmb.asc (for positioning)?

I tried using acutecmb.cap and I think it's too tall. Can we not just take the acute in lacute and create an acutecmb.asc? What does the “etc.” cover?

@moyogo (Contributor Author) commented Sep 14, 2015

Does “some languages use the letters with cedilla” mean that a letter such as (I'm making this up) yogonek ends up using a cedilla instead?
And does “but should have only one shape” mean they need to be consistent?

Marshallese uses Ļ ļ M̧ m̧ Ņ ņ O̧ o̧. Because of the preferred comma-shaped cedilla, Ļ ļ Ņ ņ have a comma-shaped cedilla, but M̧ m̧ O̧ o̧ have the classic cedilla. It would indeed be best if the locl feature or an optional stylistic feature made them consistent.
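One way to make the cedillas consistent for Marshallese is a `locl` rule that decomposes the precomposed comma-cedilla letters. Every cedilla is then drawn by the same combining mark through mark positioning. The Python sketch below only generates the feature text. It assumes the following:

* the classic cedilla is the wanted shape;
* the combining cedilla glyph is named `cedillacmb`;
* Marshallese uses the OpenType language tag `MAH`.

The helper name is hypothetical.

```python
def marshallese_locl(decompositions, mark="cedillacmb"):
    """Emit a locl feature that replaces each precomposed glyph with
    base + combining cedilla for Marshallese (hypothetical helper)."""
    lines = ["feature locl {", "    script latn;", "    language MAH;"]
    for precomposed, base in decompositions.items():
        # One-to-many (multiple) substitution, e.g. Lcommaaccent -> L cedillacmb
        lines.append(f"    sub {precomposed} by {base} {mark};")
    lines.append("} locl;")
    return "\n".join(lines)


fea = marshallese_locl({
    "Lcommaaccent": "L", "lcommaaccent": "l",
    "Ncommaaccent": "N", "ncommaaccent": "n",
})
print(fea)
```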

What's a straight and centered ogonek?

It’s something the fonts used in Navajo resources do:

[Screenshots: examples of a straight, centered ogonek in fonts used for Navajo resources]

@moyogo (Contributor Author) commented Sep 14, 2015

Can we not just take the acute in lacute and create an acutecmb.asc? What does the “etc.” cover?

Yes, that’s fine as well. The other diacritics can also go above letters with ascenders. For example, circumflex is used on h in Esperanto; grave on f, t, k in ISO 9 romanization of Cyrillic; caron on h in Lakota or in Romani in Finland; dieresis on h in Kurmanji; macron on l in Votic; tilde on l in Lithuanian; and dot above was used on b or d in Irish. Some romanizations or historical orthographies also use breve and ring above letters with ascenders. I’m not aware of double acute on letters with ascenders.
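The pairs listed above could drive a simple naming rule for choosing a mark variant. This Python sketch assumes the `.asc`/`.cap` suffix convention used in this thread and a hand-picked set of ascender letters drawn from those examples. `mark_variant` is a hypothetical helper. In the font itself, the choice would be made through glyph composition or a contextual feature.

```python
# Lowercase letters with ascenders that appear in the examples above.
ASCENDER_BASES = {"b", "d", "f", "h", "k", "l", "t"}

def mark_variant(base, mark):
    """Pick the mark glyph name to use above a given base letter."""
    if base in ASCENDER_BASES:
        return f"{mark}.asc"   # tall lowercase: ascender variant
    if base[:1].isupper():
        return f"{mark}.cap"   # capitals: cap variant
    return mark                # x-height letters: default mark

examples = [("h", "circumflexcmb"), ("f", "gravecmb"), ("h", "caroncmb"),
            ("h", "dieresiscmb"), ("l", "macroncmb"), ("l", "tildecmb"),
            ("b", "dotaccentcmb")]
for base, mark in examples:
    print(base, mark_variant(base, mark))
```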

@weiweihuanghuang (Owner)

Yes, that’s fine as well. The other diacritics can also go above letters with ascenders. For example, circumflex is used on h in Esperanto; grave on f, t, k in ISO 9 romanization of Cyrillic; caron on h in Lakota or in Romani in Finland; dieresis on h in Kurmanji; macron on l in Votic; tilde on l in Lithuanian; and dot above was used on b or d in Irish. Some romanizations or historical orthographies also use breve and ring above letters with ascenders. I’m not aware of double acute on letters with ascenders.

I see, I can add .asc versions where appropriate too. Should they stay within the 125%-of-UPM maximum height?

@weiweihuanghuang (Owner)

Marshallese uses Ļ ļ M̧ m̧ Ņ ņ O̧ o̧. Because of the preferred comma shaped cedilla Ļ ļ Ņ ņ have a comma shaped cedilla. But M̧ m̧ O̧ o̧ have the classic cedilla. It would be best if the locl feature or an optional stylistic feature would make them consistent indeed.

Even if those glyphs are made with U+0327 COMBINING CEDILLA, why do L l N n default to the comma cedilla, and M m O o to the classic cedilla?

@moyogo (Contributor Author) commented Sep 20, 2015

I see, I can add .asc versions where appropriate too. Should it conform to the 125% of UPM max height?

If that’s what you used for lacute, yes.
I think lacute is the tallest, reaching 930, and descenders and diacritics below reach just past 210 units below the baseline. The hhea, typo and win metrics each add up to 1173, so 117.3% of the UPM. If you’re planning on supporting Vietnamese at some point, 125% or something around that would be better.
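These percentages are just the vertical extent divided by the units per em. Here is a quick Python check. It assumes a UPM of 1000, which is implied by 1173 units being 117.3%, and treats 210 as the approximate depth below the baseline.

```python
UPM = 1000  # assumed units per em

def percent_of_upm(units, upm=UPM):
    """Express a vertical extent in font units as a percentage of the em."""
    return 100 * units / upm

glyph_extent = 930 + 210    # tallest glyph (lacute) plus approximate depth below
metrics_total = 1173        # hhea / typo / win ascender + descender

print(percent_of_upm(glyph_extent))    # 114.0
print(percent_of_upm(metrics_total))   # 117.3
print(glyph_extent <= metrics_total)   # True: current glyphs fit
print(1.25 * UPM)                      # 1250.0 units for a 125% line
```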

Even if those glyphs are made with U+0327 COMBINING CEDILLA, why do L l N n default to the comma cedilla, and M m O o to the classic cedilla?

Around the end of the 19th century the comma below was a common shape for the cedilla in fonts.
When Latvian orthography started using the cedilla, it was common to see it with either the comma or classic cedilla and eventually the comma shape became the most common. The same thing happened with Romanian.
By the time character encodings were created, G K L N R with cedilla were encoded for Latvian as letters with cedilla, but shown with the most common (comma) shape. For Romanian, there were S and T with cedilla, but they had a classic cedilla (as S with cedilla is used in Turkish). Unicode did not distinguish these G K L N R S T with cedilla from the same letters with a comma below, although it does have a combining comma below that is separate from the combining cedilla. This made the cedilla ambiguous: it can have both shapes, while the combining comma below can only have one shape.
The Romanian standards association eventually asked for separate S and T with comma below, and these were encoded in ISO 8859-16 and Unicode. Shortly after that, Unicode stopped encoding new precomposed characters.

So, the cedilla can have several shapes. In Latvian the preferred shape is now the comma cedilla under Latvian letters. In Romanian, separate characters were created so the comma below diacritic can be used. In Marshallese or some other context, the cedilla should have a single shape.
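The ambiguity is visible in the Unicode decompositions themselves, which Python's `unicodedata` module exposes. The Latvian ļ and ņ and the Turkish ş all decompose to U+0327 COMBINING CEDILLA. The Romanian ș and ț use the separate U+0326 COMBINING COMMA BELOW. `below_marks` is a small illustrative helper.

```python
import unicodedata

def below_marks(ch):
    """Names of the combining marks in the canonical decomposition of ch."""
    _base, *marks = unicodedata.normalize("NFD", ch)
    return [unicodedata.name(m) for m in marks]

for ch in "ļņşșț":
    print(ch, below_marks(ch))
```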

I’ve finished adding anchors.

@weiweihuanghuang (Owner)

Thanks for the information! I noticed that the anchors for the .asc glyphs in the Black masters are wrong (the ascender height is different in the Black master). I'm currently adding the .asc versions of the diacritics.

@moyogo (Contributor Author) commented Sep 20, 2015

Cool. I’m fixing the top anchor on those ascenders setting them all to 730.
I noticed I missed adding anchors to j. I’ll also add jdotless.

@moyogo (Contributor Author) commented Sep 23, 2015

FYI about the ogonek: adobe-fonts/source-sans#75

@weiweihuanghuang (Owner)

BTW, if you are going to move anchors on any base glyphs, you need to disable automatic alignment on the related accented glyphs. For example, you moved the bottom anchor on T, so the commaaccent in Tcommaaccent moved to a position I did not intend.

Many of the diacritics are now out of place, and I don't know of any easy way to go through and put the components back in the correct place. cc @schriftgestalt @mekkablue?
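The effect follows from how anchor attachment works in general. With automatic alignment, a mark component is shifted so that its attaching anchor (`_bottom` in Glyphs) lands on the base glyph's anchor. Moving the base anchor therefore moves the component by the same amount. Here is a minimal Python sketch with made-up coordinates. `component_offset` is a hypothetical helper, not a Glyphs API.

```python
def component_offset(base_anchor, mark_anchor):
    """Offset that places a mark so its attaching anchor meets the base anchor."""
    bx, by = base_anchor
    mx, my = mark_anchor
    return (bx - mx, by - my)

comma_anchor = (110, 0)                            # _bottom in the commaaccent (made up)
before = component_offset((300, 0), comma_anchor)  # T's original bottom anchor
after = component_offset((320, 0), comma_anchor)   # bottom anchor moved 20 units right
print(before, after)  # (190, 0) (210, 0)
```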

@schriftgestalt

Why would you change the anchors in a way that would make the Tcommaaccent look bad?

@moyogo (Contributor Author) commented Sep 23, 2015

Why would you change the anchors in a way that would make the Tcommaaccent look bad?

Good question :-)

There should probably be a different anchor for bottom anchors like dot below, macron below, circumflex below. These should be centered on the stem of T instead.

I see I need to also realign and disable automatic alignment of the grave and double acute on ÀÌÒÙŰ in two masters.

@weiweihuanghuang (Owner)

Because the other anchors look better that way. If I changed the anchor in the commaaccent itself, it wouldn't look balanced elsewhere, and balancing those other glyphs threw even more off. I find some diacritics don't work with a single anchor.

On 23 Sep 2015, at 10:37 pm, Georg wrote:

Why would you change the anchors in a way that would make the Tcommaaccent look bad?



@moyogo (Contributor Author) commented Sep 23, 2015

What should I do when symmetric diacritics are positioned differently on the same base letter, or when similar diacritics are positioned differently on a symmetric base letter?

Compare Ecircumflex and Ecaron in master and in this branch:
[Screenshot: Ecircumflex and Ecaron in master]
In master the circumflex and the caron have different horizontal offsets.
[Screenshot: Ecircumflex and Ecaron in this branch]
In this branch the circumflex and the caron have the same horizontal offset.

Compare Igrave and Iacute in master and in this branch:
[Screenshot: master]
In master, the grave is more to the left and the acute is more centered.
[Screenshot: this branch]
In this branch, the grave and the acute are symmetrically centered on top of the I.

Do you want me to realign these as in master?

@schriftgestalt

That all looks like a mistake. And I didn’t see a case where marks that were positioned properly on the I (uppercase i) would need a different horizontal position on any of the wider letters. So center the anchors in the I, and position the anchors in the marks so that the marks look good on the I. Then you can position the anchors in all the other letters.
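Georg's procedure can be partly checked mechanically. The Python sketch below centers the top anchor on the I's advance width. It then flags symmetric marks (circumflex, caron, etc.) whose attaching anchor is off their own horizontal center. Asymmetric marks such as acute and grave are deliberately left out, since they are meant to sit off-center. All names, widths, and coordinates here are hypothetical.

```python
def centered_anchor_x(advance_width):
    """x position of an anchor centered on a glyph's advance width."""
    return advance_width / 2

def off_center_marks(marks, tolerance=1):
    """Return symmetric marks whose anchor x differs from their bbox center.

    marks: {name: (xmin, xmax, anchor_x)}
    """
    return sorted(name for name, (xmin, xmax, ax) in marks.items()
                  if abs((xmin + xmax) / 2 - ax) > tolerance)

print(centered_anchor_x(260))  # 130.0
symmetric = {
    "circumflexcmb.cap": (20, 280, 150),  # centered
    "caroncmb.cap": (20, 280, 141),       # 9 units off
}
print(off_center_marks(symmetric))  # ['caroncmb.cap']
```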

@weiweihuanghuang (Owner)

I just noticed that in the Regular master the grave is now too far right; it's more noticeable here:
[Screenshot: grave placement in the Regular master]

I'm going through and using manual alignment for some of these.

@weiweihuanghuang (Owner)

Cool. I’m fixing the top anchor on those ascenders setting them all to 730.

Why 730? It should be 700; I'm changing them now too.

@schriftgestalt

The acute looks fine to me. But if you don't like it, move the anchor in the grave.

@weiweihuanghuang (Owner)

The acute looks fine to me. But if you don't like it, move the anchor in the grave.

Can't, it will move it everywhere else too!

@weiweihuanghuang (Owner)

@schriftgestalt

The base of the acute/grave should not be centered above the glyph: http://diacritics.typo.cz/index.php?id=4

The placement on your I and A is exactly as it should be.

@weiweihuanghuang (Owner)

The base of the acute/grave should not be centered above the glyph: http://diacritics.typo.cz/index.php?id=4

Why not? [Edit] These guidelines don't say it's bad practice to center it:

Horizontal placing of acute may prove difficult: it should incline towards the right slightly, but at the same time it should not “fall off” the character. The more steep the acute, the closer the lower tip can be to the optical centre of the letter; the more horizontal it is, the more the whole accent needs to be optically centered above the letter. The angle of the acute to the vertical axis of the type should be the same as of grave. Because characters with acute are included in most western typefaces, there are many examples as how to draw it properly. If the weight of the acute stroke varies, it should narrow towards the bottom

Having looked on MyFonts I do see a lot of examples where the lower edge goes beyond the base glyph.

@moyogo (Contributor Author) commented Oct 16, 2015

@moyogo new branch with your changes https://github.com/weiweihuanghuang/Work-Sans/tree/moyogo-diac

@weiweihuanghuang thanks. I rebased my branch on yours.
It looks good to me.
Is there anything else I can do?

@weiweihuanghuang (Owner)

@moyogo Great, I don't think so, I'll generate fonts and check that none of the custom TTFA hinting has changed. I'll also test the GPOS combos.

Btw do you know of a resource/tool where I can find what languages are now supported with these extended combining diacritics + anchors — or do you have an idea?

@moyogo (Contributor Author) commented Oct 17, 2015

Btw do you know of a resource/tool where I can find what languages are now supported with these extended combining diacritics + anchors — or do you have an idea?

The CLDR has data to produce such a list, but it’s still a work in progress, and many languages are missing or have incomplete data.
Comparing the font before and after shows no difference in the number of CLDR languages supported. On OS X, Font Book uses the CLDR data to list the languages a font supports.

You’ll get a much larger CLDR-language count if you add all the precomposed characters that use the current diacritics.

A bunch of languages that benefit from these combining marks also use characters that are still missing (either precomposed characters like ŵ, ỹ, etc. or additional letters like ɛ, ɔ, etc.).
I had started adding some additional characters used in African orthographies: https://github.com/moyogo/work-sans/tree/latext/ɔ; but I still have quite a few things to do.
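A rough way to see what the combining marks add is to count a letter as covered if either its precomposed form or its whole decomposed sequence is in the font's character map. This Python sketch makes that assumption explicitly. It is not how CLDR or Font Book count support. It also presumes a shaping engine that falls back to the decomposed sequence when a precomposed character is missing. The character set below is hypothetical; the test letters are the Marshallese ones mentioned earlier plus ŵ.

```python
import unicodedata

def covered(cluster, cmap_chars):
    """True if the cluster's NFC form, or else its NFD form, is fully in cmap_chars."""
    for form in ("NFC", "NFD"):
        if all(c in cmap_chars for c in unicodedata.normalize(form, cluster)):
            return True
    return False

# Hypothetical font: basic letters plus U+0327 COMBINING CEDILLA, no Ļ or ŵ.
cmap_chars = set("LlMmNnOowW") | {"\u0327"}

for cluster in ["\u013b", "M\u0327", "\u0175"]:   # Ļ, M̧, ŵ
    print(repr(cluster), covered(cluster, cmap_chars))
```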

@weiweihuanghuang merged commit bdceb70 into weiweihuanghuang:master on Jan 3, 2016
@davelab6 commented Jan 3, 2016 via email

@moyogo deleted the latext/diacritics branch on August 11, 2016 09:18
4 participants