Research the dieresis situation #109

Open
Tracked by #332
ForNeVeR opened this issue Jan 27, 2018 · 13 comments
Labels: kind:documentation, status:blocked (For things blocked on other things.)

Comments

@ForNeVeR
Owner

ForNeVeR commented Jan 27, 2018

There's a strange contradiction that I even had to describe in the documentation:

How to know that O 177 is the same as code="196"? To do that, first look into the cmmi10.vpl file, which contains the following entry:

(CHARACTER O 177 (comment dieresis)
   (CHARWD R 583)
   (CHARHT R 705)
   (CHARIC R 118)
   (MAP
      (SETCHAR O 151)
      )
   )

That means that O 177 is named dieresis. Then, open Adobe Glyph List and search for the dieresis name:

dieresis;00A8

It means that O 177 should be character 0xa8 or 168, not 196. The reason for that contradiction is currently unknown.

We need to research that: what does code="196" mean, and why isn't the dieresis 168?

We need to find the answer and write it into the documentation.
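
The numbers involved can be checked directly. The following is a minimal Python sketch, not part of wpf-math: the helper names `parse_vpl_character` and `parse_agl_line` are assumptions made for illustration, and the sample strings are copied from the entries quoted above. It reads the character code and comment name from the VPL header, reads the code point from the Adobe Glyph List line, and prints all three values in several bases.

```python
import re

# Sample data copied from the entries quoted above.
VPL_HEADER = "(CHARACTER O 177 (comment dieresis)"
AGL_LINE = "dieresis;00A8"


def parse_vpl_character(header):
    """Return (character code, comment name) from a VPL CHARACTER header.

    Only the octal form `O nnn` is handled; that is a simplification of
    this sketch, not a full property-list parser.
    """
    match = re.match(r"\(CHARACTER O ([0-7]+) \(comment (\w+)\)", header)
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    return int(match.group(1), 8), match.group(2)


def parse_agl_line(line):
    """Return (glyph name, Unicode code point) from an AGL line."""
    name, hex_code = line.split(";")
    return name, int(hex_code, 16)


font_code, vpl_name = parse_vpl_character(VPL_HEADER)
agl_name, unicode_code = parse_agl_line(AGL_LINE)
xml_code = 196  # the value of code="196" in the XML

print(f"VPL: {vpl_name} at O {font_code:o} = {font_code}")
print(f"AGL: {agl_name} at U+{unicode_code:04X} = {unicode_code}")
print(f"XML: {xml_code} = 0x{xml_code:X} = O {xml_code:o}")
```

The font position (O 177 = 127), the Unicode code point (U+00A8 = 168), and the XML code (196) are three different numbers, so the XML value matches neither the font position nor the AGL code point.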

@ForNeVeR
Owner Author

The Typography library may provide support for automating this.

@prepare

prepare commented Jan 28, 2018

If you use a .ttf file, I think Typography may help you.

Feel free to ask or create a new issue!

@B3zaleel
Contributor

B3zaleel commented Sep 5, 2018

I thought it was strange as well, but it isn't. If we try to render a character whose code is greater than a ushort can hold, it is an error, so we have to move the character to a free position that is below the ushort's maximum value. That position may previously have held a character (say %), but its code point now holds the new character. This is useful because many Unicode math symbols have code points that are too large, and this is what we have to do in order to use them. You could use FontForge to see the actual glyph and its metrics, change the glyph and other properties, and even see what its code point is meant to hold.
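
If that explanation is right, the relocation could look roughly like the following Python sketch. This is only an illustration of the idea, not the way the wpf-math XML was actually produced (that is still unknown): the function name `remap_to_free_slots`, the `first_slot` parameter, and the sample input are all hypothetical. Unlike the case described above, the sketch picks an unused position instead of overwriting an existing character.

```python
USHORT_MAX = 0xFFFF  # largest value a 16-bit unsigned integer can hold


def remap_to_free_slots(code_points, occupied, first_slot=0):
    """Map each code point to a position that fits in a ushort.

    Code points that already fit keep their position. Larger ones are
    moved to the lowest position >= first_slot that is not in use.
    """
    taken = set(occupied) | {cp for cp in code_points if cp <= USHORT_MAX}
    mapping = {}
    candidate = first_slot
    for cp in code_points:
        if cp <= USHORT_MAX:
            mapping[cp] = cp
            continue
        while candidate in taken:
            candidate += 1
        if candidate > USHORT_MAX:
            raise ValueError("no free position left below the ushort limit")
        mapping[cp] = candidate
        taken.add(candidate)
    return mapping


# Hypothetical input: 'A' plus U+1D44E, with positions 0..127 already in use.
result = remap_to_free_slots([0x41, 0x1D44E], occupied=range(0x80))
print({hex(k): hex(v) for k, v in result.items()})
```

With this input, 'A' stays at 0x41 and U+1D44E is moved to 0x80, the first position that is not taken.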

@Happypig375

Happypig375 commented Sep 5, 2018

I think the issue is that the dieresis has the Unicode code point 168 (U+00A8), but its glyph index is 196. /cc @prepare

Edit: Oh, so using Typography is a solution but that's not done yet. I guess this is just a TODO issue then. Nvm.

@ForNeVeR
Owner Author

ForNeVeR commented Sep 5, 2018

We don't know yet if Typography will help us with that case. Someone needs to research the situation.

@Happypig375

I never had to deal with typeface and glyph issues because Typography handles all of the table reading in CSharpMath. I don't need any JSON or XML files; the OTF font files alone are enough. Plus, importing typefaces on the go is also possible thanks to Typography. However, if you have other solutions, please do share them.

@prepare

prepare commented Sep 6, 2018

Hello,

Do you have the latest font?

Can you post the expected/actual result of a dieresis glyph from your app?


I downloaded the cm* fonts from (https://github.com/ForNeVeR/wpf-math/tree/master/src/WpfMath/Fonts).

Then I analyzed those fonts with the latest Typography branch (https://github.com/LayoutFarm/Typography/tree/post_table_rev)


Font analysis

At this time:

  1. None of the fonts has a 'kern' table.
    I read this: https://github.com/ForNeVeR/wpf-math/pull/108/files#diff-2dc54592d7800db71c597c416c9a29abR63 => I don't know how to get 'kern' data from them.

  2. Dieresis glyph (from https://github.com/ForNeVeR/wpf-math/pull/108/files#diff-2dc54592d7800db71c597c416c9a29abR45)

cmmi does not have a glyph named 'dieresis'.

[image: cmm_1]

pic 1: cmmi font contains 134 glyphs


The only font that contains 'dieresis' is cmr10.ttf (see pic 2).
This font contains only 132 glyphs.

[image: cmm_2]

pic 2: dieresis, in cmr10.ttf, glyphIndex=131

[image: cmm_3]

pic 3: for comparison, Microsoft VisualTrueType also shows dieresis in cmr10.ttf at glyphIndex=131


@prepare

prepare commented Sep 6, 2018

What happens when we enter a dieresis (¨) into a text box


First, I will test it with the Tahoma font.

[image: cmm_5]

pic 4: Tahoma, dieresis, glyph index=142

The following GIF shows the lookup step by step:

[image: 2018-09-06_07-41-01]

pic 5: dieresis, codepoint 168 => cmap => glyph index = 142
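
The lookup in pic 5 can be modeled as a plain dictionary from Unicode code point to glyph index. This is a toy sketch and not the Typography API: `TAHOMA_CMAP` and `glyph_index` are hypothetical names, and the only entry is the one visible in the screenshot above. A real cmap table holds many more entries and several subtable formats.

```python
# Toy character map: Unicode code point -> glyph index.
# The single entry is the Tahoma value reported above (168 -> 142).
TAHOMA_CMAP = {0x00A8: 142}


def glyph_index(cmap, char):
    """Look up the glyph index for one character.

    Returns 0 when the character is not mapped; glyph 0 is the
    "missing glyph" (.notdef) in TrueType and OpenType fonts.
    """
    return cmap.get(ord(char), 0)


dieresis = "\u00a8"  # the dieresis character
print(ord(dieresis))                       # the code point, 168
print(glyph_index(TAHOMA_CMAP, dieresis))  # the glyph index, 142
```

The point of the sketch is that the code point (168) and the glyph index (142 in Tahoma) are independent numbers: the glyph index is just the glyph's position inside one particular font file.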

@prepare

prepare commented Sep 6, 2018

What happens when we enter a dieresis (¨) into a text box with cmr10.ttf


At this time, I don't know why the cmap maps it to an incorrect glyph.

[image: 2018-09-06_07-47-49]

pic 6: cmr10.ttf, dieresis, codepoint 168 => cmap => incorrect glyph ??

@prepare

prepare commented Sep 6, 2018

That is my first report...

I also need to investigate more.

@ForNeVeR
Owner Author

ForNeVeR commented Sep 6, 2018

@prepare thank you so much for your investigation and for documenting it here.

You've taken the latest font we have (we bundle the font from the repository in our NuGet package), that's right. Actually, I can't remember why the dieresis caught my attention. It was probably the first character I checked, or something like that. Essentially, I was looking for an automated way of mapping font characters to their codes in our XML.

The main problem is that we still don't know how these codes were generated, and we can't proceed with automatic XML regeneration without that information.

@prepare

prepare commented Sep 6, 2018

I think your XML was created by some complex TeX tool.


At this time...

... That means that O 177 is named dieresis

Why is the dieresis marked as O 177?

We need to go to the original glyph definitions of the cm fonts (https://ctan.org/tex-archive/fonts/cm/mf)

I think the dieresis is defined here => http://mirror.hmc.edu/ctan/fonts/cm/mf/accent.mf (scroll to the end of the page)

cmchar "Umlaut (double dot) accent";
numeric dot_diam#,dot_diam;
dot_diam#=max(dot_size#,cap_curve#);
beginchar(oct"177",9u#,min(asc_height#,10/7x_height#+.5dot_diam#),0);
dot_diam=max(tiny.breadth,hround(max(dot_size,cap_curve)-2stem_corr));
italcorr h#*slant+.5dot_diam#-2.25u#;
adjust_fit(0,0);
pickup tiny.nib; pos1(dot_diam,0); pos2(dot_diam,90);
x1=x2=2.75u; top y2r=h+1;
if bot y2l<x_height+o+slab: y2l:=min(y2r-eps,x_height+o+slab+.5tiny); fi
y1=.5[y2l,y2r]; dot(1,2); % left dot
pos3(dot_diam,0); penpos4(y2r-y2l,90); y3=y4=y1; x3=x4=w-x1;
dot(3,4); % right dot
penlabels(1,2,3,4); endchar;

The relevant part is:

beginchar(oct"177",

At that time, it may have been called the 'umlaut accent' or 'double dot' accent,

and some tool mapped it to the name "dieresis" later.
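
To see which character code a `beginchar` line assigns, the first argument can be pulled out with a regular expression. This is a small Python sketch with a hypothetical helper name, `beginchar_code`. It handles only two forms of the first argument, `oct"..."` as in the accent above and a single quoted character as used elsewhere in the Computer Modern sources; other METAFONT expressions are not covered.

```python
import re

# First argument of beginchar: either oct"NNN" or a single quoted character.
BEGINCHAR = re.compile(r'beginchar\((?:oct"([0-7]+)"|"(.)")')


def beginchar_code(line):
    """Return the character code assigned by a beginchar line, or None."""
    match = BEGINCHAR.search(line)
    if match is None:
        return None
    if match.group(1) is not None:
        return int(match.group(1), 8)  # octal string -> integer
    return ord(match.group(2))         # quoted character -> its code


umlaut = 'beginchar(oct"177",9u#,min(asc_height#,10/7x_height#+.5dot_diam#),0);'
letter = 'beginchar("A",13u#,cap_height#,0);'
print(beginchar_code(umlaut))  # 127
print(beginchar_code(letter))  # 65
```

So the METAFONT source places the umlaut accent at position 127 (octal 177), which agrees with the O 177 in the VPL entry.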

@prepare

prepare commented Sep 6, 2018

And ...

(CHARACTER C A
   (CHARWD R 0.750002)
   (CHARHT R 0.683332)
   (COMMENT
      (KRN O 177 R 0.138893)
      )
   )


I tried to read the definition of 'A'
from http://mirror.hmc.edu/ctan/fonts/cm/mf/romanu.mf

% Character codes \0101 through \0132 are generated.
cmchar "The letter A";
beginchar("A",13u#,cap_height#,0);
adjust_fit(cap_serif_fit#,cap_serif_fit#);
numeric left_stem,right_stem,outer_jut,alpha;
right_stem=cap_stem-stem_corr;
left_stem=min(cap_hair if hefty: -3stem_corr fi,right_stem);
outer_jut=.8cap_jut; x1l=w-x4r=l+letter_fit+outer_jut+.5u; y1=y4=0;
x2-x1=x4-x3; x3r=x2r+apex_corr; y2=y3=h+apex_o+apex_oo;
alpha=diag_ratio(2,left_stem,y2-y1,x4r-x1l-apex_corr);
penpos1(alpha*left_stem,0); penpos2(alpha*left_stem,0);
penpos3(alpha*right_stem,0); penpos4(alpha*right_stem,0);
z0=whatever[z1r,z2r]=whatever[z3l,z4l];
if y0<h-cap_notch_cut: y0:=h-cap_notch_cut;
fill z0+.5right{down}...{z4-z3}diag_end(3l,4l,1,1,4r,3r)
--diag_end(4r,3r,1,1,2l,1l)--diag_end(2l,1l,1,1,1r,2r){z2-z1}
...{up}z0+.5left--cycle; % left and right diagonals
else: fill z0--diag_end(0,4l,1,1,4r,3r)--diag_end(4r,3r,1,1,2l,1l)
--diag_end(2l,1l,1,1,1r,0)--cycle; fi % left and right diagonals
penpos5(whatever,angle(z2-z1)); z5=whatever[z1,z2];
penpos6(whatever,angle(z3-z4)); z6=whatever[z3,z4]; y6=y5;
if hefty: y5r else: y5 fi =5/12y0;
y5r-y5l=y6r-y6l=cap_band; penstroke z5e--z6e; % bar line
if serifs: numeric inner_jut; pickup tiny.nib;
prime_points_inside(1,2); prime_points_inside(4,3);
if rt x1'r+cap_jut+.5u+1<=lft x4'l-cap_jut: inner_jut=cap_jut;
else: rt x1'r+inner_jut+.5u+1=lft x4'l-inner_jut; fi
dish_serif(1',2,a,1/2,outer_jut,b,.6,inner_jut)(dark); % left serif
dish_serif(4',3,c,1/2,inner_jut,d,1/3,outer_jut); fi % right serif
penlabels(0,1,2,3,4,5,6); endchar;

I think
the kerning info for the letter A and O 177 (dieresis)
may have been added later from other sources. => I don't know yet.
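
For reference, the KRN entry itself is easy to read mechanically. Below is a minimal Python sketch with a hypothetical helper, `parse_krn`. It assumes the property-list conventions for character codes (`C` for a literal character, `D` decimal, `O` octal, `H` hexadecimal) and handles only a single `(KRN ...)` item, not a whole file.

```python
import re

# (KRN <kind> <value> R <amount>): kind selects how the character is written.
KRN = re.compile(r"\(KRN ([CDOH]) (\S+) R (-?\d+(?:\.\d+)?)\)")
BASES = {"D": 10, "O": 8, "H": 16}


def parse_krn(entry):
    """Return (character code, kern amount) from one KRN item."""
    match = KRN.search(entry)
    if match is None:
        raise ValueError(f"not a KRN item: {entry!r}")
    kind, value, amount = match.groups()
    # 'C' names the character directly; the others are numbers in some base.
    code = ord(value) if kind == "C" else int(value, BASES[kind])
    return code, float(amount)


print(parse_krn("(KRN O 177 R 0.138893)"))  # (127, 0.138893)
```

This only confirms that the entry refers to position 127, the same O 177 as above. It says nothing about where the kerning value originally came from.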

