add `MATH` table #87

ruifengx · 2022-02-11T12:01:46Z

https://docs.microsoft.com/en-us/typography/opentype/spec/math

Changes

new feature opentype-math
new field math in FaceTables
new module tables::math:
- math::Table for OpenType MATH tables;
- MathConstants, MathGlyphInfo, and MathVariants subtables;
- and more data types.

Unresolved Questions

I will read through your code for other tables for reference, and ponder on these questions for the next several days. Also, I will be very glad to hear your opinions and decisions.

Testing. The whole OpenType spec is not easy to comprehend in any sense, and it is very likely that the current version of the parser fails to handle some edge cases. I have been manually comparing the values parsed here and obtained from FontForge, but that is not efficient nor comprehensive.
Marking tables optional. Some subtables are "officially" optional (with a "May be NULL" note for the offset) and thus is already wrapped in Option. For some others, there is no "May be NULL" note, but it might still be desirable to make those tables optional. For example, in Latin Modern Math (which I used for testing), the MathKernInfo in MathGlyphInfo is missing, and apparently we don't want the lack of this single subtable fail the whole parsing.
Parsing API. Stream::read only support FromData types, I added a private convenience method Stream::parse_at_offset16 in tables::math (basically a combination of Stream::read_at_offset16 and FromSlice::parse). I wonder whether or not we should make this method part of the public API.
Lazy(Offset)Array{16,32} API. For MATH tables, the combination of a Coverage table and an array appear several times, and I would like to provide a universal CoverageTable interface for all of them. The problem is that the arrays are not necessarily representable with Lazy(Offset)Array16 (because the record types consist of both data fields and offset fields), and I had to invent new array types (MathValueArray for MathValueRecord and MathKernInfoArray for MathKernInfoRecord). Here I have two possible solutions:
- Introduce an Array trait to abstract over these arrays.
- Try to come up with a generic LazyOffsetArray, supporting complex record types (which may have zero or more data fields and zero or more offset fields) as array items.
Both solutions will affect the public API (and the benefit may or may not worth the cost). So I think I should first discuss this with you before implementing the solutions.
Documentation. I am copy-pasting texts in the spec as doc comments for the table types and their fields. I am not entirely sure whether or not there would be some copyright problems here.
Code Formatting. I rely on IntelliJ Rust to format my code, and I noticed some slight differences with your original formatting. (I noticed this because I had to edit lib.rs to add new modules and new fields, and the formatter made some changes to your original code. Certainly, I did revert those changes and preserved the original formatting in lib.rs, but the same cannot be done to the newly added files.) I did not find this topic in the Contribution section in the project README, so I think it is better to make this explicit here.
Code Generation by External Script. I used a Haskell script to generate the field accessors and *_OFFSET constants for the MathConstants table. The script is not included in the source tree, but if you want it, I can provide a copy or simply commit it in the repo.

Going a bit off-topic

Using Option to represent parse error makes it really hard to figure out why parsing a table failed (especially when debugging the parser). Also, a field being None could mean either "this field does not appear in the table (possibly because of a NULL offset)" or "this field does appear in the table, but parsing the subtable failed". I don't have a solution (and I am even not sure if this really is a problem), but I thinks there is no harm to raise this topic here.

RazrFalcon · 2022-02-11T18:14:13Z

Thanks for the PR. I'm glad you have been able to grok ttf-parser in a couple of days.

And thanks for all the valid criticism in the comment. I will try to address all of it.

Testing

TrueType testing is a total pain. That's why we don't have that many tests. I'm working on it, but it takes forever.
We cannot use existing fonts because of licensing issues and frankly bloat. Moreover, existing fonts are rarely cover all the edge-cases, but they are also, well, valid. And we don't really care about valid fonts. Quite the contrary.
We could try generating our own fonts using fonttools, but it also pretty painful and we still would not be able to generate malformed fonts, which is what we need in the first place.

My current solution is to write table-specific binary data directly, by hand. Yes it's a bit verbose and confusing to newcomers, but I don't see any other way. This way tests are self-contained (everything inside one function, no need to look up some random files and so on), fast and easily versioned.

So for you to test the MATH table you would have to follow the current tests logic. Writing some basic MATH table should be relatively easy. Basically parsing in reverse. If you want, I can add MATH to the ttf-explorer so you could use some existing fonts as a reference. For starter, just a couple of sanity tests would be fine. No need to write tens and hundreds of them, unless you want to.

Marking tables optional

If the spec is not clear enough, I would suggest following the most fail-safe approach. ttf-parser should be able to retrieve as much data as possible.

Parsing API

Yes, OpenType tables was a new addition (specifically GDEF, GPOS and GSUB one) and Stream was designed for other tables, which are simpler and more sane.
OpenType tables are using a lot of reference to a reference to a reference data, which is hard to model in Rust in a safe manner. You can look at GPOS/GSUB implementation to see how I've tried to tame it. Something like gsub::AlternateSubstitution would be a good example. And LazyOffsetArray16 in particular.
I guess MATH uses the same logic.

I don't mind any additions to Stream as long as they are private.

Lazy(Offset)Array16,32 API

I think a custom type is fine. I've tried multiple times to come up with some generic solutions, but they end being way more complicated than needed. You just have to accept the fact that every table in TrueType has a unique structure and there is now way/point to unify it. Some verbosity would not hurt. That's why LazyArray16 and LazyArray32 are two separate types instead of being generic, because a generic version would be way more complicated.

Documentation

Yes, a short but your own comment is better than copy-pasting MS docs. I don't think they would mind, but better not too.

Also, ttf-parser docs should explain how ttf-parser works, not how TrueType works. We have a spec (multiple of them) for that.

Code Formatting

I personally don't like code formatters and therefore don't use them. At the same time I'm basically following the default rustfmt logic with some small exceptions. So for a new file you could use auto-formatting, but leave other files as is.

Code Generation by External Script

I think it's fine as is. I don't think we would be modifying it anytime soon. And if so, it would be easy to done by hand.

Going a bit off-topic

Yes, error processing is a very questionable area.

Just a quick note about Option, you're expected to use Option<Option<T>> for such cases. The outer one indicates an "error" and the inner one the actual optional value.

So the original ttf-parser version actually did have proper errors (you can find leftovers of them in CFFError). The problem is that in most cases the only possible error is a bug in the lib or a bug in a file. Which in 99% would be completely pointless to the caller. Like "failed to read data at offset 50". A caller can't do anything about it. There is no way to handle those errors.
Also, until recently, ttf-parser didn't have access to internal tables. So if a table parsing fails - it would simply be set to None and that's it (we should probably log it, but this crate is zero-dependency and it's an another long story).
Anyway, the point is that until recently having a proper error processing was a bit meaningless. At least in my opinion.

I do have plans on implementing a proper error processing, but still not sure in what way. Probably each table/module would have its own error type. There is no way we would be able to have a single/generic error.
Then, for people who care, they would be able to parse individual tables and get detailed errors. For others, those errors would simply be logged (using optional log dependency).

I guess my point here is that good error processing is only good for ttf-parser developers, me, for debugging. For end-user they are kinda useless, because malformed tables would be skipped anyway and there is no way to handle this. So at least printing a warning would be better that silently ignoring a table.

Cargo.toml

src/tables/math.rs

RazrFalcon · 2022-02-11T18:38:23Z

src/tables/math.rs

+ /// to compensate for rounding errors and hinting corrections at a lower resolution. The
+ /// `min_connector_overlap` value tells how much overlap is necessary for this particular font.
+ pub min_connector_overlap: u16,
+ /// Offset to Coverage table, from the beginning of the MathVariants table.


It's not an offset anymore. We're already resolved it.

Changed the documentation.

It is a bit repetitive in its current form. I plan to add a CoverageTable abstraction to group the Coverage and the array, because this way the data should appear better organised, and we can add some higher-level glyph-based API for looking up those arrays.

I introduced a new type CoverageMapping for combining the Coverage table and the array for values. For the abstraction to work, I had to add a public trait Array in module parser, with two associated types Index and Value, and three public methods len, is_empty, and get. Unfortunately this is a breaking change (because these three are not inherent methods any more, and cannot be accessed without a use parser::Array;). Personally I think it is good to have a universal interface for all the arrays, but it is really up to you to decide. I hope this change is acceptable to you.

Relevant code:
https://github.com/ruifengx/ttf-parser/blob/math-table/src/parser.rs#L243-L255
https://github.com/ruifengx/ttf-parser/blob/math-table/src/tables/math.rs#L707-L715

src/tables/math.rs

RazrFalcon · 2022-02-11T18:44:46Z

Add some comments. I guess I would have to add MATH to ttf-explorer first, so I could understand/visualize this table for myself. For now, it's a bit hard to follow. It's not your fault, I just want to understand how this table is organized before merging.

Also, tests would help.

ruifengx · 2022-02-12T08:08:50Z

Thanks for the detailed reply and comments. I'll start addressing these problems.

The `part_flags` field was missing, and now it is added, so we should use correct size for this record.

laurmaedje · 2022-09-23T14:51:30Z

Hey, I'm very interested in the current state of this PR. As I happen to need the math table parsing, I would also be happy to help out with or finish up this PR.

RazrFalcon · 2022-09-24T09:04:22Z

I personally haven't looked into it much, but this table is pretty complex. And making a good API would be challenging.
I'm not sure if you can continue this PR or if you would have to make a new one.

@ruifengx any updates on this?

ruifengx · 2022-09-24T09:52:45Z

This has indeed been a long time, and now I honestly don't remember the details clearly. But as I recall, the implementation should have covered all low-level pieces in the MATH table (though it still need proper testing and debugging). I see now this PR has merge conflicts, and that involves parser.rs, so the implementation would at least need a revision to adapt to the new parser interface. Other than that, I think working on this PR should probably be easier than starting from scratch, because most parts in the implementation (I mean, in math.rs) are just mechanical translation of the spec, which I believe is not a very entertaining task.

And I don't know if the following still applies:

I guess I would have to add MATH to ttf-explorer first, so I could understand/visualize this table for myself. For now, it's a bit hard to follow. It's not your fault, I just want to understand how this table is organized before merging.

If so, we will also need to work on ttf-explorer (which I believe is written in C++ with Qt).

RazrFalcon · 2022-09-24T09:55:58Z

@ruifengx Thanks!

Don't worry about ttf-explorer, that's on me.

laurmaedje · 2022-09-25T10:02:10Z

I have had a bit closer look now and the PR really looks pretty complete already. To finish it up, I would thus:

Fork it
Fix the merge conflicts
Try it out a bit, making changes if necessary
Write some tests as this table is unfortunately not indirectly tested by rustybuzz. The table isn't even that complex structurally, but it is pretty big, so it will be hard to get any kind of good coverage with hand-written tables.

RazrFalcon · 2022-09-25T10:09:56Z

Can't you commit to this PR directly?

Yes, testing is the hardest part. There are not that many fonts that use this table to begin with. They are pretty rare.
Therefore I would accept this PR even without tests. As long as it works just for your use case.

laurmaedje · 2022-09-25T10:15:36Z

I don't think so. I don't have write access to ruifengx's fork.

ruifengx · 2022-09-25T11:42:07Z

Can't you commit to this PR directly?

I don't think so. I don't have write access to ruifengx's fork.

@laurmaedje Do you prefer to commit directly to my fork? If that is easier/clearer, I could add you as a collaborator there.

EDIT: collaborator added.

laurmaedje · 2022-09-25T11:44:20Z

That sounds good! Then, we can keep this PR and discussion.

laurmaedje · 2022-09-26T15:39:40Z

@RazrFalcon ruifengx added an Array trait so that the math table implementation can use generics for a bit less duplication. This affects the public API. Do you want to keep it?

RazrFalcon · 2022-09-26T16:36:38Z

@laurmaedje

Do you want to keep it?

Not really. I would prefer as little traits/generics as possible.

laurmaedje · 2022-09-27T13:10:00Z

This should be more or less ready now:

It's tested by hand with two math fonts and everything seems to work.
I replaced most copy-pasted docs with links or reworded them. The big exception are the doc comments on the constants. However, I removed the "suggested value" bits from the constants as these are intended for font designers, not developers.
I removed the Array trait and generics in favor of separate types.
In general, I have refactored the PR a bit to bring it a bit more in line with the remaining crate (struct order, naming, etc.)

RazrFalcon · 2022-09-27T13:46:34Z

Add MATH to readme and Cargo.toml opentype-layout comment and we're good.

Afaik, FreeType doesn't support MATH. Same with stb_truetype.

RazrFalcon · 2022-09-27T13:48:12Z

By the way, the general rule in ttf-parser is that there are no Offset16/Offset32 and &[u8] in public API. Is it true for the new MATH table as well.
At a quick glance it is, but I want to be sure.

laurmaedje · 2022-09-27T14:11:46Z

I also couldn't find anything math-related for FreeType and stb_truetype. Harfbuzz does support the table though. Regarding offsets: Yes, the new table does not expose offsets or binary data.

RazrFalcon · 2022-09-27T16:12:29Z

harfbuzz exposes MATH, but doesn't actually use it. It's true for multiple tables. The idea is that you can use harfbuzz as TrueType parser as well.
Thanks to Cargo we don't have to write monoliths.

RazrFalcon · 2022-09-27T16:13:38Z

Thanks to @ruifengx and @laurmaedje !

I will fix CFF parsing soon and will publish a new version.

ruifengx added 5 commits February 11, 2022 18:44

add MathConstants table (part of MATH table)

9f3420a

add MathGlyphInfo table (part of MATH table)

3c2d87c

add MathVariants table (part of MATH table)

8f36815

refactor MATH parsing with LazyArray16

f323641

factor out Stream::parse_at_offset16

01db16d

RazrFalcon requested changes Feb 11, 2022

View reviewed changes

ruifengx added 7 commits February 13, 2022 14:39

merge feature "opentype-math" into "opentype-layout"

e5352ea

revise Debug impls, documentation, and formatting

e2b6581

mark Stream::{read, parse_at_offset16} calls with target type

4e5904d

fix the size of GlyphPartRecord

1a0dfb9

The `part_flags` field was missing, and now it is added, so we should use correct size for this record.

replace documentations copy-pasted from the spec

ab65ae0

add trait parser::Array {len, is_empty, get}

3c7070b

factor out math::CoverageMapping

6db5d9b

Merge master into math-table

ab4ad9f

RazrFalcon approved these changes Sep 26, 2022

View reviewed changes

laurmaedje force-pushed the math-table branch from 287e5bb to aab6bc0 Compare September 27, 2022 12:53

Refactor math table.

8a68791

laurmaedje force-pushed the math-table branch from aab6bc0 to 8a68791 Compare September 27, 2022 13:07

Add math table to documentation

15ea7dd

RazrFalcon merged commit 88aaffa into RazrFalcon:master Sep 27, 2022

inferiorhumanorgans mentioned this pull request Aug 12, 2024

Add support for the STAT table. #166

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add `MATH` table #87

add `MATH` table #87

ruifengx commented Feb 11, 2022 •

edited

Loading

RazrFalcon commented Feb 11, 2022

RazrFalcon Feb 11, 2022

ruifengx Feb 13, 2022

ruifengx Feb 13, 2022

RazrFalcon commented Feb 11, 2022

ruifengx commented Feb 12, 2022

laurmaedje commented Sep 23, 2022

RazrFalcon commented Sep 24, 2022

ruifengx commented Sep 24, 2022

RazrFalcon commented Sep 24, 2022

laurmaedje commented Sep 25, 2022

RazrFalcon commented Sep 25, 2022

laurmaedje commented Sep 25, 2022

ruifengx commented Sep 25, 2022 •

edited

Loading

laurmaedje commented Sep 25, 2022

laurmaedje commented Sep 26, 2022

RazrFalcon commented Sep 26, 2022

laurmaedje commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022 •

edited

Loading

laurmaedje commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022

add MATH table #87

add MATH table #87

Conversation

ruifengx commented Feb 11, 2022 • edited Loading

Changes

Unresolved Questions

Going a bit off-topic

RazrFalcon commented Feb 11, 2022

Testing

Marking tables optional

Parsing API

Lazy(Offset)Array16,32 API

Documentation

Code Formatting

Code Generation by External Script

Going a bit off-topic

RazrFalcon Feb 11, 2022

Choose a reason for hiding this comment

ruifengx Feb 13, 2022

Choose a reason for hiding this comment

ruifengx Feb 13, 2022

Choose a reason for hiding this comment

RazrFalcon commented Feb 11, 2022

ruifengx commented Feb 12, 2022

laurmaedje commented Sep 23, 2022

RazrFalcon commented Sep 24, 2022

ruifengx commented Sep 24, 2022

RazrFalcon commented Sep 24, 2022

laurmaedje commented Sep 25, 2022

RazrFalcon commented Sep 25, 2022

laurmaedje commented Sep 25, 2022

ruifengx commented Sep 25, 2022 • edited Loading

laurmaedje commented Sep 25, 2022

laurmaedje commented Sep 26, 2022

RazrFalcon commented Sep 26, 2022

laurmaedje commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022 • edited Loading

laurmaedje commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022

RazrFalcon commented Sep 27, 2022

add `MATH` table #87

add `MATH` table #87

ruifengx commented Feb 11, 2022 •

edited

Loading

ruifengx commented Sep 25, 2022 •

edited

Loading

RazrFalcon commented Sep 27, 2022 •

edited

Loading