Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add MATH table #87

Merged
merged 15 commits into from
Sep 27, 2022
Merged

add MATH table #87

merged 15 commits into from
Sep 27, 2022

Conversation

ruifengx
Copy link
Contributor

@ruifengx ruifengx commented Feb 11, 2022

https://docs.microsoft.com/en-us/typography/opentype/spec/math

Changes

  • new feature opentype-math
  • new field math in FaceTables
  • new module tables::math:
    • math::Table for OpenType MATH tables;
    • MathConstants, MathGlyphInfo, and MathVariants subtables;
    • and more data types.

Unresolved Questions

I will read through your code for other tables for reference, and ponder on these questions for the next several days. Also, I will be very glad to hear your opinions and decisions.

  • Testing. The whole OpenType spec is not easy to comprehend in any sense, and it is very likely that the current version of the parser fails to handle some edge cases. I have been manually comparing the values parsed here and obtained from FontForge, but that is not efficient nor comprehensive.

  • Marking tables optional. Some subtables are "officially" optional (with a "May be NULL" note for the offset) and thus is already wrapped in Option. For some others, there is no "May be NULL" note, but it might still be desirable to make those tables optional. For example, in Latin Modern Math (which I used for testing), the MathKernInfo in MathGlyphInfo is missing, and apparently we don't want the lack of this single subtable fail the whole parsing.

  • Parsing API. Stream::read only support FromData types, I added a private convenience method Stream::parse_at_offset16 in tables::math (basically a combination of Stream::read_at_offset16 and FromSlice::parse). I wonder whether or not we should make this method part of the public API.

  • Lazy(Offset)Array{16,32} API. For MATH tables, the combination of a Coverage table and an array appear several times, and I would like to provide a universal CoverageTable interface for all of them. The problem is that the arrays are not necessarily representable with Lazy(Offset)Array16 (because the record types consist of both data fields and offset fields), and I had to invent new array types (MathValueArray for MathValueRecord and MathKernInfoArray for MathKernInfoRecord). Here I have two possible solutions:

    • Introduce an Array trait to abstract over these arrays.
    • Try to come up with a generic LazyOffsetArray, supporting complex record types (which may have zero or more data fields and zero or more offset fields) as array items.

    Both solutions will affect the public API (and the benefit may or may not worth the cost). So I think I should first discuss this with you before implementing the solutions.

  • Documentation. I am copy-pasting texts in the spec as doc comments for the table types and their fields. I am not entirely sure whether or not there would be some copyright problems here.

  • Code Formatting. I rely on IntelliJ Rust to format my code, and I noticed some slight differences with your original formatting. (I noticed this because I had to edit lib.rs to add new modules and new fields, and the formatter made some changes to your original code. Certainly, I did revert those changes and preserved the original formatting in lib.rs, but the same cannot be done to the newly added files.) I did not find this topic in the Contribution section in the project README, so I think it is better to make this explicit here.

  • Code Generation by External Script. I used a Haskell script to generate the field accessors and *_OFFSET constants for the MathConstants table. The script is not included in the source tree, but if you want it, I can provide a copy or simply commit it in the repo.

Going a bit off-topic

Using Option to represent parse error makes it really hard to figure out why parsing a table failed (especially when debugging the parser). Also, a field being None could mean either "this field does not appear in the table (possibly because of a NULL offset)" or "this field does appear in the table, but parsing the subtable failed". I don't have a solution (and I am even not sure if this really is a problem), but I thinks there is no harm to raise this topic here.

@RazrFalcon
Copy link
Owner

Thanks for the PR. I'm glad you have been able to grok ttf-parser in a couple of days.

And thanks for all the valid criticism in the comment. I will try to address all of it.

Testing

TrueType testing is a total pain. That's why we don't have that many tests. I'm working on it, but it takes forever.
We cannot use existing fonts because of licensing issues and frankly bloat. Moreover, existing fonts are rarely cover all the edge-cases, but they are also, well, valid. And we don't really care about valid fonts. Quite the contrary.
We could try generating our own fonts using fonttools, but it also pretty painful and we still would not be able to generate malformed fonts, which is what we need in the first place.

My current solution is to write table-specific binary data directly, by hand. Yes it's a bit verbose and confusing to newcomers, but I don't see any other way. This way tests are self-contained (everything inside one function, no need to look up some random files and so on), fast and easily versioned.

So for you to test the MATH table you would have to follow the current tests logic. Writing some basic MATH table should be relatively easy. Basically parsing in reverse. If you want, I can add MATH to the ttf-explorer so you could use some existing fonts as a reference. For starter, just a couple of sanity tests would be fine. No need to write tens and hundreds of them, unless you want to.

Marking tables optional

If the spec is not clear enough, I would suggest following the most fail-safe approach. ttf-parser should be able to retrieve as much data as possible.

Parsing API

Yes, OpenType tables was a new addition (specifically GDEF, GPOS and GSUB one) and Stream was designed for other tables, which are simpler and more sane.
OpenType tables are using a lot of reference to a reference to a reference data, which is hard to model in Rust in a safe manner. You can look at GPOS/GSUB implementation to see how I've tried to tame it. Something like gsub::AlternateSubstitution would be a good example. And LazyOffsetArray16 in particular.
I guess MATH uses the same logic.

I don't mind any additions to Stream as long as they are private.

Lazy(Offset)Array16,32 API

I think a custom type is fine. I've tried multiple times to come up with some generic solutions, but they end being way more complicated than needed. You just have to accept the fact that every table in TrueType has a unique structure and there is now way/point to unify it. Some verbosity would not hurt. That's why LazyArray16 and LazyArray32 are two separate types instead of being generic, because a generic version would be way more complicated.

Documentation

Yes, a short but your own comment is better than copy-pasting MS docs. I don't think they would mind, but better not too.

Also, ttf-parser docs should explain how ttf-parser works, not how TrueType works. We have a spec (multiple of them) for that.

Code Formatting

I personally don't like code formatters and therefore don't use them. At the same time I'm basically following the default rustfmt logic with some small exceptions. So for a new file you could use auto-formatting, but leave other files as is.

Code Generation by External Script

I think it's fine as is. I don't think we would be modifying it anytime soon. And if so, it would be easy to done by hand.

Going a bit off-topic

Yes, error processing is a very questionable area.

Just a quick note about Option, you're expected to use Option<Option<T>> for such cases. The outer one indicates an "error" and the inner one the actual optional value.

So the original ttf-parser version actually did have proper errors (you can find leftovers of them in CFFError). The problem is that in most cases the only possible error is a bug in the lib or a bug in a file. Which in 99% would be completely pointless to the caller. Like "failed to read data at offset 50". A caller can't do anything about it. There is no way to handle those errors.
Also, until recently, ttf-parser didn't have access to internal tables. So if a table parsing fails - it would simply be set to None and that's it (we should probably log it, but this crate is zero-dependency and it's an another long story).
Anyway, the point is that until recently having a proper error processing was a bit meaningless. At least in my opinion.

I do have plans on implementing a proper error processing, but still not sure in what way. Probably each table/module would have its own error type. There is no way we would be able to have a single/generic error.
Then, for people who care, they would be able to parse individual tables and get detailed errors. For others, those errors would simply be logged (using optional log dependency).

I guess my point here is that good error processing is only good for ttf-parser developers, me, for debugging. For end-user they are kinda useless, because malformed tables would be skipped anyway and there is no way to handle this. So at least printing a warning would be better that silently ignoring a table.

Cargo.toml Outdated Show resolved Hide resolved
src/tables/math.rs Outdated Show resolved Hide resolved
src/tables/math.rs Outdated Show resolved Hide resolved
src/tables/math.rs Outdated Show resolved Hide resolved
src/tables/math.rs Outdated Show resolved Hide resolved
/// to compensate for rounding errors and hinting corrections at a lower resolution. The
/// `min_connector_overlap` value tells how much overlap is necessary for this particular font.
pub min_connector_overlap: u16,
/// Offset to Coverage table, from the beginning of the MathVariants table.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an offset anymore. We're already resolved it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the documentation.

It is a bit repetitive in its current form. I plan to add a CoverageTable abstraction to group the Coverage and the array, because this way the data should appear better organised, and we can add some higher-level glyph-based API for looking up those arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced a new type CoverageMapping for combining the Coverage table and the array for values. For the abstraction to work, I had to add a public trait Array in module parser, with two associated types Index and Value, and three public methods len, is_empty, and get. Unfortunately this is a breaking change (because these three are not inherent methods any more, and cannot be accessed without a use parser::Array;). Personally I think it is good to have a universal interface for all the arrays, but it is really up to you to decide. I hope this change is acceptable to you.

Relevant code:
https://github.com/ruifengx/ttf-parser/blob/math-table/src/parser.rs#L243-L255
https://github.com/ruifengx/ttf-parser/blob/math-table/src/tables/math.rs#L707-L715

src/tables/math.rs Outdated Show resolved Hide resolved
src/tables/math.rs Outdated Show resolved Hide resolved
src/tables/math.rs Show resolved Hide resolved
src/tables/math.rs Show resolved Hide resolved
@RazrFalcon
Copy link
Owner

Add some comments. I guess I would have to add MATH to ttf-explorer first, so I could understand/visualize this table for myself. For now, it's a bit hard to follow. It's not your fault, I just want to understand how this table is organized before merging.

Also, tests would help.

@ruifengx
Copy link
Contributor Author

Thanks for the detailed reply and comments. I'll start addressing these problems.

@laurmaedje
Copy link
Contributor

Hey, I'm very interested in the current state of this PR. As I happen to need the math table parsing, I would also be happy to help out with or finish up this PR.

@RazrFalcon
Copy link
Owner

I personally haven't looked into it much, but this table is pretty complex. And making a good API would be challenging.
I'm not sure if you can continue this PR or if you would have to make a new one.

@ruifengx any updates on this?

@ruifengx
Copy link
Contributor Author

This has indeed been a long time, and now I honestly don't remember the details clearly. But as I recall, the implementation should have covered all low-level pieces in the MATH table (though it still need proper testing and debugging). I see now this PR has merge conflicts, and that involves parser.rs, so the implementation would at least need a revision to adapt to the new parser interface. Other than that, I think working on this PR should probably be easier than starting from scratch, because most parts in the implementation (I mean, in math.rs) are just mechanical translation of the spec, which I believe is not a very entertaining task.

And I don't know if the following still applies:

I guess I would have to add MATH to ttf-explorer first, so I could understand/visualize this table for myself. For now, it's a bit hard to follow. It's not your fault, I just want to understand how this table is organized before merging.

If so, we will also need to work on ttf-explorer (which I believe is written in C++ with Qt).

@RazrFalcon
Copy link
Owner

@ruifengx Thanks!

Don't worry about ttf-explorer, that's on me.

@laurmaedje
Copy link
Contributor

I have had a bit closer look now and the PR really looks pretty complete already. To finish it up, I would thus:

  • Fork it
  • Fix the merge conflicts
  • Try it out a bit, making changes if necessary
  • Write some tests as this table is unfortunately not indirectly tested by rustybuzz. The table isn't even that complex structurally, but it is pretty big, so it will be hard to get any kind of good coverage with hand-written tables.

@RazrFalcon
Copy link
Owner

Can't you commit to this PR directly?

Yes, testing is the hardest part. There are not that many fonts that use this table to begin with. They are pretty rare.
Therefore I would accept this PR even without tests. As long as it works just for your use case.

@laurmaedje
Copy link
Contributor

I don't think so. I don't have write access to ruifengx's fork.

@ruifengx
Copy link
Contributor Author

ruifengx commented Sep 25, 2022

Can't you commit to this PR directly?

I don't think so. I don't have write access to ruifengx's fork.

@laurmaedje Do you prefer to commit directly to my fork? If that is easier/clearer, I could add you as a collaborator there.

EDIT: collaborator added.

@laurmaedje
Copy link
Contributor

That sounds good! Then, we can keep this PR and discussion.

@laurmaedje
Copy link
Contributor

@RazrFalcon ruifengx added an Array trait so that the math table implementation can use generics for a bit less duplication. This affects the public API. Do you want to keep it?

@RazrFalcon
Copy link
Owner

@laurmaedje

Do you want to keep it?

Not really. I would prefer as little traits/generics as possible.

@laurmaedje
Copy link
Contributor

This should be more or less ready now:

  • It's tested by hand with two math fonts and everything seems to work.
  • I replaced most copy-pasted docs with links or reworded them. The big exception are the doc comments on the constants. However, I removed the "suggested value" bits from the constants as these are intended for font designers, not developers.
  • I removed the Array trait and generics in favor of separate types.
  • In general, I have refactored the PR a bit to bring it a bit more in line with the remaining crate (struct order, naming, etc.)

@RazrFalcon
Copy link
Owner

Add MATH to readme and Cargo.toml opentype-layout comment and we're good.

Afaik, FreeType doesn't support MATH. Same with stb_truetype.

@RazrFalcon
Copy link
Owner

RazrFalcon commented Sep 27, 2022

By the way, the general rule in ttf-parser is that there are no Offset16/Offset32 and &[u8] in public API. Is it true for the new MATH table as well.
At a quick glance it is, but I want to be sure.

@laurmaedje
Copy link
Contributor

I also couldn't find anything math-related for FreeType and stb_truetype. Harfbuzz does support the table though. Regarding offsets: Yes, the new table does not expose offsets or binary data.

@RazrFalcon
Copy link
Owner

harfbuzz exposes MATH, but doesn't actually use it. It's true for multiple tables. The idea is that you can use harfbuzz as TrueType parser as well.
Thanks to Cargo we don't have to write monoliths.

@RazrFalcon RazrFalcon merged commit 88aaffa into RazrFalcon:master Sep 27, 2022
@RazrFalcon
Copy link
Owner

Thanks to @ruifengx and @laurmaedje !

I will fix CFF parsing soon and will publish a new version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants