Check glyph hashes more thoroughly #814

jenskutilek · 2024-01-23T15:40:06Z

To determine if the TT instructions were still valid, the stored glyph hash was only checked against the computed hash of the TTFont glyph.

This PR changes this so there are two checks:

The stored glyph hash is checked against a computed UFO glyph hash to see if the instructions match the UFO glyph.
The computed UFO glyph hash is checked against a computed hash of the TTFont glyph to see if the UFO glyph matches the TTF glyph.

In the case of composite glyphs, the original code would fail for any fractional transformations that cannot be expressed exactly in F2Dot14. This PR fixes that by using the new transformRoundFunc parameter of the RoundingPointPen (fonttools/fonttools#3426). It therefore depends on an as-yet unreleased version of FontTools that contains this addition.
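
To illustrate the F2Dot14 problem, here is a minimal sketch in plain Python. The helper `round_to_f2dot14` is a hypothetical name and not part of ufo2ft or FontTools. It only shows that a component scale such as 0.9 changes value once it is stored with 14 fractional bits, so a hash taken over the unrounded transform cannot equal one taken over the compiled glyph.

```python
import math

F2DOT14_ONE = 1 << 14  # 2.14 fixed point: 14 fractional bits


def round_to_f2dot14(value: float) -> float:
    """Return the nearest value representable with 14 fractional bits.

    Hypothetical helper for illustration only.
    """
    return math.floor(value * F2DOT14_ONE + 0.5) / F2DOT14_ONE


exact = round_to_f2dot14(0.5)    # 0.5 survives the round trip unchanged
inexact = round_to_f2dot14(0.9)  # 0.9 does not, it lands on a nearby value

print(exact == 0.5, inexact == 0.9)
```

Any scale that is a multiple of 1/16384 round-trips unchanged; every other value shifts slightly, which is enough to change a hash built from its text representation.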

EDIT(13/05/2024):

Just check the stored glyph hash against the hash of the TTGlyph that is being built.
Also, the hash calculation rounds the component transforms to match the float precision.

anthrotype · 2024-01-23T15:59:25Z

before suggesting a 'fix', can you please elaborate a bit on what issue you are trying to address? What's wrong with the current code?

jenskutilek · 2024-01-23T16:25:15Z

In the previous version, it would be possible that the stored glyph hash doesn't match the current UFO glyph, which wasn't checked. Still the instructions would be written to the TTF, if the stored glyph hash matched the TTF glyph.

That could be exploited in creative ways, e.g. copying the instructions and glyph hash from a quadratic UFO to a cubic UFO, then building a TTF from that. If the cubic to quadratic conversion yielded the same resulting outline, the instructions would be used, even though they have no relation to the glyph in the cubic UFO.

It's perhaps more of a theoretical issue, but that possibility seems a bit hacky.

I didn't encounter anything like this in a real scenario. It just occurred to me that the check is not as safe as it could be.

jenskutilek · 2024-01-23T16:27:26Z

The more important part is the fix using the F2Dot14 rounding in checking the glyph hash against the TTF glyph hash. That has real-world implications, I just didn't hit them before as I rarely deal with scaled components in hinted composites.

To check the identity with F2Dot14 rounding, the stored hash can't be used, as it didn't apply the rounding. So for this scenario, two computed hashes would be compared, and the stored hash would be ignored altogether.

That's why I added the additional check of stored vs. computed UFO hash.

anthrotype · 2024-01-23T16:34:23Z

it would be possible that the stored glyph hash doesn't match the current UFO glyph

but should it be ufo2ft's responsibility to check for that? Shouldn't it be a separate tool, e.g. the one responsible for creating and storing those hashes, that needs to ensure the stored hashes match the UFO glyphs? Otherwise why bother storing them to begin with if one can just compute them? I'm confused.

jenskutilek · 2024-01-23T16:43:31Z

In the case of temporary hinted UFOs, as occur in our workflow, it can be assumed that the TT assembly belongs to the current glyph.

But if a UFO with compiled hinting is further edited, and the editor doesn't remove the compiled hinting, it may have unexpected results if a font is generated from the UFO with ufo2ft, as ufo2ft would store the assembly as bytecode in the TTF on non-matching outlines if the hash is not checked.

anthrotype · 2024-01-23T16:45:18Z

But if a UFO with compiled hinting is further edited, and the editor doesn't remove the compiled hinting

I totally understand that, but I would argue it is their responsibility (not ours) and 'garbage in, garbage out'

jenskutilek · 2024-01-23T16:48:41Z

The UFO spec says:

Hash of glyph outlines which may have been processed by authoring tools. This is computed when the instructions dict is created or modified. It is used to determine if the glyph outlines have changed since the glyph was hinted: if it has, then the instructions for the glyph should not be used by authoring tools.

I assumed that the last "authoring tool" here refers to tools like ufo2ft, isn't that the case?

anthrotype · 2024-01-23T16:53:20Z

exactly, I read that as "if it (the hash) has (changed), then the instructions for the glyph should not be used"; hence if the hash has not changed, the compiler can assume that the instructions can be used.
The hash is created by the hinting editor, not by the compiler. The compiler uses the hash to quickly determine if the instructions are still valid or not. It is not the responsibility of the compiler to check that the hash matches the glyph outlines; it is the tool that authored the hinting sources that has to ensure the hash is up to date.
This is my reading, I may be getting it wrong.

anthrotype · 2024-01-23T16:58:58Z

don't you also feel it's a bit weird that on the one hand we read the hashes, but if they don't match, we just discard them and compute them again in order to compare? why bother even storing them to begin with, then?

anthrotype · 2024-01-23T17:03:27Z

I'm not entirely familiar with the workflow here, but I assume that when the hashes are stored, they get computed from the compiled and hinted TTF glyphs, not from the UFO glyphs; if that's the case, then no rounding is necessary because the TTFont should already be rounded.

anthrotype · 2024-01-23T17:05:18Z

Or, if these hashes are supposed to be computed on the source glyphs, then ufo2ft should not attempt to compare them against the TTFont's (rounded) glyphs, but against the UFO's (unrounded, original) glyph outlines...

jenskutilek · 2024-01-23T17:10:29Z

don't you also feel it's a bit weird that on the one hand we read the hashes, but if they don't match, we just discard them and compute them again in order to compare? why bother even storing them to begin with, then?

That's not what I do. I calculate the UFO glyph hash, and if it doesn't match the stored hash, I discard the hinting assembly, not the hash.

In the second step, if the first check succeeded, I check if the UFO glyph matches the TTF glyph, for which I calculate both hashes.
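
As a rough sketch of these two steps, using a stand-in hash over a plain point list: the names `outline_hash` and `instructions_usable` are hypothetical, and the real code hashes glyphs through a point pen rather than a list of tuples.

```python
import hashlib


def outline_hash(points) -> str:
    """Hypothetical stand-in for a glyph hash: a digest of the ordered points."""
    data = "".join(f"{x:g},{y:g};" for x, y in points)
    return hashlib.sha512(data.encode("ascii")).hexdigest()


def instructions_usable(stored_hash, ufo_points, ttf_points) -> bool:
    # Step 1: do the stored instructions belong to the current UFO glyph?
    ufo_hash = outline_hash(ufo_points)
    if stored_hash != ufo_hash:
        return False  # discard the hinting assembly, not the hash
    # Step 2: does the compiled TTF glyph still match the UFO glyph?
    return ufo_hash == outline_hash(ttf_points)


square = [(0, 0), (100, 0), (100, 100), (0, 100)]
edited = [(0, 0), (120, 0), (120, 100), (0, 100)]
stored = outline_hash(square)  # hash written when the glyph was hinted

print(instructions_usable(stored, square, square))
print(instructions_usable(stored, edited, edited))
print(instructions_usable(stored, square, list(reversed(square))))
```

The second call models a UFO outline edited after hinting, and the third models a build step that changes the compiled outline, for example by reversing the point order. In both cases the instructions are dropped.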

jenskutilek · 2024-01-23T17:14:52Z

Or, if these hashes are supposed to be computed on the source glyphs, then ufo2ft should not attempt to compare them against the TTFont's (rounded) glyphs, but against the UFO's (unrounded, original) glyph outlines...

That would go wrong e.g. if I forgot to specify reverseDirection=False ... it would be easy to miss that the whole bytecode would have been applied to the outlines backwards ...

jenskutilek · 2024-01-23T17:17:33Z

but I assume that when the hashes are stored, they get computed from the compiled and hinted TTF glyphs, not from the UFO glyphs; if that's the case, then no rounding is necessary because the TTFont should already be rounded.

In our case, the hashes are computed from the (quadratic) UFO glyphs, because the TTX assembly is generated from high-level FontLab hint commands (as exported by vfb2ufo/vfb3ufo). A TTF doesn't exist yet when the hashes are computed.

anthrotype · 2024-01-23T17:19:26Z

so you're saying the hashes are supposed to be computed on the compiled TTGlyph objects of the TTF that has been hinted, not on the original UFO glyph outlines (the ufo spec doesn't actually say that, maybe it should).
If that is the case, then I don't see how calculating the UFO glyph hash helps in any way.

anthrotype · 2024-01-23T17:20:28Z

the hashes are computed from the (quadratic) UFO glyphs

oh now there is a third one which is neither the original UFO glyphs nor the final compiled TTF glyphs but something in between..

anthrotype · 2024-01-23T17:27:22Z

if the stored hash is supposed to be computed from the quadratic UFO glyphs, then these are effectively the sources and ufo2ft is not supposed to touch their outlines in any way (apart from the necessary rounding) and thus reverseDirection ought to be False if you're feeding ufo2ft a hinted, quadratic UFO with stored hashes; same for any other outline-editing filter.

jenskutilek · 2024-01-23T17:29:18Z

the hashes are computed from the (quadratic) UFO glyphs

oh now there is a third one which is neither the original UFO glyphs nor the final compiled TTF glyphs but something in between..

No, there is nothing in between. Our sources are hinted quadratic VFBs, which are converted by vfb3ufo to quadratic UFOs, then the hinting is compiled (stored as TTX assembly) inside these quadratic UFOs. The quadratic UFO glyphs are used to calculate and store the hashes.

Then we use fontmake/ufo2ft to build TTFs from the UFOs, and of course ufo2ft's InstructionCompiler should pick up the instructions and store the bytecode in the TTF.

But it should somehow be noticeable if ufo2ft changed the TTF outlines and the instructions don't match anymore, e.g. by applying filters in the TTFPreProcessor. It would be a user error, but still ...

anthrotype · 2024-01-23T17:33:34Z

it should somehow be noticeable if ufo2ft changed the TTF outlines and the instructions don't match anymore, e.g. by applying filters in the TTFPreProcessor

but I think this is already the case in the current code, no? Before this PR, we are checking the computed TTGlyph hash against the stored hash, and if they don't match we don't copy the instructions. The problem is only if/when the stored hash is outdated.

jenskutilek · 2024-01-23T17:53:07Z

but I think this is already the case in the current code, no? Before this PR, we are checking the computed TTGlyph hash against the stored hash, and if they don't match we don't copy the instructions. The problem is only if/when the stored hash is outdated.

The stored hash vs. computed TTGlyph hash fails for all composites with scales, that's how all this trouble got started :)

anthrotype · 2024-01-23T18:06:18Z

then should you not simply change the way the stored hash is computed? you do the rounding before computing the hash of the quadratic UFO glyphs so that by the time ufo2ft compares the stored hash against the TTGlyph hash, they will match.

And the UFO spec must be updated to clarify all this.

anthrotype · 2024-01-23T18:22:39Z

the stored hash is computed on the UFO glyph outlines at the time the instructions were generated; if the UFO glyph outlines change afterwards, then the instructions can be presumed to be no longer valid and should be ignored. This is signalled by the computed hash of the UFO glyph outlines comparing different from the respective hash as previously stored in the UFO lib at the time the UFO was hinted.
If this is all true, and please correct me if I am wrong, then ufo2ft should only compute the hash on the UFO glyphs and compare with the hash as stored in the UFO lib, and if these don't match, then discard the instructions therein and leave the glyph unhinted; otherwise it should use the instructions. In this scenario and workflow, no rounding whatsoever should be needed when computing the hash. Right now ufo2ft is comparing the stored hash (computed from UFO glyphs) to the hash on the TTGlyph, and this is wrong and should be fixed.

But then you argue that, "what if the user passes some option that may modify the outline e.g. reverseDirection?".
Maybe we could detect that upfront and pass on this info to the instruction compiler somehow.

e.g. copying the instructions and glyph hash from a quadratic UFO to a cubic UFO

Do we actually want to support this use case?

anthrotype · 2024-01-23T18:47:55Z

So currently, before this PR, we were computing the hash only once (on the TTGlyph) and comparing it with the stored hash values in the UFO lib. Now with this PR we are computing it three times: once on the (unrounded) UFO source glyphs (to be compared with the stored hashes), and then again on the rounded UFO glyphs to compare with the equally rounded TTGlyph hash (because we trust neither ourselves nor the user that nothing has happened between the input UFO glyph outlines and the final compiled TTGlyphs).
It feels like we are doing a lot of work. If only this stored hash were computed in a way that takes into account the final desired state of the glyphs as they are compiled to TTGlyphs (i.e. by rounding coordinates and transform), then ufo2ft could merely compute the hash of the latter (as it's currently already doing), and when the stored hash matches the computed TTGlyph hash it can safely assume the instructions are correct (we don't care at that stage whether or not the hash of the UFO glyph outlines actually matches the hash as stored in the UFO; it ought to if the hinting tool has generated it correctly).

jenskutilek · 2024-05-08T16:45:13Z

Finally coming back to this ... @anthrotype you're right, the various checks are excessive. I've done as you suggested and just check the stored glyph hash against the hash of the TTGlyph that is being built.

Also, the hash calculation rounds the component transforms to match the float precision.
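
A minimal sketch of the idea, assuming the rounding in question is the F2Dot14 rounding discussed earlier in this thread. `f2dot14` and `component_hash` are hypothetical names for illustration, not the functions used in ufo2ft or FontTools.

```python
import hashlib
import math


def f2dot14(value: float) -> float:
    """Round to the nearest multiple of 1/16384 (assumed target precision)."""
    return math.floor(value * 16384 + 0.5) / 16384


def component_hash(base_name, transform, round_scale=f2dot14) -> str:
    """Hash a component reference after rounding the 2x2 part of its transform."""
    xx, xy, yx, yy, dx, dy = transform
    rounded = [round_scale(v) for v in (xx, xy, yx, yy)] + [dx, dy]
    data = base_name + "".join(f"{v:+}" for v in rounded)
    return hashlib.sha512(data.encode("utf-8")).hexdigest()


# Transform as written in the source, and as it ends up in the compiled glyph.
source_transform = (0.9, 0, 0, 0.9, 10, 20)
compiled_transform = (f2dot14(0.9), 0, 0, f2dot14(0.9), 10, 20)

stored = component_hash("a", source_transform)      # computed at hinting time
computed = component_hash("a", compiled_transform)  # computed at build time
print(stored == computed)
```

Because both sides apply the same rounding before hashing, a single comparison of the stored hash against the TTGlyph hash is enough, and scaled composites no longer fail the check.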

jenskutilek added 4 commits January 23, 2024 11:01

Compare UFO to TTF glyph using a rounding function for component scales

959369b

Combine HashPointPen with RoundingPointPen

ecfe765

More thorough glyph hash checking

4a38f40

Update method call in tests

9dbd5f0

Sort imports

cbe0774

jenskutilek added 2 commits May 8, 2024 17:25

Only compare stored hash against calculated tt glyph hash

1926bb6

Merge branch 'main' into component-hash-ttf

8045c97

jenskutilek added 4 commits May 8, 2024 17:26

We don't need this

d2c1bc6

Fix tests

e47a89c

Lint

78c0a8e

Lint again

64bc4ed

Update comments

7716a37

anthrotype approved these changes May 13, 2024


anthrotype merged commit 779bbad into main May 13, 2024
9 checks passed

khaledhosny deleted the component-hash-ttf branch May 15, 2024 19:07
