Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check glyph hashes more thoroughly #814

Merged
merged 12 commits into from
May 13, 2024
Merged

Check glyph hashes more thoroughly #814

merged 12 commits into from
May 13, 2024

Conversation

jenskutilek
Copy link
Collaborator

@jenskutilek jenskutilek commented Jan 23, 2024

To determine if the TT instructions were still valid, the stored glyph hash was only checked against the computed hash of the TTFont glyph.

This PR changes this so there are two checks:

  1. The stored glyph hash is checked against a computed UFO glyph hash to see if the instructions match the UFO glyph.
  2. The computed UFO glyph hash is checked against a computed hash of the TTFont glyph to see if the UFO glyph matches the TTF glyph.

In the case of composite glyphs, the original code would fail for any fractional transformations that cannot be expressed exactly in F2Dot14. This PR fixes that by using the new transformRoundFunc parameter of the RoundingPointPen (fonttools/fonttools#3426). Thus it depends on a yet unreleased version of FontTools which contains the addition.

EDIT(13/05/2024):

just check the stored glyph hash against the hash of the TTGlyph that is being built.
Also, the hash calculation rounds the component transforms to match the float precision

@anthrotype
Copy link
Member

before suggesting a 'fix' can you please elaborate a bit what issue are you trying to address? What's wrong with the current code

@jenskutilek
Copy link
Collaborator Author

In the previous version, it would be possible that the stored glyph hash doesn't match the current UFO glyph, which wasn't checked. Still the instructions would be written to the TTF, if the stored glyph hash matched the TTF glyph.

That could be exploited in creative ways, e.g. copying the instructions and glyph hash from a quadratic UFO to a cubic UFO, then building a TTF from that. If the cubic to quadratic conversion yielded the same resulting outline, the instructions would be used, even though they have no relation to the glyph in the cubic UFO.

It's perhaps more of a theoretical issue, but that possibility seems a bit hacky.

I didn't encounter anything like this in a real scenario. It just occurred to me that the check is not as safe as it could be.

@jenskutilek
Copy link
Collaborator Author

jenskutilek commented Jan 23, 2024

The more important part is the fix using the F2Dot14 rounding in checking the glyph hash against the TTF glyph hash. That has real-world implications, I just didn't hit them before as I rarely deal with scaled components in hinted composites.

To check the identity with F2Dot14 rounding, the stored hash can't be used, as it didn't apply the rounding. So for this scenario, two computed hashes would be compared, and the stored hash would be ignored altogether.

That's why I added the additional check of stored vs. computed UFO hash.

@anthrotype
Copy link
Member

anthrotype commented Jan 23, 2024

it would be possible that the stored glyph hash doesn't match the current UFO glyph

but should it be ufo2ft's responsibility to check for that? Shouldn't it be a separate tool, e.g. the one responsible for creating and storing those hashes, that needs to ensure the stored hashes match the UFO glyphs? Otherwise why bother storing them to begin with if one can just compute them? I'm confused.

@jenskutilek
Copy link
Collaborator Author

In the case of temporary hinted UFOs, as occur in our workflow, it can be assumed that the TT assembly belongs to the current glyph.

But if a UFO with compiled hinting is further edited, and the editor doesn't remove the compiled hinting, it may have unexpected results if a font is generated from the UFO with ufo2ft, as ufo2ft would store the assembly as bytecode in the TTF on non-matching outlines if the hash is not checked.

@anthrotype
Copy link
Member

But if a UFO with compiled hinting is further edited, and the editor doesn't remove the compiled hinting

I totally understand that, but i would argue it is their responsibility (not ours) and 'gargbage in, garbage out'

@jenskutilek
Copy link
Collaborator Author

The UFO spec says:

Hash of glyph outlines which may have been processed by authoring tools. This is computed when the instructions dict is created or modified. It is used to determine if the glyph outlines have changed since the glyph was hinted: if it has, then the instructions for the glyph should not be used by authoring tools.

I assumed that the last "authoring tool" here refers to tools like ufo2ft, isn't that the case?

@anthrotype
Copy link
Member

exactly, I read that as "if it (the hash) has (changed), then the instructions for the glyph should not be used"; hence if the hash has not changed, the compiler can assume that the instructions can be used.
The hash is created by the hinting editor, not by the compiler. The compiler uses the hash to quickly determine if the instructions are still valid or not. It is not the responsibility of the compiler to check the hash matches the glyph outlines, but it is the tool that authored the hiting sources that has to ensure the hash is up to date.
This is my reading, I may be getting it wrong.

@anthrotype
Copy link
Member

don't you also feel like a bit weird that on the one hand we read the hashes, but if they don't match, we just discard them and compute them again in order to compare? why bothering to even store them to begin with then?

@anthrotype
Copy link
Member

I'm entirely familiar with the workflow here, but I assume that when the hashes are stored, they get computed from the compiled and hinted TTF glyphs, not from the UFO glyphs; if that's the case, then no rounding is necessary because the TTFont should already be rounded.

@anthrotype
Copy link
Member

Or, if these hashes are supposed to be computed on the source glyphs, then ufo2ft should not attempt to compare them against the TTFont's (rounded) glyphs, but against the UFO's (unrounded, original) glyph outlines...

@jenskutilek
Copy link
Collaborator Author

don't you also feel like a bit weird that on the one hand we read the hashes, but if they don't match, we just discard them and compute them again in order to compare? why bothering to even store them to begin with then?

That's not what I do. I calculate the UFO glyph hash, and if it doesn't match the stored hash, I discard the hinting assembly, not the hash.

In the second step, if the first check succeeded, I check if the UFO glyph matches the TTF glyph, for which I calculate both hashes.

@jenskutilek
Copy link
Collaborator Author

Or, if these hashes are supposed to be computed on the source glyphs, then ufo2ft should not attempt to compare them against the TTFont's (rounded) glyphs, but against the UFO's (unrounded, original) glyph outlines...

That would go wrong e.g. if I forgot to specify reverseDirection=False ... it would be easy to miss that the whole bytecode would have been applied to the outlines backwards ...

@jenskutilek
Copy link
Collaborator Author

but I assume that when the hashes are stored, they get computed from the compiled and hinted TTF glyphs, not from the UFO glyphs; if that's the case, then no rounding is necessary because the TTFont should already be rounded.

In our case, the hashes are computed from the (quadratic) UFO glyphs, because the TTX assembly is generated from high-level FontLab hint commands (as exported by vfb2ufo/vfb3ufo). A TTF doesn't exist yet when the hashes are computed.

@anthrotype
Copy link
Member

so you're saying the hashes are supposed to be computed on the compiled TTGlyph objects of the TTF that has been hinted, not on the original UFO glyph outlines (the ufo spec doesn't actually say that, maybe it should).
If that is the case, then I don't see how calculating the UFO glyph hash helps in any way.

@anthrotype
Copy link
Member

the hashes are computed from the (quadratic) UFO glyphs

oh now there is a third one which is neither the original UFO glyphs nor the final compiled TTF glyphs but something in between..

@anthrotype
Copy link
Member

if the stored hash is supposed to be computed from the quadratic UFO glyphs, then these are effectively the sources and ufo2ft is not supposed to touch their outlines in any way (apart from the necessary rounding) and thus reverseDirection ought to be False if you're feeding ufo2ft a hinted, quadratic UFO with stored hashes; same for any other outline-editing filter.

@jenskutilek
Copy link
Collaborator Author

the hashes are computed from the (quadratic) UFO glyphs

oh now there is a third one which is neither the original UFO glyphs nor the final compiled TTF glyphs but something in between..

No, there is nothing inbetween. Our sources are hinted quadratic VFBs, which are converted by vfb3ufo to quadratic UFOs, then the hinting is compiled (stored as TTX assembly) inside these quadratic UFOs. The quadratic UFO glyphs are used to calculate and store the hashes.

Then we use fontmake/ufo2ft to build TTFs from the UFOs, and of course ufo2ft's InstructionCompiler should pick up the instructions and store the bytecode in the TTF.

But it should somehow be noticeable if ufo2ft changed the TTF outlines and the instructions don't match anymore, e.g. by applying filters in the TTFPreProcessor. It would be a user error, but still ...

@anthrotype
Copy link
Member

it should somehow be noticeable if ufo2ft changed the TTF outlines and the instructions don't match anymore, e.g. by applying filters in the TTFPreProcessor

but I think this is already the case in the current code, no? Before this PR, we are checking the computed TTGlyph hash against the stored hash, and if they don't match we don't copy the instructions. The problem is only if/when the stored hash is outdated.

@jenskutilek
Copy link
Collaborator Author

but I think this is already the case in the current code, no? Before this PR, we are checking the computed TTGlyph hash against the stored hash, and if they don't match we don't copy the instructions. The problem is only if/when the stored hash is outdated.

The stored hash vs. computed TTGlyph hash fails for all composites with scales, that's how all this trouble got started :)

@anthrotype
Copy link
Member

then should you not simply change the way the stored hash is computed? you do the rounding before computing the hash of the quadratic UFO glyphs so that by the time ufo2ft compares the stored hash against the TTGlyph hash, they will match.

And the UFO spec must be updated to clarify all this.

@anthrotype
Copy link
Member

the stored hash is computed on the UFO glyph outlines at the time the instructions where generated; if the UFO glyph outlines change afterwards, then the instructions can be presumed to be no longer valid and should be ignored. This is signalled by the computed hash of the UFO glyph outlines comparing different from the respective hash as previously stored in the UFO lib at the time the UFO was hinted.
If this is all true, and please correct me if I am wrong, then ufo2ft should only compute the hash on the UFO glyphs and compare with the hash as stored in the UFO lib, and if these don't match, then discard the instructions therein and leave the glyph unhinted; otherwise it should use the instructions. In this scenario and workflow, no rounding whatsoever should be needed when computing the hash. Right now ufo2ft is comparing the stored hash (computed from UFO glyphs) to the hash on the TTGlyph, and this is wrong and should be fixed.

But then you argue that, "what if the user passes some option that may modify the outline e.g. reverseDirection?".
Maybe we could detect that upfront and pass on this info to the instruction compiler somehow.

e.g. copying the instructions and glyph hash from a quadratic UFO to a cubic UFO?"

Do we actually want to support this use case?

@anthrotype
Copy link
Member

So currently, before this PR, we were computing the hash only once (on the TTGlyph) and compare it with the stored hash values in the UFO lib. Now with this PR we are computing it three times: once on the (unrounded) UFO source glyphs (to be compared with the stored hashes), and then again on the rounded UFO glyphs to compare with the equally rounded TTGlyph hash (because we don't trust ourselves nor the user that something may have happened between the input UFO glyph outlines and the final compiled TTGlyphs).
It feels we are doing a lot of work.. If only this stored hash would be computed in a way that takes into account the final desired state of the glyphs as they are compiled to TTGlyphs (i.e. by rounding coordinates and transform), then ufo2ft could merely compute the hash of the latter (as it's currently already doing) and when the stored hash doesn't match the computed TTGlyph hash then it can safely assume the instructions are correct (we don't care at that stage whether or not the hash of UFO glyph outlines actually matches the hash as stored in the UFO, it ought to do if the hinting tool has generated it correctly).

@jenskutilek
Copy link
Collaborator Author

Finally coming back to this ... @anthrotype you're right, the various checks are too excessive. I've done as you suggested and just check the stored glyph hash against the hash of the TTGlyph that is being built.

Also, the hash calculation rounds the component transforms to match the float precision.

@anthrotype anthrotype merged commit 779bbad into main May 13, 2024
9 checks passed
@khaledhosny khaledhosny deleted the component-hash-ttf branch May 15, 2024 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants