Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.12 to 0.13 performance regressions #76

Closed
alexheretic opened this issue Oct 29, 2021 · 7 comments
Closed

0.12 to 0.13 performance regressions #76

alexheretic opened this issue Oct 29, 2021 · 7 comments

Comments

@alexheretic
Copy link

Hey, behaviourally 0.13 is good for me now. However, running the benchmarks in ab_glyph suggests there are performance regressions with the latest version. I'm currently seeing 30-50% slower layout performance there on 0.13.

Looking at your benches here they don't look too bad. Maybe the glyph_index_u41 bench regression is related? Certainly that's done a lot during layout. The other layouty ones look ok.

name                         0.12.3 ns/iter  0.13.2 ns/iter  diff ns/iter   diff %  speedup 
ascender                     814             626                     -188  -23.10%   x 1.30 
glyph_hor_advance            1,437           1,470                     33    2.30%   x 0.98 
glyph_hor_side_bearing       1,451           1,486                     35    2.41%   x 0.98 
glyph_index_u41              12              24                        12  100.00%   x 0.50  
units_per_em                 404             204                     -200  -49.50%   x 1.98 
width                        202             210                        8    3.96%   x 0.96 
x_height                     667             661                       -6   -0.90%   x 1.01

If I get some time I'll try to contribute a layout benchmark. Mostly its calling glyph-index + kern + h-advance for each glyph, so a bench calling those for each char in a paragraph should capture it fairly well.

@RazrFalcon
Copy link
Collaborator

Strange. There were no logic changes. It was mostly new stuff and extensive refactoring.
What font are you using?

@alexheretic
Copy link
Author

If we zero in on the benchmarks here (using SourceSansPro-Regular.ttf) I wonder why glyph_index_u41 is slower. I think that is what's causing the downstream performance regression.

I had a look but can't see any obvious reason the newer code would be slower. Perhaps the refactor is just less optimizable?

@alexheretic
Copy link
Author

alexheretic commented Oct 30, 2021

I messed about with the code a bit last night and managed to make a faster glyph_index, though the code is less graceful #77. And I'm honestly not 100% sure why they'd be such a performance different between the two paths.

With #77 the downstream benches are 10-20% slower (rather than 30-50%), so even merging that this isn't totally resolved 😞

@RazrFalcon
Copy link
Collaborator

RazrFalcon commented Oct 30, 2021

Based on my profiling, it seems like Rust doesn't like new indirect logic in cmap. Previously, I was parsing the subtable data and resolving a glyph ID in a single, long function. One for each subtable format.
Now, I'm parsing a table data first and then trying to find glyph ID.
In theory, the new method should be faster, especially if you're looking for multiple glyphs at once and not just one. But in reality it seems like it prevents some optimizations.

Solutions:

  1. We can have a separate code paths for fast single char-to-glyph resolving. Not sure how much code we would be able to reuse.
  2. You can try running all chars-to-glyphs for each subtable. This, in theory, should be even faster then the previous version. Because now we would not be parsing a table data each time.
  3. What exactly is your code doing? It seems like Face::glyph_index is not the best solution for it. Even if we would be able to optimize it, it's still pretty wasteful, if you're calling it multiple times, because it parses subtables each time.

PS: I've pushed a commit with some explicit inlining. It made cmap slightly faster. For some reason, rustc skips some obviously simple methods.

@RazrFalcon
Copy link
Collaborator

Can confirm. On my hardware, current Face::glyph_index is 23ns and old is 16ns. While subtable4.glyph_index('A') is just 12ns. Which is expected, since we're skipping a subtable parsing.

So basically the "solution" is to use something like:

for subtable in face.tables().cmap?.subtables {
    if !encoding.is_unicode() {
        continue;
    }

    for c in chars {
        if let Some(id) = subtable.glyph_index(c) {
            // store Glyph ID
        }
    }
}

@alexheretic
Copy link
Author

Ah yes, I see now. I never looked as closely at the code before so didn't think about glyph_index parsing tables on each call.

By re-using the parsed subtables here (& doing similarly for kerning) I see regressions turned to improvements (10-30%) over the previous versions 👍

@RazrFalcon
Copy link
Collaborator

Nice! This was a yet another reason to make internals public. Yes, TrueType is endlessly complicated and you have to have a pretty good understanding of how it works. So making a high-level api is an obvious choice for a library. But turns out that people who use ttf-parser doesn't actually want a high-level api. It only gets in the way. I don't plan removing it anytime soon, but from now, the preferable usage is to access TrueType tables directly. Where high-level api is more like a usage example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants