`0.12` to `0.13` performance regressions #76

alexheretic · 2021-10-29T00:10:23Z

Hey, behaviourally 0.13 is good for me now. However, running the benchmarks in ab_glyph suggests there are performance regressions with the latest version. I'm currently seeing 30-50% slower layout performance there on 0.13.

Looking at your benches here they don't look too bad. Maybe the glyph_index_u41 bench regression is related? Certainly that's done a lot during layout. The other layouty ones look ok.

name                         0.12.3 ns/iter  0.13.2 ns/iter  diff ns/iter   diff %  speedup 
ascender                     814             626                     -188  -23.10%   x 1.30 
glyph_hor_advance            1,437           1,470                     33    2.30%   x 0.98 
glyph_hor_side_bearing       1,451           1,486                     35    2.41%   x 0.98 
glyph_index_u41              12              24                        12  100.00%   x 0.50  
units_per_em                 404             204                     -200  -49.50%   x 1.98 
width                        202             210                        8    3.96%   x 0.96 
x_height                     667             661                       -6   -0.90%   x 1.01

If I get some time I'll try to contribute a layout benchmark. Mostly its calling glyph-index + kern + h-advance for each glyph, so a bench calling those for each char in a paragraph should capture it fairly well.

The text was updated successfully, but these errors were encountered:

RazrFalcon · 2021-10-29T06:22:07Z

Strange. There were no logic changes. It was mostly new stuff and extensive refactoring.
What font are you using?

alexheretic · 2021-10-30T02:53:16Z

If we zero in on the benchmarks here (using SourceSansPro-Regular.ttf) I wonder why glyph_index_u41 is slower. I think that is what's causing the downstream performance regression.

I had a look but can't see any obvious reason the newer code would be slower. Perhaps the refactor is just less optimizable?

alexheretic · 2021-10-30T12:29:36Z

I messed about with the code a bit last night and managed to make a faster glyph_index, though the code is less graceful #77. And I'm honestly not 100% sure why they'd be such a performance different between the two paths.

With #77 the downstream benches are 10-20% slower (rather than 30-50%), so even merging that this isn't totally resolved 😞

RazrFalcon · 2021-10-30T15:13:16Z

Based on my profiling, it seems like Rust doesn't like new indirect logic in cmap. Previously, I was parsing the subtable data and resolving a glyph ID in a single, long function. One for each subtable format.
Now, I'm parsing a table data first and then trying to find glyph ID.
In theory, the new method should be faster, especially if you're looking for multiple glyphs at once and not just one. But in reality it seems like it prevents some optimizations.

Solutions:

We can have a separate code paths for fast single char-to-glyph resolving. Not sure how much code we would be able to reuse.
You can try running all chars-to-glyphs for each subtable. This, in theory, should be even faster then the previous version. Because now we would not be parsing a table data each time.
What exactly is your code doing? It seems like Face::glyph_index is not the best solution for it. Even if we would be able to optimize it, it's still pretty wasteful, if you're calling it multiple times, because it parses subtables each time.

PS: I've pushed a commit with some explicit inlining. It made cmap slightly faster. For some reason, rustc skips some obviously simple methods.

RazrFalcon · 2021-10-30T15:44:30Z

Can confirm. On my hardware, current Face::glyph_index is 23ns and old is 16ns. While subtable4.glyph_index('A') is just 12ns. Which is expected, since we're skipping a subtable parsing.

So basically the "solution" is to use something like:

for subtable in face.tables().cmap?.subtables {
    if !encoding.is_unicode() {
        continue;
    }

    for c in chars {
        if let Some(id) = subtable.glyph_index(c) {
            // store Glyph ID
        }
    }
}

alexheretic · 2021-11-01T23:59:14Z

Ah yes, I see now. I never looked as closely at the code before so didn't think about glyph_index parsing tables on each call.

By re-using the parsed subtables here (& doing similarly for kerning) I see regressions turned to improvements (10-30%) over the previous versions 👍

RazrFalcon · 2021-11-02T07:42:31Z

Nice! This was a yet another reason to make internals public. Yes, TrueType is endlessly complicated and you have to have a pretty good understanding of how it works. So making a high-level api is an obvious choice for a library. But turns out that people who use ttf-parser doesn't actually want a high-level api. It only gets in the way. I don't plan removing it anytime soon, but from now, the preferable usage is to access TrueType tables directly. Where high-level api is more like a usage example.

alexheretic mentioned this issue Oct 30, 2021

Optimize Face::glyph_index #77

Closed

alexheretic closed this as completed Nov 1, 2021

dhardy mentioned this issue Nov 2, 2021

Optimisation: face selection for fallbacks kas-gui/kas-text#60

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`0.12` to `0.13` performance regressions #76

`0.12` to `0.13` performance regressions #76

alexheretic commented Oct 29, 2021

RazrFalcon commented Oct 29, 2021

alexheretic commented Oct 30, 2021

alexheretic commented Oct 30, 2021 •

edited

Loading

RazrFalcon commented Oct 30, 2021 •

edited

Loading

RazrFalcon commented Oct 30, 2021

alexheretic commented Nov 1, 2021

RazrFalcon commented Nov 2, 2021

0.12 to 0.13 performance regressions #76

0.12 to 0.13 performance regressions #76

Comments

alexheretic commented Oct 29, 2021

RazrFalcon commented Oct 29, 2021

alexheretic commented Oct 30, 2021

alexheretic commented Oct 30, 2021 • edited Loading

RazrFalcon commented Oct 30, 2021 • edited Loading

RazrFalcon commented Oct 30, 2021

alexheretic commented Nov 1, 2021

RazrFalcon commented Nov 2, 2021

`0.12` to `0.13` performance regressions #76

`0.12` to `0.13` performance regressions #76

alexheretic commented Oct 30, 2021 •

edited

Loading

RazrFalcon commented Oct 30, 2021 •

edited

Loading