Add more comprehensive benchmarking #120
Given that RustyBuzz doesn't have a shape-plan cache, I'm really surprised that you see it faster on short text. Would be nice to see how adding |
We do have shape plan caching, but we're not using it here (I think?). But by "faster" I mean faster in relative terms, in absolute terms we are always slower than harfbuzz. :D I'll update the wording.
Indeed, hopefully this can help with evaluating how much the different optimizations help! |
I'm not sure about the shape plan caching thing, but I believe there was a PR of someone adding it (that's why I removed it as a missing feature from the README), maybe @RazrFalcon can clarify. |
Please use full language/script names in benchmarks. The current ones are unreadable.
For text samples I suggest using Wikipedia articles about the language. It would be more realistic, I think. This is what I did originally.
Also, the text is too long afaik. I do not know how people usually use shapers, but in my mind this should be done on a word/sentence/paragraph basis, meaning a benchmark with more than 100 words is probably overkill.
Also, try looking at wiki pages for the language to find the most absurd lines: the ones with the most diacritics and language-specific weirdness. There is no point in benchmarking "plain" text.
How much space do all the included fonts take? Have you subsetted them as well? They feel small.
We most definitely should have macOS-only tests. Just make sure the font is actually AAT and not a regular OpenType.
We should also test variable fonts.
An English monospace would be a good test as well.
Yes, we do have shape plan caching since recently, but not automatic like in HB. A caller must cache it on their side.
Cache plan is not affected by the input text. It simply caches font properties. So no.
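To make the point about caller-side caching concrete, here is a minimal sketch of what such a cache could look like. Everything in it is hypothetical: `PlanKey`, `Plan`, and `PlanCache` are stand-ins invented for illustration and are not rustybuzz types, and the sketch assumes one cache per font face. The key holds only shaping properties, never the input text, so every text with the same properties reuses the same plan.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: the properties a plan depends on.
/// Note that the input text is deliberately not part of it.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PlanKey {
    pub direction: &'static str,  // e.g. "ltr" or "rtl"
    pub script: String,           // e.g. "Arab"
    pub language: Option<String>, // e.g. Some("ar")
    pub features: Vec<String>,    // e.g. ["kern", "liga"]
}

/// Stand-in for an expensive-to-build shape plan (not a rustybuzz type).
#[derive(Debug)]
pub struct Plan {
    pub key: PlanKey,
}

/// Caller-side cache, assumed to be kept next to a single font face.
#[derive(Default)]
pub struct PlanCache {
    plans: HashMap<PlanKey, Plan>,
    /// Number of plans actually built, exposed to make reuse observable.
    pub builds: usize,
}

impl PlanCache {
    /// Return the cached plan for `key`, building it on first use only.
    pub fn get_or_build(&mut self, key: &PlanKey) -> &Plan {
        if !self.plans.contains_key(key) {
            self.builds += 1;
            self.plans.insert(key.clone(), Plan { key: key.clone() });
        }
        &self.plans[key]
    }
}
```

Shaping two different Arabic strings with the same key would build one plan and reuse it; only a change in direction, script, language, or features triggers a new build.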
We have the same number of allocations as HB. Maybe even fewer. This is by design. I would say that in terms of performance optimizations we should simply run rustybuzz under a profiler and see the hotspots. There is not much point in comparing it to HB here, especially since we have a completely different parser. There is also a chance that ragel output for Rust isn't as fast as for C. Either way, thanks for your work again. I never had time to do proper benchmarking of rb. My only goal was correctness/completeness. I'm sure there is a lot of low-hanging fruit in terms of optimization. Overall, the current results are way better than I was expecting. |
Will do.
Will look into it.
Yeah I actually had the same thought. I guess it makes sense to limit ourselves to one paragraph at most, and maybe add some shorter as well as longer paragraphs.
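One way to get word-, sentence-, and paragraph-sized inputs from a single source text is to truncate it by word count. The sketch below is only an illustration: the function names and the sizes in `SAMPLE_SIZES` are made up, and splitting on whitespace is an assumption that does not hold for scripts written without spaces between words.

```rust
/// Keep at most `max_words` whitespace-separated words of `text`.
pub fn first_words(text: &str, max_words: usize) -> String {
    text.split_whitespace()
        .take(max_words)
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical sample sizes in words: a single word, a sentence-sized run,
/// and a paragraph-sized run capped at 100 words.
pub const SAMPLE_SIZES: [usize; 3] = [1, 15, 100];

/// Build one benchmark input per entry in `SAMPLE_SIZES`.
pub fn make_samples(text: &str) -> Vec<String> {
    SAMPLE_SIZES.iter().map(|&n| first_words(text, n)).collect()
}
```

If the source text is shorter than a given size, that sample is simply the whole text.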
This will be hard to do for any non-Latin text since I can't read the scripts. 😅 But I can try finding some for the English text, and maybe include some longer zalgo text.
Around 200KB, not subsetted. But this might increase a bit if we also include variable fonts. The problem with subsetting is that we would have to regenerate them every time we add a new benchmark, which is somewhat annoying. :/
Yeah, I'll look into it.
Does using a monospace font make any difference compared to a normal English font?
Any ideas how to best profile? The problem with tools like VTune (which I don't have anyway) is that they probably don't work well for programs that finish in a few milliseconds, afaik. |
Me neither. I just google Arabic Wiki -> select Language -> Arabic -> select first sentence. Since it's a wiki there shouldn't be anything offensive in the first few lines, I hope 👀
Good. I would explicitly avoid subsetting, since it would make them too sanitized.
Sort of. For one, the advance for each glyph is the same. So the shaper has to do less work. Which is what we're testing.
On macOS I use Instruments just fine. Simply compile the shape example in release mode and run it via Instruments -> CPU profiler. Yes, on tiny inputs the output would be meh, but on larger ones it should be fine. |
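For inputs that finish too quickly to give a sampling profiler anything useful, a common workaround is to repeat the same work many times inside one process. The helper below is a sketch under that assumption; `repeat_for_profiler` is a made-up name, and the closure stands in for whatever shaping call is being profiled.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `work` `iterations` times so the process lives long enough to be sampled.
/// Returns the total elapsed wall-clock time.
pub fn repeat_for_profiler<T>(iterations: u32, mut work: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box discourages the optimizer from discarding the unused result.
        black_box(work());
    }
    start.elapsed()
}
```

The iteration count would need tuning per input so that a run lasts at least a few seconds under the profiler.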
Bad news, I just realized that
But yeah I'm sure there are some low-hanging fruits. |
Ugh... I was sure it simply links the system library. Then we could try sending patch to |
I think there already is one harfbuzz/harfbuzz_rs#37, although it also "only" targets 8.4.0. But I have my own branch with 9.0 that I will just use for now. |
The crates.io version is 8.0.0, which is not that far. |
Yeah, but might as well use the newest version if available, no? 😄 @behdad Is there a chance to update |
I merged the 8.4.0 PR. I don't think anyone's working on it currently. |
Thanks! Weird, 8.4.0 is still considerably slower than 9.0.0 for me, but I guess it'll do for now. |
That's not expected. |
Ah! All good, I know what's going on, I had to enable the |
Note that for the linked submodule I had to disable CoreText from |
Yeah, no worries, I had exactly the same issue when trying it locally and it also worked when I disabled it. |
@RazrFalcon Better now? I think we have good coverage now. And we can always add more stuff later on. |
benches/Cargo.toml
@@ -4,5 +4,18 @@ version = "0.1.0"
edition = "2018"

[dependencies]
harfbuzz_rs = "1.1.2"
harfbuzz_rs = {git = "https://github.com/harfbuzz/harfbuzz_rs/", rev = "43f0fb5"}
Can you add spaces around {}?
Great work as always! A couple minor fixes and we're ready to merge. |
As for performance, I'm actually surprised how well RB performs. Remember that unlike HB, RB is 100% memory safe. We have zero Also note that everything in That's the problem with TrueType performance in general: you either get performance by not allocating anything, or waste RAM by allocating everything. |
@LaurenzV Can you paste the AAT comparison to HB? I'm curious. |
Sure, that's what I get:
|
I tried to include one set of tests for each script that exercises a different part of the shaping engine. The results can be seen below. All of the texts are Google Translate translations of the English sample.
Some observations:
As expected, we are always slower than harfbuzz. I would say on average we are a bit less than 2x slower, but it depends a lot on the script and on the input size. In general, the slowdown is less noticeable for smaller inputs, but for many larger inputs we are much slower, often reaching a 2x-3x slowdown. Maybe it's because larger inputs tend to exercise the caching mechanisms more, which harfbuzz has a lot of? Another possibility is that we haven't optimized vector allocations much.
Arabic seems to have the best performance overall, but even then, the larger the input, the larger the gap. Even for English, performance gets much worse for larger inputs. Hebrew seems to be pretty bad in all cases. The other ones follow a more or less similar pattern: always around the 2x range.
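For reference, the "Nx slower" figures are ratios of the rustybuzz time to the harfbuzz time for the same input. The sketch below shows one way to compute and summarize them; the function names are hypothetical, and using a geometric mean for the "on average" number is an assumption rather than necessarily how the figures in this thread were derived.

```rust
/// Slowdown of a candidate relative to a baseline, e.g. 2.0 means "2x slower".
pub fn slowdown(baseline_ns: f64, candidate_ns: f64) -> f64 {
    candidate_ns / baseline_ns
}

/// Geometric mean of slowdown factors. Returns `None` for an empty slice.
/// It is often preferred over the arithmetic mean when averaging ratios.
pub fn geometric_mean(factors: &[f64]) -> Option<f64> {
    if factors.is_empty() {
        return None;
    }
    let log_sum: f64 = factors.iter().map(|f| f.ln()).sum();
    Some((log_sum / factors.len() as f64).exp())
}
```

With made-up timings of 100 ns for the baseline and 250 ns for the candidate, `slowdown` gives 2.5; per-script factors can then be fed into `geometric_mean` for a single summary number.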
Suggestions on what other things to add are welcome. I guess AAT would be nice to have. And we probably should also add specific features to target specific parts of the code (for example to test kerning, etc.), but I think this is a bit overkill for now. I think this is a good start.