Improving Qute Escaper to be as branch-free as possible #45546

franz1981 · 2025-01-13T16:28:23Z

This is a showcase of the approach from lemire/Code-used-on-Daniel-Lemire-s-blog#116, biased toward the case of Latin chars with one- and two-char replacements.
It could be expanded to cover non-Latin chars and the 6-byte replacement too. That would need more painful and complex (in terms of logic) changes, and IMO it already looks complex enough as it is.

Feedback and questions are welcome.
I haven't benchmarked it yet. It is sadly low on my priority list, so I don't know when I'll have time to contribute a proper benchmark to qute-benchmark that stresses the branch predictor. There's also a good chance that my "bet" won't pay off: I used StringBuilder to simplify the code, which puts the horrible setLength in the hot path.
If that happens, I will move to using char[], even though compact strings then require Latin-1 checks on String construction :"(
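The following is a minimal Java sketch of the StringBuilder/setLength idea described above. It is for illustration only and is not the code in this PR. The class name, the two 256-entry tables, and the chosen replacements (`\"`, `\\`, `\n`, `\r`, `\t`) are all assumptions.

Every Latin-1 char appends exactly two chars, and setLength then trims the builder back to the real length. The common path therefore has no branch on whether a char needs escaping. Other control chars, which need the 6-char form, are deliberately not handled.

```java
// Hypothetical sketch, not the PR's implementation.
final class SetLengthEscapeSketch {

    // REPLACEMENT packs the first output char in the low 16 bits and the second
    // in the high 16 bits; LENGTH holds how many of the two chars are real (1 or 2).
    private static final int[] REPLACEMENT = new int[256];
    private static final byte[] LENGTH = new byte[256];

    static {
        for (int c = 0; c < 256; c++) {
            REPLACEMENT[c] = c; // second char is '\0' and gets trimmed away
            LENGTH[c] = 1;
        }
        set('"', '\\', '"');
        set('\\', '\\', '\\');
        set('\n', '\\', 'n');
        set('\r', '\\', 'r');
        set('\t', '\\', 't');
    }

    private static void set(char c, char first, char second) {
        REPLACEMENT[c] = first | (second << 16);
        LENGTH[c] = 2;
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2 + 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) { // outside the tables: copied as-is
                sb.append(c);
                continue;
            }
            int start = sb.length();
            int packed = REPLACEMENT[c];
            // Always append two chars, then trim to the real length:
            // no branch on "does this char need escaping?".
            sb.append((char) packed).append((char) (packed >>> 16));
            sb.setLength(start + LENGTH[c]);
        }
        return sb.toString();
    }
}
```

The cost being bet on here is the setLength call for every char. If it doesn't pay off, a char[] output buffer avoids it.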

franz1981 · 2025-01-15T13:57:25Z

Running the benchmark at mkouba/qute-benchmarks#1 shows these performance differences. Now:

Benchmark                   (ctrlProbabibility)  (latinCharsProbability)  (replacementProbability)  (samples)  (size)  Mode  Cnt   Score   Error  Units
JsonEscaping.escape                           0                      100                         0        100      32  avgt   10  10.129 ± 0.009  ns/op
JsonEscaping.escape                           0                      100                         0      10000      32  avgt   10  10.129 ± 0.038  ns/op
JsonEscaping.escape                           0                      100                        10        100      32  avgt   10  45.312 ± 0.193  ns/op
JsonEscaping.escape                           0                      100                        10      10000      32  avgt   10  55.989 ± 1.826  ns/op
JsonEscaping.escape                          10                      100                        10        100      32  avgt   10  44.131 ± 2.367  ns/op
JsonEscaping.escape                          10                      100                        10      10000      32  avgt   10  77.187 ± 2.496  ns/op

vs before

Benchmark            (ctrlProbabibility)  (latinCharsProbability)  (replacementProbability)  (samples)  (size)  Mode  Cnt    Score    Error  Units
JsonEscaping.escape                    0                      100                         0        100      32  avgt   10   16.864 ±  0.514  ns/op
JsonEscaping.escape                    0                      100                         0      10000      32  avgt   10   66.942 ±  0.524  ns/op
JsonEscaping.escape                    0                      100                        10        100      32  avgt   10   97.493 ±  5.357  ns/op
JsonEscaping.escape                    0                      100                        10      10000      32  avgt   10  220.471 ±  7.429  ns/op
JsonEscaping.escape                   10                      100                        10        100      32  avgt   10  178.098 ± 15.693  ns/op
JsonEscaping.escape                   10                      100                        10      10000      32  avgt   10  319.793 ± 40.863  ns/op

That is a good improvement, and it tends to pay off more as the number of chars increases.

FYI @mkouba, the relevant rows are the ones using 10000 samples. With just 100 samples, the data is very predictable for my CPU model, and you cannot really see the benefit of reducing the number of branches.
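For scale, here are the before/now ratios for the 10000-samples rows, computed directly from the two tables above. The tables report average time per op, so a ratio above 1 means the new code is faster.

```latex
\begin{aligned}
\text{ctrl}=0,\ \text{repl}=0:&\quad 66.942 / 10.129 \approx 6.61\times \\
\text{ctrl}=0,\ \text{repl}=10:&\quad 220.471 / 55.989 \approx 3.94\times \\
\text{ctrl}=10,\ \text{repl}=10:&\quad 319.793 / 77.187 \approx 4.14\times
\end{aligned}
```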

franz1981 · 2025-01-15T14:26:15Z

@galderz on mkouba/qute-benchmarks#1

This implementation should really shine with native image, since it doesn't use StringBuilder :)
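For comparison, here is a minimal sketch of what a char[]-based, table-driven JSON escaper can look like. It assumes a long[256] table for Latin-1 chars, where each entry packs up to two output chars plus the output length. The names and layout are hypothetical and are not taken from the actual JsonEscaper.

The common path stores both chars unconditionally into an oversized char[] and advances by the packed length. Only two cases take a separate (rare) path: control chars without a short form, and chars above 0xFF.

```java
// Hypothetical sketch, not the actual JsonEscaper.
final class CharArrayJsonEscapeSketch {

    // Packed entry per Latin-1 char: bits 0-15 first output char,
    // bits 16-31 second output char, bits 32+ output length (1 or 2).
    // Length 0 marks control chars that need the 6-char escape.
    private static final long[] TABLE = new long[256];
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static {
        for (int c = 0; c < 256; c++) {
            TABLE[c] = c < 0x20 ? 0L : pack((char) c, '\0', 1);
        }
        TABLE['"'] = pack('\\', '"', 2);
        TABLE['\\'] = pack('\\', '\\', 2);
        TABLE['\b'] = pack('\\', 'b', 2);
        TABLE['\f'] = pack('\\', 'f', 2);
        TABLE['\n'] = pack('\\', 'n', 2);
        TABLE['\r'] = pack('\\', 'r', 2);
        TABLE['\t'] = pack('\\', 't', 2);
    }

    private static long pack(char first, char second, int length) {
        return first | ((long) second << 16) | ((long) length << 32);
    }

    static String escape(String s) {
        // 6 output chars per input char is the worst case.
        char[] out = new char[s.length() * 6];
        int pos = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) { // non-Latin-1 chars are copied as-is
                out[pos++] = c;
                continue;
            }
            long entry = TABLE[c];
            int len = (int) (entry >>> 32);
            if (len == 0) { // rare path: backslash, 'u', "00", then 2 hex digits
                out[pos] = '\\';
                out[pos + 1] = 'u';
                out[pos + 2] = '0';
                out[pos + 3] = '0';
                out[pos + 4] = HEX[c >> 4];
                out[pos + 5] = HEX[c & 0xF];
                pos += 6;
                continue;
            }
            // Common path: store both chars unconditionally, then advance by len.
            out[pos] = (char) entry;
            out[pos + 1] = (char) (entry >>> 16);
            pos += len;
        }
        return new String(out, 0, pos);
    }
}
```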

independent-projects/qute/core/src/main/java/io/quarkus/qute/JsonEscaper.java

mkouba · 2025-01-17T14:17:21Z

Using the benchmark at mkouba/qute-benchmarks#1

I had to rewrite the JsonEscaping benchmark so that it uses a template instead of JsonEscaper. This class was only introduced in 3.18, and we need to compile/run the benchmarks for older versions as well...

shows these differences in perf with before:

Benchmark                   (ctrlProbabibility)  (latinCharsProbability)  (replacementProbability)  (samples)  (size)  Mode  Cnt   Score   Error  Units
JsonEscaping.escape                           0                      100                         0        100      32  avgt   10  10.129 ± 0.009  ns/op
JsonEscaping.escape                           0                      100                         0      10000      32  avgt   10  10.129 ± 0.038  ns/op
JsonEscaping.escape                           0                      100                        10        100      32  avgt   10  45.312 ± 0.193  ns/op
JsonEscaping.escape                           0                      100                        10      10000      32  avgt   10  55.989 ± 1.826  ns/op
JsonEscaping.escape                          10                      100                        10        100      32  avgt   10  44.131 ± 2.367  ns/op
JsonEscaping.escape                          10                      100                        10      10000      32  avgt   10  77.187 ± 2.496  ns/op

vs 7ec85cc

Benchmark            (ctrlProbabibility)  (latinCharsProbability)  (replacementProbability)  (samples)  (size)  Mode  Cnt    Score    Error  Units
JsonEscaping.escape                    0                      100                         0        100      32  avgt   10   16.864 ±  0.514  ns/op
JsonEscaping.escape                    0                      100                         0      10000      32  avgt   10   66.942 ±  0.524  ns/op
JsonEscaping.escape                    0                      100                        10        100      32  avgt   10   97.493 ±  5.357  ns/op
JsonEscaping.escape                    0                      100                        10      10000      32  avgt   10  220.471 ±  7.429  ns/op
JsonEscaping.escape                   10                      100                        10        100      32  avgt   10  178.098 ± 15.693  ns/op
JsonEscaping.escape                   10                      100                        10      10000      32  avgt   10  319.793 ± 40.863  ns/op

That is a good improvement, and it tends to pay off more as the number of chars increases.

Aren't those numbers swapped, given that avgt mode is used (average time per operation, lower is better)?

FYI @mkouba, the relevant rows are the ones using 10000 samples. With just 100 samples, the data is very predictable for my CPU model, and you cannot really see the benefit of reducing the number of branches.

Ok 👍

independent-projects/qute/core/src/main/java/io/quarkus/qute/JsonEscaper.java

franz1981 · 2025-01-17T14:50:16Z

using a template instead of JsonEscaper

Do the numbers still look the same? Did you check? 🙏

Aren't those numbers swapped, given that avgt mode is used (average time per operation, lower is better)?

Oops 🤣 I copied them in the wrong order, let me fix it.

franz1981 · 2025-01-20T15:03:08Z

The HTML one can still be made faster; working on it.

quarkus-bot · 2025-01-20T16:02:02Z

Status for workflow `Quarkus Documentation CI`

This is the status report for running Quarkus Documentation CI on commit d4d3d81.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

Warning

There are other workflow runs running, you probably need to wait for their status before merging.

github-actions · 2025-01-20T16:08:48Z

🎊 PR Preview 04deb60 has been successfully built and deployed to https://quarkus-pr-main-45546-preview.surge.sh/version/main/guides/

Images of blog posts older than 3 months are not available.
Newsletters older than 3 months are not available.

independent-projects/qute/core/src/main/java/io/quarkus/qute/HtmlEscaper.java

franz1981 · 2025-01-26T22:48:55Z

I will soon provide some up-to-date numbers for the HTML version too.

franz1981 · 2025-01-27T10:33:04Z

These are the numbers while using the HtmlEscaping benchmark at mkouba/qute-benchmarks#2:

Benchmark            (branchfull)  (latinCharsProbability)  (replacementProbability)  (samples)  (size)  Mode  Cnt    Score    Error  Units
HtmlEscaping.escape          true                      100                         0        100      32  avgt   10   59.623 ±  7.042  ns/op
HtmlEscaping.escape          true                      100                         0      10000      32  avgt   10   83.220 ±  0.278  ns/op
HtmlEscaping.escape          true                      100                        10        100      32  avgt   10  144.831 ±  7.340  ns/op
HtmlEscaping.escape          true                      100                        10      10000      32  avgt   10  253.461 ±  1.642  ns/op
HtmlEscaping.escape         false                      100                         0        100      32  avgt   10   68.238 ±  0.359  ns/op
HtmlEscaping.escape         false                      100                         0      10000      32  avgt   10   69.189 ±  3.947  ns/op
HtmlEscaping.escape         false                      100                        10        100      32  avgt   10  167.388 ± 10.995  ns/op
HtmlEscaping.escape         false                      100                        10      10000      32  avgt   10  204.317 ±  2.408  ns/op

As anticipated, these are not "wow" numbers. There is a small regression when the escaping pattern is fully predictable (59.6 ns/op vs 68.2 ns/op). In that case the JIT can replace the switch with a branch guard, which performs a single comparison when no escaping is needed and falls back to the switch only when the guard doesn't match.
In addition, the number of samples is pretty low (just 100). That lets the Ryzen's excellent branch predictor shine and predict that branch well, which turns the numbers in favour of the branchful case. A more realistic number of samples (10K) better mimics real-world unpredictability, though this is still a synthetic test scenario. With 10K samples, this is what it looks like:

Benchmark            (branchfull)  (latinCharsProbability)  (replacementProbability)  (samples)  (size)  Mode  Cnt    Score    Error  Units
HtmlEscaping.escape          true                      100                         0      10000      32  avgt   10   83.220 ±  0.278  ns/op
HtmlEscaping.escape          true                      100                        10      10000      32  avgt   10  253.461 ±  1.642  ns/op
HtmlEscaping.escape         false                      100                         0      10000      32  avgt   10   69.189 ±  3.947  ns/op
HtmlEscaping.escape         false                      100                        10      10000      32  avgt   10  204.317 ±  2.408  ns/op

Still not a huge win, since I didn't modify the original algorithm in CharReplacementResultMapper. That algorithm sadly uses StringBuilder, which doesn't perform great in native mode and leaves JVM mode with a much worse unrolling factor. Still, the result is decent: a ~20% speedup.
I could make it much faster by fully moving to char[], as in the JSON version. But here the other limiting factor is that HTML escaping has much bigger replacements (up to 6 chars) for "common" escaping. To reduce the footprint, I had to use 2 lookup tables: the first to find the index of the replacement, and the second to get the replacement itself:

a 256-byte byte[] for the index + a String[8] with the replacements

This "double lookup" forms a data-dependency chain, since the index has to be loaded before the replacement can be accessed. That chain is the most relevant cost compared to the original code. The original only needs to correctly speculate the outcome of the branch instruction (the switch), which makes it almost free when correctly predicted.
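To make the dependency concrete, here is a minimal sketch of the two-table layout described above. It assumes the usual five HTML replacements (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`). The class and field names are hypothetical, and the loop is not CharReplacementResultMapper.

The load from REPLACEMENTS cannot start until the load from INDEX has produced the slot. The branch on `slot == 0` is kept only for readability.

```java
// Hypothetical sketch of a byte[256] index + String[8] replacement table.
final class DoubleLookupHtmlEscapeSketch {

    // INDEX maps a Latin-1 char to a slot in REPLACEMENTS; slot 0 means "no replacement".
    private static final byte[] INDEX = new byte[256];
    private static final String[] REPLACEMENTS = new String[8];

    static {
        REPLACEMENTS[1] = "&amp;";
        REPLACEMENTS[2] = "&lt;";
        REPLACEMENTS[3] = "&gt;";
        REPLACEMENTS[4] = "&quot;";
        REPLACEMENTS[5] = "&#39;";
        INDEX['&'] = 1;
        INDEX['<'] = 2;
        INDEX['>'] = 3;
        INDEX['"'] = 4;
        INDEX['\''] = 5;
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // First load: the slot for this char (chars above 0xFF never need escaping here).
            int slot = c <= 0xFF ? INDEX[c] : 0;
            if (slot == 0) {
                sb.append(c);
            } else {
                // Second load: its address depends on the value produced by the first one.
                sb.append(REPLACEMENTS[slot]);
            }
        }
        return sb.toString();
    }
}
```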

A better approach would use a long[256] (similar to the JSON version), but it would require going "all in" and not using Strings for the replacements, since these would be reconstructed from the long. Each long would pack the length of the replacement (if any) together with the encoded replacement.
And clearly this would prevent me from reusing the existing CharReplacementResultMapper, since its API is just not "right".
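Here is a minimal sketch of the long[256] idea under a few stated assumptions. Each entry packs up to six Latin-1 output chars (one per byte) and stores the length in the top byte. The replacement set is assumed to be the usual five HTML entities, and all names are hypothetical.

Each Latin-1 char costs one table load and six unconditional stores into an oversized char[], followed by advancing the write position by the packed length. There is no second, dependent load and no branch on whether escaping is needed.

```java
// Hypothetical sketch: replacement chars and length packed into a single long.
final class PackedHtmlEscapeSketch {

    // Bytes 0..5 hold up to 6 Latin-1 output chars; bits 56-63 hold the output length.
    private static final long[] TABLE = new long[256];

    static {
        for (int c = 0; c < 256; c++) {
            TABLE[c] = pack(String.valueOf((char) c));
        }
        TABLE['&'] = pack("&amp;");
        TABLE['<'] = pack("&lt;");
        TABLE['>'] = pack("&gt;");
        TABLE['"'] = pack("&quot;");
        TABLE['\''] = pack("&#39;");
    }

    private static long pack(String replacement) {
        long packed = (long) replacement.length() << 56;
        for (int i = 0; i < replacement.length(); i++) {
            packed |= (long) replacement.charAt(i) << (8 * i);
        }
        return packed;
    }

    static String escape(String s) {
        // 6 output chars per input char is the worst case for this table.
        char[] out = new char[s.length() * 6];
        int pos = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) { // outside the table: copied as-is
                out[pos++] = c;
                continue;
            }
            long entry = TABLE[c];
            // Store all 6 slots unconditionally; only the first `length` of them are kept.
            for (int k = 0; k < 6; k++) {
                out[pos + k] = (char) ((entry >>> (8 * k)) & 0xFF);
            }
            pos += (int) (entry >>> 56);
        }
        return new String(out, 0, pos);
    }
}
```

As noted above, this gives up String replacements entirely, which is why it would not fit the existing CharReplacementResultMapper API.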

Let me know if you have further questions.

franz1981 · 2025-01-27T14:39:36Z

I've further pushed 1cd3ae3, which improves it a bit more without modifying the base class. It uses some knowledge of how branch prediction works: predictors can use the history of previously taken branch addresses, not just the current one.
This improvement will matter more as the number of chars grows. Because the benchmark uses the whole template, most of the test cost is not in the replacement itself; the replacement is ~12% of the total cost for the 32-byte case.

mkouba

@franz1981 Could you pls squash the commits before we merge?

franz1981 · 2025-01-31T13:45:25Z

Done @mkouba !

quarkus-bot · 2025-02-10T15:35:11Z

Status for workflow `Quarkus CI`

This is the status report for running Quarkus CI on commit b57c128.

Failing Jobs

Status	Name	Step	Failures	Logs	Raw logs	Build scan
✖	JVM Tests - JDK 17	`Download previously uploaded .m2 content`	⚠️ Check →	Logs	Raw logs	🚧
✔️	JVM Tests - JDK 21			Logs	Raw logs	🚧

quarkus-bot bot added the area/qute The template engine label Jan 13, 2025

franz1981 marked this pull request as ready for review January 15, 2025 13:55

quarkus-bot bot added the triage/flaky-test label Jan 15, 2025

punkratz312 reviewed Jan 16, 2025

independent-projects/qute/core/src/main/java/io/quarkus/qute/JsonEscaper.java Outdated

independent-projects/qute/core/src/main/java/io/quarkus/qute/JsonEscaper.java

mkouba reviewed Jan 17, 2025

independent-projects/qute/core/src/main/java/io/quarkus/qute/JsonEscaper.java

franz1981 marked this pull request as draft January 20, 2025 15:02

franz1981 force-pushed the branch_free_xcaper branch from dd0393c to d4d3d81 January 20, 2025 15:33

quarkus-bot bot added area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/documentation labels Jan 20, 2025

franz1981 marked this pull request as ready for review January 20, 2025 15:34

franz1981 force-pushed the branch_free_xcaper branch from d4d3d81 to a085c01 January 20, 2025 15:34

franz1981 commented Jan 20, 2025

independent-projects/qute/core/src/main/java/io/quarkus/qute/HtmlEscaper.java

franz1981 force-pushed the branch_free_xcaper branch from a085c01 to 1939891 January 21, 2025 04:24

franz1981 mentioned this pull request Jan 21, 2025

Branch-less alphanumeric underscore's replacement smallrye/smallrye-config#1294

Open

mkouba reviewed Jan 28, 2025

Improving Qute Escaper to be as branch-free as possible

b57c128

franz1981 force-pushed the branch_free_xcaper branch from 1cd3ae3 to b57c128 January 31, 2025 13:45

mkouba approved these changes Feb 10, 2025

mkouba merged commit f0a5533 into quarkusio:main Feb 10, 2025
52 of 53 checks passed

quarkus-bot bot added this to the 3.19 - main milestone Feb 10, 2025
