Try not to use `fmt::Formatter`s #86

Kijewski · 2024-07-22T06:18:00Z

When you write an fmt::Displayable type into a fmt::Write sink, a fmt::Formatter is used to facilitate the writing. It stores a &dyn fmt::Write reference to the sink, so less duplicated byte code is generated, but any possible optimizations become void.

This PR lets rinja try to avoid using the normal write!() machinery. Indeed, no write!() call is made directly anymore in the generated code, but only though a WriteWritable helper trait. This aids the runtime performance quite a bit:

$ cd testing/benches
$ cargo bench --bench all

Big table               time:   [405.60 µs 406.46 µs 407.33 µs]
                        change: [-12.463% -12.239% -12.022%] (p = 0.00 < 0.05)
                        Performance has improved.

Teams                   time:   [200.00 ns 200.30 ns 200.61 ns]
                        change: [-51.953% -51.462% -51.036%] (p = 0.00 < 0.05)
                        Performance has improved.

Todo:

update tests (e.g. ryu does not omits trailing .0s for floats)
test properly
add comments

Kijewski · 2024-07-22T09:06:54Z

Hm, filters (including any explicit |escape) still invariantly causes writes through fmt::Formatter. I guess I can ignore the case for now because the this is a good speed-up for a start?

rinja/src/filters/mod.rs

Kijewski · 2024-07-22T18:33:02Z

testing/tests/simple.rs

@@ -490,7 +490,7 @@ fn test_num_literals() {
    let template = NumLiterals;
    assert_eq!(
        template.render().unwrap(),
-        "[90, -90, 90, 2, 56, 240, 10.5, 10.5, 100000000000, 105000000000]\n1\n12",
+        "[90, -90, 90, 2, 56, 240, 10.5, 10.5, 100000000000.0, 105000000000.0]\n1\n12",


ryu adds a trailing .0 for integral floats. IMO ryu gets it more right than rust's defaults, but we could also strip a trailing .0 to remove the difference.

I don't think it matters much. If people want to change it, they can always using the format filter.

Hum... Thinking some more about it, people will expect us to use the Rust implementation to get the same output. I think it's problematic that ryu doesn't always display the same thing.

Here what I suggest: I talked with Mara and she said that we could use ryu's algorithm directly in std to improve performance. I think it's a better way forward rather than having our own display, the performance difference is not big enough to be worth imo.

The performance is much more impacted by the removal of the &dyn Write, the speed-up through ryu is minuscule in comparison. It's just the first implementation that came to mind.

But I think it's seldom that people will want to print floats, and then most likely with explicit formatting (price|fmt(".02")), so it won't matter much if I remove this specialization.

rinja_derive/src/tests.rs

Kijewski · 2024-07-22T18:35:43Z

rinja_derive/src/generator.rs

-    fn named_expression(
-        &mut self,
-        buf_expr: &mut Buffer,
-        buf_format: &mut Buffer,
-        expr: String,
-        wrapped: DisplayWrap,
-        cacheable: bool,
-        expr_cache: &mut HashMap<String, usize>,
-    ) -> usize {


I inlined this function. In my opinion the new functions is small enough by itself.

rinja_derive/src/generator.rs

rinja/src/filters/escape.rs

GuillaumeGomez · 2024-07-23T13:03:17Z

rinja/src/filters/escape.rs

+}
+
+const _: () = {
+    macro_rules! specialize_ref {


Please a comment explaining why we need this macro and what it does.

It is used just once just below. Should I maybe rename it into impl_ref for impl_for_ref to make the name more speaking?

As you prefer. Either a comment or a renaming to make it clear without needing to read the code. ;)

"Either … or …", well, I did both :D

Haha. Well, it's the best solution I think. :D

GuillaumeGomez · 2024-07-24T13:39:38Z

rinja_derive/src/generator.rs

        if !self.discard {
-            src.append_to(&mut self.buf);
+            if !self.last_was_write_str {


This change in particular will improve things SO MUCH. It's a really good idea!

When you write an `fmt::Display`able type into a `fmt::Write` sink, a `fmt::Formatter` is used to facilitate the writing. It stores a `&dyn fmt::Write` reference to the sink, so less duplicated byte code is generated, but any possible optimizations before void. This PR lets rinja try to avoid using the normal `write!()` machinery. Indeed, not `write!()` call is made directly anymore in the generated code, but only though a `WriteWritable` helper trait. This aids the runtime performance quite a bit: ```text $ cd testing/benches $ cargo bench --bench all Big table time: [405.60 µs 406.46 µs 407.33 µs] change: [-12.463% -12.239% -12.022%] (p = 0.00 < 0.05) Performance has improved. Teams time: [200.00 ns 200.30 ns 200.61 ns] change: [-51.953% -51.462% -51.036%] (p = 0.00 < 0.05) Performance has improved. ```

```text $ cd rinja_derive_standalone $ cargo bench hello_world time: [50.420 µs 50.529 µs 50.637 µs] change: [-0.6238% -0.3448% -0.0695%] (p = 0.02 < 0.05) Change within noise threshold. item_info.html time: [58.807 µs 59.108 µs 59.408 µs] change: [-2.4731% -2.0541% -1.6532%] (p = 0.00 < 0.05) Performance has improved. item_union.html time: [182.90 µs 183.97 µs 185.73 µs] change: [-2.6740% -1.8652% -0.7734%] (p = 0.00 < 0.05) Change within noise threshold. page.html time: [767.05 µs 768.05 µs 769.20 µs] change: [-4.5916% -4.2764% -3.9026%] (p = 0.00 < 0.05) Performance has improved. print_item.html time: [154.97 µs 155.26 µs 155.59 µs] change: [-2.0385% -1.7364% -1.4351%] (p = 0.00 < 0.05) Performance has improved. short_item_info.html time: [126.35 µs 126.49 µs 126.67 µs] change: [-6.5708% -6.1935% -5.8314%] (p = 0.00 < 0.05) Performance has improved. sidebar.html time: [196.25 µs 196.68 µs 197.14 µs] change: [-4.9096% -4.4107% -3.9467%] (p = 0.00 < 0.05) Performance has improved. source.html time: [112.69 µs 113.14 µs 113.76 µs] change: [-3.4552% -3.1557% -2.8072%] (p = 0.00 < 0.05) Performance has improved. type_layout.html time: [160.05 µs 160.30 µs 160.59 µs] change: [-11.126% -10.419% -9.2877%] (p = 0.00 < 0.05) Performance has improved. type_layout_size.html time: [66.006 µs 66.340 µs 66.664 µs] change: [-2.3099% -1.3973% +0.0223%] (p = 0.01 < 0.05) Change within noise threshold. ```

GuillaumeGomez · 2024-07-24T13:46:01Z

rinja/src/filters/escape.rs

+
+const _: () = {
+    // implement FastWritable for a list of reference wrapper types to FastWritable+?Sized
+    macro_rules! impl_for_ref {


GuillaumeGomez · 2024-07-24T13:46:18Z

Looks all good to me, thanks!

Kijewski force-pushed the pr-no-formatter branch 2 times, most recently from e38a0e3 to 52e1f5b Compare July 22, 2024 18:11

Kijewski marked this pull request as ready for review July 22, 2024 18:11

Kijewski requested a review from GuillaumeGomez July 22, 2024 18:26

Kijewski force-pushed the pr-no-formatter branch 2 times, most recently from 9b7d80f to 68d4e50 Compare July 22, 2024 18:29

Kijewski commented Jul 22, 2024

View reviewed changes

rinja/src/filters/mod.rs Show resolved Hide resolved

Kijewski commented Jul 22, 2024

View reviewed changes

rinja_derive/src/tests.rs Show resolved Hide resolved

Kijewski commented Jul 22, 2024

View reviewed changes

rinja_derive/src/generator.rs Show resolved Hide resolved