Improve performance of HTML parser on JVM #67

tindzk · 2019-12-04T17:51:08Z

The HTML parser incurs a significant slowdown as the nesting level
increases:

```shell
$ bloop run pine-bench-jvm -- slow
[...]
Benchmark: Parse HTML w/o attributes
- depth=2:
  units: 7
  iterations: 591733
  run time: 3384 μs/it ± 7
- depth=6:
  units: 127
  iterations: 11610
  run time: 171586 μs/it ± 504
- depth=10:
  units: 2047
  iterations: 56
  run time: 36148809 μs/it ± 74820
- depth=14:
  units: 32767
  iterations: 1
  run time: 9353666666 μs/it ± 174704194

Summary:
  Unit growth: 18.1x, 16.1x, 16.0x
  Run time growth: 50.7x, 210.7x, 258.8x
```
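
The growth factors in the summary are ratios between consecutive depths. As a rough sanity check, the sketch below recomputes them from the figures above and prints the squared unit growth next to them. At the two deepest steps the run time growth (210.7x and 258.8x) is close to the square of the unit growth (about 16² = 256), which is consistent with quadratic behaviour, though it does not prove it. `GrowthCheck` and `growth` are illustrative names and not part of the benchmark suite.

```scala
object GrowthCheck {
  // Ratio of each measurement to the one before it.
  def growth(xs: Seq[Double]): Seq[Double] =
    xs.zip(xs.tail).map { case (prev, next) => next / prev }

  def main(args: Array[String]): Unit = {
    // Figures copied from the JVM benchmark output above.
    val units   = Seq(7.0, 127.0, 2047.0, 32767.0)
    val runTime = Seq(3384.0, 171586.0, 36148809.0, 9353666666.0)

    growth(units).zip(growth(runTime)).foreach { case (u, t) =>
      // Linear scaling would give t close to u; quadratic, t close to u * u.
      println(f"units: $u%.1fx, run time: $t%.1fx, units squared: ${u * u}%.1fx")
    }
  }
}
```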

This slowdown can be attributed to the `rest()` function in `Reader`.
It calls `data.drop()`, which on the JVM creates a copy of the remaining
string rather than pointing to the same memory, so the cost of each call
grows with the amount of input left to parse.
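
To illustrate the difference, here is a minimal sketch of the two approaches. It is not pine's actual `Reader`: the class names, the `prefix` method and the `offset` field are assumptions made for the example. The copying variant materialises the remaining input on every check, while the offset variant compares against the original string in place using `java.lang.String#startsWith(prefix, toffset)`.

```scala
object ReaderSketch {
  // Hypothetical copying reader: rest() allocates a new String per call.
  final class CopyingReader(data: String) {
    private var offset = 0

    // On the JVM, drop() ends up in substring(), which copies the characters.
    def rest(): String = data.drop(offset)

    def prefix(s: String): Boolean = {
      val matches = rest().startsWith(s)
      if (matches) offset += s.length
      matches
    }

    def done: Boolean = offset >= data.length
  }

  // Hypothetical offset-based reader: no intermediate strings are created.
  final class OffsetReader(data: String) {
    private var offset = 0

    def prefix(s: String): Boolean = {
      // Compares in place, starting at `offset` in the original string.
      val matches = data.startsWith(s, offset)
      if (matches) offset += s.length
      matches
    }

    def done: Boolean = offset >= data.length
  }

  def main(args: Array[String]): Unit = {
    val input   = "<a><b></b></a>"
    val tokens  = List("<a>", "<b>", "</b>", "</a>")
    val copying = new CopyingReader(input)
    val inPlace = new OffsetReader(input)

    // Both readers must agree on every token and consume the whole input.
    tokens.foreach(t => assert(copying.prefix(t) == inPlace.prefix(t)))
    assert(copying.done && inPlace.done)
    println("both readers consumed the input")
  }
}
```

Both variants produce the same results; only the amount of copying differs. With the copying reader, each check costs time proportional to the remaining input, which adds up to quadratic work over a whole document.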

Scala.js' `drop()` implementation has the expected semantics, so the
run time is roughly linear in the number of nodes in the tree:

```shell
$ bloop run pine-bench-js -- slow
[...]
Benchmark: Parse HTML w/o attributes
- depth=2:
  units: 7
  iterations: 92592
  run time: 21624 μs/it ± 128
- depth=6:
  units: 127
  iterations: 5229
  run time: 382531 μs/it ± 182
- depth=10:
  units: 2047
  iterations: 312
  run time: 6479611 μs/it ± 134542
- depth=14:
  units: 32767
  iterations: 17
  run time: 119013071 μs/it ± 1455066

Summary:
  Unit growth: 18.1x, 16.1x, 16.0x
  Run time growth: 17.7x, 16.9x, 18.4x
```

With the optimisations applied, the parser scales similarly on the JVM:

```shell
$ bloop run pine-bench-jvm -- slow
[...]
Benchmark: Parse HTML w/o attributes
- depth=2:
  units: 7
  iterations: 991955
  run time: 2048 μs/it ± 51
- depth=6:
  units: 127
  iterations: 45471
  run time: 43403 μs/it ± 412
- depth=10:
  units: 2047
  iterations: 2523
  run time: 777550 μs/it ± 10750
- depth=14:
  units: 32767
  iterations: 147
  run time: 13759510 μs/it ± 160258

Summary:
  Unit growth: 18.1x, 16.1x, 16.0x
  Run time growth: 21.2x, 17.9x, 17.7x
```

tindzk force-pushed the feat/parser-performance branch 2 times, most recently from 76317fb to f15a6fe Compare October 3, 2020 15:10

tindzk changed the title ~~Improve performance of HTML parser~~ Improve performance of HTML parser on JVM Oct 3, 2020

tindzk force-pushed the feat/parser-performance branch from f15a6fe to 8e70cc9 Compare October 3, 2020 15:16

tindzk merged commit 7a0d566 into master Oct 3, 2020

tindzk deleted the feat/parser-performance branch October 3, 2020 15:44
