Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
perf: html5 attribute parsing (#3393)
## What problem is this PR intended to solve? Addressing quadratic performance of HTML5 attribute parsing as described in #2568 and rubys/nokogumbo#144. Closes #2568 ### Some benchmarks <details><summary>Here is the benchmark script.</summary> ```ruby #!/usr/bin/env ruby require "bundler/inline" gemfile do source "https://rubygems.org" gem "nokogiri", path: "." gem "benchmark-ips" end def html_with_attributes(count) html = "<div" count.times do |j| html << " fake-attr-#{j}" end html << ">" end html = {} Benchmark.ips do |x| x.warmup = 0 [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 ].each do |attribute_count| html[attribute_count] = html_with_attributes(attribute_count) x.report "#{attribute_count.to_s.rjust(7)} attributes" do Nokogiri::HTML5(html[attribute_count], max_attributes: 100_000) end end end ``` </details> before: ``` Calculating ------------------------------------- 1 attributes 242.976k (± 9.3%) i/s (4.12 μs/i) - 936.599k in 4.838811s 2 attributes 219.456k (± 9.4%) i/s (4.56 μs/i) - 837.644k in 4.858317s 4 attributes 191.173k (± 9.3%) i/s (5.23 μs/i) - 715.392k in 4.882336s 8 attributes 144.813k (±12.2%) i/s (6.91 μs/i) - 516.194k in 4.912577s 16 attributes 83.396k (±17.2%) i/s (11.99 μs/i) - 238.564k in 4.958331s 32 attributes 55.800k (± 9.9%) i/s (17.92 μs/i) - 190.871k in 4.970063s 64 attributes 27.381k (±14.7%) i/s (36.52 μs/i) - 91.867k in 4.981936s 128 attributes 12.707k (± 9.4%) i/s (78.70 μs/i) - 49.906k in 4.990254s 256 attributes 4.723k (± 8.1%) i/s (211.74 μs/i) - 19.657k in 4.995888s 512 attributes 1.417k (±10.6%) i/s (705.61 μs/i) - 6.339k in 4.998084s 1024 attributes 382.078 (±11.5%) i/s (2.62 ms/i) - 1.811k in 5.000356s 2048 attributes 119.825 (±10.8%) i/s (8.35 ms/i) - 582.000 in 5.003555s 4096 attributes 29.034 (± 6.9%) i/s (34.44 ms/i) - 144.000 in 5.003602s 8192 attributes 6.492 (± 0.0%) i/s (154.04 ms/i) - 33.000 in 5.111381s 16384 attributes 1.478 (± 0.0%) i/s (676.68 ms/i) - 8.000 in 5.417232s ``` after: ``` Calculating ------------------------------------- 1 attributes 229.084k (±10.4%) i/s (4.37 μs/i) - 868.071k in 4.846385s 2 attributes 213.788k (± 9.9%) i/s (4.68 μs/i) - 804.683k in 4.859819s 4 attributes 181.602k (±10.7%) i/s (5.51 μs/i) - 652.740k in 4.886700s 8 attributes 137.811k (± 9.9%) i/s (7.26 μs/i) - 500.690k in 4.911176s 16 attributes 95.462k (± 9.9%) i/s (10.48 μs/i) - 338.045k in 4.938699s 32 attributes 61.388k (± 9.5%) i/s (16.29 μs/i) - 216.911k in 4.961587s 64 attributes 34.368k (±10.2%) i/s (29.10 μs/i) - 121.985k in 4.976219s 128 attributes 17.315k (±10.9%) i/s (57.75 μs/i) - 60.716k in 4.987542s 256 attributes 9.011k (± 9.2%) i/s (110.98 μs/i) - 32.443k in 4.992878s 512 attributes 4.552k (±11.8%) i/s (219.69 μs/i) - 16.522k in 4.995834s 1024 attributes 2.255k (±13.6%) i/s (443.38 μs/i) - 8.303k in 4.997861s 2048 attributes 1.091k (±17.6%) i/s (916.91 μs/i) - 4.132k in 4.998710s 4096 attributes 544.840 (±22.0%) i/s (1.84 ms/i) - 2.101k in 5.006943s 8192 attributes 282.789 (±22.3%) i/s (3.54 ms/i) - 1.089k in 5.000604s 16384 attributes 134.991 (±27.4%) i/s (7.41 ms/i) - 542.000 in 5.001360s ``` In graphical form: ![image](https://github.com/user-attachments/assets/c07ac065-0069-43ce-85e7-ed1cb4a86542) Just the "after", showing linear performance now: ![image](https://github.com/user-attachments/assets/3051bcda-1b55-4270-8f14-f24eba0fb327) ### Analysis Analyzing the parsing of a tag with 16000 attributes (using `samply`), we see two hotspots: ![image](https://github.com/user-attachments/assets/3a2409e6-c394-4a73-a1b2-3373654e760d) 1. the `strlen`/`memcmp` checks in libgumbo's `tokenizer.c:finish_attribute_name` which checks for duplicates in the attribute list: https://github.com/sparklemotion/nokogiri/blob/729c96c05f692d69e4d1d3be6d4b9524bdbe9322/gumbo-parser/src/tokenizer.c#L804-L820 2. libxml2's `xmlNewPropInternal` traversing the singly-linked properties list to append each new property (to maintain parse order) This PR addresses (1) by introducing a hashmap and using it if there are 16 or more attributes. This PR addresses (2) by inlining most of `xmlNewNsProp` but keeping a pointer to the last property to avoid re-traversing the linked list each time. After those changes, the flamegraph looks much better, with no obvious hotspots: ![image](https://github.com/user-attachments/assets/36859e83-b35b-4489-af24-7f8c2f52935c) ### hashmap Keeping in mind we may still someday want to publish our libgumbo fork as a separate library, I opted to bring in a new self-contained hashmap library from https://github.com/tidwall/hashmap.c I'm using [xxHash3](https://github.com/Cyan4973/xxHash) which seems to be pretty fast, but I'm open to other opinions on what hash function to use. ## Have you included adequate test coverage? Yes. ## Does this change affect the behavior of either the C or the Java implementations? HTML5 is CRuby-only.
- Loading branch information