explore further optimizing the HTML5 parser and serializer #2722

Open
flavorjones opened this issue Dec 11, 2022 · 13 comments

@flavorjones
Member

flavorjones commented Dec 11, 2022

Ideally we want HTML5 to be the default HTML parser in Nokogiri (see #2331). Before we do that, we need to make sure it's as performant as we can make it.

This issue is open-ended and meant to collect the conversations and optimization attempts we've made.

gumbo-specific speedup:

General speedup:

@flavorjones
Member Author

flavorjones commented Dec 11, 2022

Let's capture a benchmark at the start of this, on my development machine.

#! /usr/bin/env ruby
# coding: utf-8

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
  gem "benchmark-ips"
end

require "nokogiri"
require "benchmark/ips"

filenames = [
  "test/files/GH_1042.html", # 650b
  "test/files/tlm.html", # 70kb
  "big_shopping.html", # 1.9mb
]

inputs = filenames.map { |fn| File.read(fn) }

puts RUBY_DESCRIPTION

inputs.each do |input|
  len = input.length

  Benchmark.ips do |x|
    x.warmup = 0
    x.time = 10

    x.report("html5 parse #{len}") do
      Nokogiri::HTML5::Document.parse(input)
    end
    x.report("html4 parse #{len}") do
      Nokogiri::HTML4::Document.parse(input)
    end
    x.compare!
  end
end

puts "=========="

inputs.each do |input|
  len = input.length
  html4_doc = Nokogiri::HTML4::Document.parse(input)
  html5_doc = Nokogiri::HTML5::Document.parse(input)

  Benchmark.ips do |x|
    x.warmup = 0
    x.time = 10

    x.report("html5 serlz #{len}") do
      html5_doc.to_html
    end
    x.report("html4 serlz #{len}") do
      html4_doc.to_html
    end
    x.compare!
  end
end

(The big_shopping.html file is linked from #2331 (comment).)

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Calculating -------------------------------------
     html5 parse 656     19.449k (±14.8%) i/s -    147.873k in   9.935697s
     html4 parse 656     24.883k (±13.9%) i/s -    186.143k in   9.922674s

Comparison:
     html4 parse 656:    24883.0 i/s
     html5 parse 656:    19449.1 i/s - same-ish: difference falls within error

Calculating -------------------------------------
   html5 parse 70095    267.387  (±23.6%) i/s -      2.111k in   9.997117s
   html4 parse 70095    478.390  (±17.1%) i/s -      3.371k in   9.996398s

Comparison:
   html4 parse 70095:      478.4 i/s
   html5 parse 70095:      267.4 i/s - 1.79x  (± 0.00) slower

Calculating -------------------------------------
 html5 parse 1929522     12.895  (±15.5%) i/s -    127.000  in  10.053394s
 html4 parse 1929522     37.610  (±23.9%) i/s -    350.000  in  10.011052s

Comparison:
 html4 parse 1929522:       37.6 i/s
 html5 parse 1929522:       12.9 i/s - 2.92x  (± 0.00) slower

==========

Calculating -------------------------------------
     html5 serlz 656     41.608k (±14.9%) i/s -    381.808k in   9.851842s
     html4 serlz 656     59.478k (±21.8%) i/s -    506.898k in   9.775912s

Comparison:
     html4 serlz 656:    59478.5 i/s
     html5 serlz 656:    41608.1 i/s - same-ish: difference falls within error

Calculating -------------------------------------
   html5 serlz 70095    999.372  (±11.1%) i/s -      9.780k in   9.992943s
   html4 serlz 70095      1.259k (±14.8%) i/s -     12.109k in   9.991011s

Comparison:
   html4 serlz 70095:     1259.3 i/s
   html5 serlz 70095:      999.4 i/s - same-ish: difference falls within error

Calculating -------------------------------------
 html5 serlz 1929522    114.104  (± 8.8%) i/s -      1.131k in  10.003148s
 html4 serlz 1929522    108.313  (±12.0%) i/s -      1.066k in   9.999369s

Comparison:
 html5 serlz 1929522:      114.1 i/s
 html4 serlz 1929522:      108.3 i/s - same-ish: difference falls within error
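One way to read the "same-ish: difference falls within error" lines above is that the ± ranges of the two reports overlap. The sketch below is only a reading aid and not benchmark-ips code: it assumes "within error" means overlapping ± ranges, `interval` and `overlap?` are hypothetical helper names, and the figures are copied from the output above.

```ruby
# Reading aid for the benchmark output above; not benchmark-ips internals.
# Assumption: "within error" means the two ± ranges overlap.

# Turn "N i/s (±P%)" into the [low, high] range it describes.
def interval(ips, pct)
  delta = ips * pct / 100.0
  [ips - delta, ips + delta]
end

# True when the two ranges share at least one point.
def overlap?(a, b)
  a[0] <= b[1] && b[0] <= a[1]
end

# [label, html5 i/s, html5 ±%, html4 i/s, html4 ±%], copied from the runs above
results = [
  ["parse 656",      19_449.1,  14.8, 24_883.0,  13.9],
  ["parse 70095",       267.387, 23.6,    478.390, 17.1],
  ["parse 1929522",      12.895, 15.5,     37.610, 23.9],
  ["serlz 656",      41_608.1,  14.9, 59_478.5,  21.8],
  ["serlz 70095",       999.372, 11.1,  1_259.3,  14.8],
  ["serlz 1929522",     114.104,  8.8,    108.313, 12.0],
]

results.each do |label, h5, h5_pct, h4, h4_pct|
  same = overlap?(interval(h5, h5_pct), interval(h4, h4_pct))
  puts format("%-14s %s", label, same ? "same-ish" : "different")
end
```

With these figures the check lines up with all six Comparison blocks: only the 70095 and 1929522 parse runs fall outside each other's error range.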

and also some profiling information:

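# stackprof-big-shopping.sh
#! /usr/bin/env bash

if [[ $# -lt 1 ]] ; then
  echo "usage: $0 <output-filename>"
  exit 1
fi

# the ruby binary rbenv resolves to; pprof needs its path to resolve symbols
cmd=$(rbenv which ruby)

# preload the gperftools CPU profiler and write the profile to <output-filename>
env \
  LD_PRELOAD="$HOME/local/lib/libprofiler.so" \
  CPUPROFILE="$1" \
  "$cmd" ./stackprof-big-shopping.rb | tee "$1.log"

# render the profile as a call-graph image and as a flat text listing
pprof --gif "$cmd" "$1" > "$1.gif"
pprof --text "$cmd" "$1" > "$1.text"

# stackprof-big-shopping.rb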
#! /usr/bin/env ruby
# coding: utf-8

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
  gem "benchmark-ips"
end

require "nokogiri"
require "benchmark/ips"

input = File.read("big_shopping.html") # 1.9mb
puts "input #{input.length} bytes"

puts RUBY_DESCRIPTION

Benchmark.ips do |x|
  x.warmup = false
  x.time = 10

  x.report("parsing") { Nokogiri::HTML5::Document.parse(input) }
end

text output:

input 1929522 bytes
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Calculating -------------------------------------
             parsing     13.222  (±15.1%) i/s -    130.000  in  10.042170s

top of the stack profile:

Total: 1035 samples
     120  11.6%  11.6%      120  11.6% decode (inline)
      99   9.6%  21.2%      100   9.7% pthread_attr_setschedparam
      88   8.5%  29.7%      833  80.5% gumbo_parse_with_options
      78   7.5%  37.2%       78   7.5% _init@3e000
      65   6.3%  43.5%      464  44.8% gumbo_lex
      53   5.1%  48.6%      176  17.0% read_char
      42   4.1%  52.7%       63   6.1% gumbo_string_buffer_append_codepoint
      40   3.9%  56.5%       40   3.9% get_current_node.isra.0
      35   3.4%  59.9%      195  18.8% handle_token (inline)
      34   3.3%  63.2%      202  19.5% finish_token.isra.0
      25   2.4%  65.6%       61   5.9% insert_text_token.isra.0
      23   2.2%  67.8%       23   2.2% get_adjusted_current_node
      18   1.7%  69.6%       69   6.7% build_tree
      16   1.5%  71.1%      222  21.4% emit_char
      13   1.3%  72.4%       13   1.3% maybe_emit_from_mark
      13   1.3%  73.6%       21   2.0% maybe_resize_string_buffer
      13   1.3%  74.9%       13   1.3% update_position (inline)
      12   1.2%  76.0%       13   1.3% atomic_sub_nounderflow (inline)
      12   1.2%  77.2%       12   1.2% gumbo_tokenizer_set_is_adjusted_current_node_foreign
      11   1.1%  78.3%       12   1.2% handle_text
      10   1.0%  79.2%       38   3.7% __libc_malloc
      10   1.0%  80.2%       47   4.5% tree_traverse.constprop.0
      10   1.0%  81.2%       10   1.0% xmlStrdup (inline)
       9   0.9%  82.0%        9   0.9% __nss_database_lookup
       7   0.7%  82.7%        7   0.7% cfree
       7   0.7%  83.4%        7   0.7% get_char_token_type (inline)
       7   0.7%  84.1%        7   0.7% gumbo_debug
       7   0.7%  84.7%       43   4.2% handle_in_body
       7   0.7%  85.4%        7   0.7% utf8iterator_get_position (inline)
       6   0.6%  86.0%       19   1.8% utf8iterator_next
       6   0.6%  86.6%       54   5.2% xmlFreeNodeList
       5   0.5%  87.1%       27   2.6% handle_attr_value_double_quoted_state
       5   0.5%  87.5%      123  11.9% handle_html_content (inline)
       5   0.5%  88.0%        5   0.5% is_open_element (inline)
       5   0.5%  88.5%       28   2.7% objspace_malloc_increase_body (inline)
       5   0.5%  89.0%        5   0.5% rbimpl_atomic_size_add (inline)
       4   0.4%  89.4%        4   0.4% append_char_to_tag_buffer
       4   0.4%  89.8%       32   3.1% objspace_xmalloc0
       4   0.4%  90.1%       29   2.8% xmlNewText
       3   0.3%  90.4%        3   0.3% _init@33000
       3   0.3%  90.7%        3   0.3% handle_script_data_state
       3   0.3%  91.0%        3   0.3% malloc_usable_size
       3   0.3%  91.3%       12   1.2% ruby_yyparse
       3   0.3%  91.6%       32   3.1% xmlNewPropInternal
       2   0.2%  91.8%        2   0.2% copy_over_original_tag_text (inline)
       2   0.2%  92.0%        4   0.4% finish_tag_name
       2   0.2%  92.2%       30   2.9% gumbo_alloc
       2   0.2%  92.4%        2   0.2% gumbo_tag_lookup
       2   0.2%  92.6%       10   1.0% handle_after_attr_value_quoted_state
       2   0.2%  92.8%       37   3.6% handle_attr_name_state
       2   0.2%  92.9%        3   0.3% node_html_tag_is (inline)
       2   0.2%  93.1%       10   1.0% realloc
       2   0.2%  93.3%        8   0.8% reset_token_start_point (inline)
       2   0.2%  93.5%        2   0.2% secondary_hash (inline)
       2   0.2%  93.7%        2   0.2% utf8_is_control (inline)
       2   0.2%  93.9%        7   0.7% xmlNewNode
       1   0.1%  94.0%        1   0.1% ISEQ_COMPILE_DATA (inline)
       1   0.1%  94.1%        1   0.1% RVALUE_OLD_P_RAW (inline)
       1   0.1%  94.2%        1   0.1% __libc_open64
       1   0.1%  94.3%        1   0.1% __munmap
       1   0.1%  94.4%        4   0.4% adoption_agency_algorithm
       1   0.1%  94.5%        1   0.1% autoload_c_mark
       1   0.1%  94.6%        1   0.1% callable_method_entry
       1   0.1%  94.7%        9   0.9% copy_over_tag_buffer (inline)
       1   0.1%  94.8%        7   0.7% create_element_from_token
       1   0.1%  94.9%        6   0.6% create_node (inline)
       1   0.1%  95.0%       32   3.1% destroy_node_callback
...

@flavorjones
Member Author

@stevecheckoway suggested optimizing out calls to gumbo_debug at #2331 (comment). Here are the parse benchmarks with that change:

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Calculating -------------------------------------
     html5 parse 656     21.011k (±16.5%) i/s -    155.264k in   9.932001s
     html4 parse 656     24.498k (±16.4%) i/s -    178.167k in   9.922181s

Comparison:
     html4 parse 656:    24497.5 i/s
     html5 parse 656:    21011.3 i/s - same-ish: difference falls within error

Calculating -------------------------------------
   html5 parse 70095    293.924  (±22.1%) i/s -      2.355k in  10.008618s
   html4 parse 70095    481.226  (±17.5%) i/s -      3.463k in  10.026732s

Comparison:
   html4 parse 70095:      481.2 i/s
   html5 parse 70095:      293.9 i/s - 1.64x  (± 0.00) slower

Calculating -------------------------------------
 html5 parse 1929522     15.547  (±12.9%) i/s -    152.000  in  10.025299s
 html4 parse 1929522     38.315  (±23.5%) i/s -    354.000  in  10.044622s

Comparison:
 html4 parse 1929522:       38.3 i/s
 html5 parse 1929522:       15.5 i/s - 2.46x  (± 0.00) slower

Already an improvement! A 14% baseline improvement, going from 2.92x slower to 2.46x slower than html4 on the big file.
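For anyone who wants to recompute these comparisons from the i/s figures reported in this thread, here is a minimal sketch. `ratio` and `percent_gain` are hypothetical helper names, and the numbers are copied from the big-file runs (benchmark script and profiling script) before and after the gumbo_debug change.

```ruby
# Recompute the comparisons from the i/s figures quoted in this thread.

# How many times faster `fast` is than `slow`.
def ratio(fast, slow)
  fast / slow
end

# Throughput gain of `after` over `before`, in percent.
def percent_gain(before, after)
  (after / before - 1.0) * 100.0
end

# big file (1929522 bytes), benchmark script
html5_before = 12.895
html5_after  = 15.547
html4_before = 37.610
html4_after  = 38.315

# big file, profiling script ("parsing" report)
prof_before = 13.222
prof_after  = 15.184

puts format("html5 vs html4 before: %.2fx slower", ratio(html4_before, html5_before))
puts format("html5 vs html4 after:  %.2fx slower", ratio(html4_after, html5_after))
puts format("html5 gain, benchmark run: %.1f%%", percent_gain(html5_before, html5_after))
puts format("html5 gain, profiling run: %.1f%%", percent_gain(prof_before, prof_after))
```

The two ratios come out at 2.92x and 2.46x, matching the Comparison lines. The html5 gain is roughly 21% going by the benchmark figures and roughly 15% going by the profiling figures; every one of these runs reports an error of about ±13% to ±15%, so the exact percentage should be read loosely.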

Here's the stack profile:

input 1929522 bytes
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Calculating -------------------------------------
             parsing     15.184  (±13.2%) i/s -    148.000  in  10.069515s
Total: 1039 samples
      99   9.5%   9.5%      101   9.7% pthread_attr_setschedparam
      87   8.4%  17.9%      813  78.2% gumbo_parse_with_options
      86   8.3%  26.2%       86   8.3% decode (inline)
      69   6.6%  32.8%      155  14.9% read_char
      64   6.2%  39.0%      238  22.9% handle_token (inline)
      58   5.6%  44.6%      432  41.6% gumbo_lex
      55   5.3%  49.9%      201  19.3% finish_token.isra.0
      45   4.3%  54.2%       45   4.3% _init@3e000
      43   4.1%  58.3%       43   4.1% get_current_node.isra.0
      36   3.5%  61.8%       61   5.9% gumbo_string_buffer_append_codepoint
      21   2.0%  63.8%       21   2.0% get_adjusted_current_node
      18   1.7%  65.5%       45   4.3% handle_in_body
      18   1.7%  67.3%       69   6.6% insert_text_token.isra.0
      17   1.6%  68.9%       17   1.6% atomic_sub_nounderflow (inline)
      17   1.6%  70.5%       20   1.9% handle_text
      16   1.5%  72.1%       16   1.5% xmlStrdup (inline)
      14   1.3%  73.4%       42   4.0% objspace_malloc_increase_body (inline)
      13   1.3%  74.7%       13   1.3% __nss_database_lookup
      12   1.2%  75.8%       12   1.2% cfree
      12   1.2%  77.0%       25   2.4% maybe_resize_string_buffer
      11   1.1%  78.1%       11   1.1% get_char_token_type (inline)
      11   1.1%  79.1%       11   1.1% gumbo_tokenizer_set_is_adjusted_current_node_foreign
      11   1.1%  80.2%       31   3.0% handle_attr_value_double_quoted_state
      11   1.1%  81.2%       11   1.1% handle_script_data_state
      11   1.1%  82.3%       50   4.8% tree_traverse.constprop.0
      11   1.1%  83.3%       11   1.1% utf8iterator_get_position (inline)
      10   1.0%  84.3%       33   3.2% __libc_malloc
      10   1.0%  85.3%      141  13.6% handle_html_content (inline)
       9   0.9%  86.1%       76   7.3% build_tree
       9   0.9%  87.0%        9   0.9% update_position (inline)
       7   0.7%  87.7%       67   6.4% xmlFreeNodeList
       6   0.6%  88.3%        6   0.6% maybe_emit_from_mark
       6   0.6%  88.8%        6   0.6% rbimpl_atomic_size_add (inline)
       6   0.6%  89.4%       15   1.4% utf8iterator_next
       5   0.5%  89.9%      215  20.7% emit_char
       5   0.5%  90.4%        5   0.5% malloc_usable_size
       5   0.5%  90.9%       17   1.6% realloc
       4   0.4%  91.2%        4   0.4% has_an_element_in_specific_scope
       4   0.4%  91.6%       34   3.3% xmlNewPropInternal
       3   0.3%  91.9%        4   0.4% append_char_to_tag_buffer
       3   0.3%  92.2%       22   2.1% finish_attribute_name
       3   0.3%  92.5%       13   1.3% gumbo_destroy_attribute
       3   0.3%  92.8%        3   0.3% set_line (inline)
       2   0.2%  93.0%        2   0.2% RB_BUILTIN_TYPE (inline)
       2   0.2%  93.2%        2   0.2% _init@33000
       2   0.2%  93.4%        2   0.2% gumbo_normalized_tagname
       2   0.2%  93.6%        2   0.2% handle_data_state
       2   0.2%  93.7%       10   1.0% handle_tag_name_state
       2   0.2%  93.9%        2   0.2% is_open_element (inline)
       2   0.2%  94.1%        4   0.4% reconstruct_active_formatting_elements
       2   0.2%  94.3%       21   2.0% xmlStrndup
       1   0.1%  94.4%        1   0.1% RB_SPECIAL_CONST_P (inline)
       1   0.1%  94.5%        1   0.1% RCLASS_SUPER (inline)
       1   0.1%  94.6%        1   0.1% RVALUE_WB_UNPROTECTED (inline)
       1   0.1%  94.7%        1   0.1% _dl_rtld_di_serinfo
       1   0.1%  94.8%        1   0.1% add_ctype_to_cc
       1   0.1%  94.9%        1   0.1% callable_method_entry
       1   0.1%  95.0%        1   0.1% copy_over_original_tag_text (inline)

@flavorjones
Member Author

flavorjones commented Dec 11, 2022

@stevecheckoway I tried to do LTO but didn't see a noticeable performance improvement. Here's the patch:

diff --git a/ext/nokogiri/extconf.rb b/ext/nokogiri/extconf.rb
index daf6094..7745e4f 100644
--- a/ext/nokogiri/extconf.rb
+++ b/ext/nokogiri/extconf.rb
@@ -618,6 +618,9 @@ def do_clean
 # gumbo html5 serialization is slower with O3, let's make sure we use O2
 append_cflags("-O2")
 
+# link-time optimization
+append_cflags("-flto")
+
 # always include debugging information
 append_cflags("-g")
 
@@ -725,7 +728,7 @@ def install
         class << recipe
           def configure
             env = {}
-            env["CFLAGS"] = concat_flags(ENV["CFLAGS"], "-fPIC", "-g")
+            env["CFLAGS"] = concat_flags(ENV["CFLAGS"], "-fPIC", "-g", "-flto")
             env["CHOST"] = host
             execute("configure", ["./configure", "--static", configure_prefix], { env: env })
             if darwin?
@@ -751,7 +754,7 @@ def configure
         # The libiconv configure script doesn't accept "arm64" host string but "aarch64"
         recipe.host = recipe.host.gsub("arm64-apple-darwin", "aarch64-apple-darwin")
 
-        cflags = concat_flags(ENV["CFLAGS"], "-O2", "-U_FORTIFY_SOURCE", "-g")
+        cflags = concat_flags(ENV["CFLAGS"], "-O2", "-U_FORTIFY_SOURCE", "-g", "-flto")
 
         recipe.configure_options += [
           "--disable-dependency-tracking",
@@ -804,7 +807,7 @@ def configure
       recipe.patch_files = Dir[File.join(PACKAGE_ROOT_DIR, "patches", "libxml2", "*.patch")].sort
     end
 
-    cflags = concat_flags(ENV["CFLAGS"], "-O2", "-U_FORTIFY_SOURCE", "-g")
+    cflags = concat_flags(ENV["CFLAGS"], "-O2", "-U_FORTIFY_SOURCE", "-g", "-flto")
 
     if zlib_recipe
       recipe.configure_options << "--with-zlib=#{zlib_recipe.path}"
@@ -853,7 +856,7 @@ def configure
       recipe.patch_files = Dir[File.join(PACKAGE_ROOT_DIR, "patches", "libxslt", "*.patch")].sort
     end
 
-    cflags = concat_flags(ENV["CFLAGS"], "-O2", "-U_FORTIFY_SOURCE", "-g")
+    cflags = concat_flags(ENV["CFLAGS"], "-O2", "-U_FORTIFY_SOURCE", "-g", "-flto")
 
     if darwin? && !cross_build_p
       recipe.configure_options += ["RANLIB=/usr/bin/ranlib", "AR=/usr/bin/ar"]
@@ -974,9 +977,10 @@ def install
     end
 
     def compile
-      cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-O2", "-g")
+      cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-O2", "-g", "-flto")
+      ldflags = concat_flags(ENV["LDFLAGS"], "-flto")
 
-      env = { "CC" => gcc_cmd, "CFLAGS" => cflags }
+      env = { "CC" => gcc_cmd, "CFLAGS" => cflags, "LDFLAGS" => ldflags }
       if config_cross_build?
         if /darwin/.match?(host)
           env["AR"] = "#{host}-libtool"

Is there anything obvious I'm missing? I don't have much experience with link-time optimization, so I may be missing something elementary. (Note that flags passed to append_cflags will also appear in link lines.)

@flavorjones
Member Author

Also, for comparison, this is the stack profile for libxml2's HTML4 parser on the same file:

Total: 1046 samples
     182  17.4%  17.4%      186  17.8% pthread_attr_setschedparam
     145  13.9%  31.3%      145  13.9% htmlCurrentChar.part.0
     125  12.0%  43.2%      273  26.1% htmlParseScript
      50   4.8%  48.0%       50   4.8% atomic_sub_nounderflow (inline)
      50   4.8%  52.8%       50   4.8% xmlNextChar
      44   4.2%  57.0%       44   4.2% __nss_database_lookup
      32   3.1%  60.0%      177  16.9% htmlCurrentChar (inline)
      28   2.7%  62.7%       28   2.7% rbimpl_atomic_size_add (inline)
      27   2.6%  65.3%      118  11.3% objspace_malloc_increase_body (inline)
      23   2.2%  67.5%      181  17.3% xmlFreeNodeList
      21   2.0%  69.5%       21   2.0% malloc_usable_size
      18   1.7%  71.2%       18   1.7% _init@3e000
      18   1.7%  72.9%      131  12.5% htmlParseHTMLAttribute
      17   1.6%  74.6%       89   8.5% __libc_malloc
      12   1.1%  75.7%       57   5.4% htmlParseHTMLName
      12   1.1%  76.9%       12   1.1% xmlDictComputeBigKey.part.0
      11   1.1%  77.9%       23   2.2% xmlDictComputeBigKey (inline)
      10   1.0%  78.9%       10   1.0% _init@33000
       9   0.9%  79.7%       30   2.9% objspace_xrealloc.isra.0
       9   0.9%  80.6%        9   0.9% ruby_xmalloc0 (inline)
       9   0.9%  81.5%       42   4.0% xmlDictLookup
       7   0.7%  82.1%        7   0.7% cfree
       7   0.7%  82.8%      389  37.2% htmlParseStartTag
       7   0.7%  83.5%        9   0.9% xmlStrEqual
       6   0.6%  84.0%        6   0.6% htmlSkipBlankChars
       6   0.6%  84.6%       15   1.4% ruby_xmalloc
       5   0.5%  85.1%       28   2.7% htmlParseCharDataInternal.constprop.0
       5   0.5%  85.6%      184  17.6% ruby_sized_xfree
       5   0.5%  86.0%        5   0.5% xmlStrcasecmp
       5   0.5%  86.5%        5   0.5% xmlStrdup (inline)
       4   0.4%  86.9%       28   2.7% bsearch (inline)
       4   0.4%  87.3%        4   0.4% objspace_malloc_gc_stress (inline)
       4   0.4%  87.7%       20   1.9% xmlNewPropInternal
       4   0.4%  88.0%       49   4.7% xmlNewText
       3   0.3%  88.3%        3   0.3% RVALUE_OLD_P_RAW (inline)
       3   0.3%  88.6%        7   0.7% htmlGetEndPriority
       3   0.3%  88.9%      138  13.2% htmlParseAttValue (inline)
       3   0.3%  89.2%       14   1.3% htmlParseHTMLName_nonInvasive.isra.0
       3   0.3%  89.5%      169  16.2% objspace_xmalloc0
       3   0.3%  89.8%       16   1.5% realloc
       3   0.3%  90.1%       19   1.8% ruby_yyparse
       3   0.3%  90.3%        4   0.4% xmlFreeID
       3   0.3%  90.6%       90   8.6% xmlFreeProp
       3   0.3%  90.9%       99   9.5% xmlFreePropList (inline)
       3   0.3%  91.2%      107  10.2% xmlSAX2AttributeInternal
       3   0.3%  91.5%      155  14.8% xmlSAX2StartElement
       2   0.2%  91.7%        2   0.2% RB_BUILTIN_TYPE (inline)
       2   0.2%  91.9%        2   0.2% RVALUE_MARKED (inline)
       2   0.2%  92.1%       13   1.2% gc_mark_stacked_objects (inline)
       2   0.2%  92.3%       17   1.6% htmlCompareStartClose (inline)
       2   0.2%  92.4%      985  94.2% htmlParseContentInternal

It's interesting that pthread_attr_setschedparam dominates both profiles.

@flavorjones
Member Author

flavorjones commented Dec 11, 2022

Pulling on the malloc thread (pun intended): libxml2 gets significantly faster overall (and so does html5 serialization) if we don't tell it to use Ruby's memory management functions. I'll ship a PR soon.

edit: PR at #2734

@flavorjones
Member Author

@stevecheckoway I'm out of ideas. Anything else you'd like me to explore?

@stevecheckoway
Contributor

stevecheckoway commented Dec 20, 2022

With -flto, you might want to investigate using -O3 again for more aggressive inlining.

I tried adding -fvisibility=hidden to CFLAGS and __attribute__((visibility("default"))) to Init_Nokogiri in an attempt to help the compiler do a better job of stripping dead code and inlining. I'm not sure if that had any impact. Most likely it improves process startup time slightly.

I'm surprised decode is showing up at all. From what I can tell by disassembling the nokogiri binary, these are the instructions in read_char that make up decode():

0000000000186fd4        mov     w14, #0xff
0000000000186fd8        adr     x15, #0x379b8
0000000000186fdc        nop
0000000000186fe0        ldrb    w16, [x9, x11]
0000000000186fe4        ldrb    w17, [x15, x16]
0000000000186fe8        lsr     w0, w14, w17
0000000000186fec        and     w0, w0, w16
0000000000186ff0        bfi     w16, w8, #6, #26
0000000000186ff4        cmp     w13, #0x0
0000000000186ff8        csel    w8, w0, w16, eq
0000000000186ffc        mov     w13, w13
0000000000187000        orr     x13, x13, #0x100
0000000000187004        add     x13, x13, x17
0000000000187008        ldrb    w13, [x15, x13]

It's surprising that we'd be spending such a large amount of time in this handful of instructions given how much code it takes to handle each byte of input.
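For context on how little work decode does per byte, here is a Ruby port of a table-driven UTF-8 decoder in the style of Björn Höhrmann's DFA. The source fragments that pprof prints for read_char further down this thread (`utf8d[byte]`, `utf8d[256 + *state + type]`) have that shape. This is a sketch for illustration only: the table values are reproduced from the published decoder, it is an assumption that gumbo's utf8d table is identical, and `decode_step` and `decode_utf8` are hypothetical names. Unlike read_char, it returns nil on malformed input instead of substituting a replacement character.

```ruby
# Illustrative Ruby port of a table-driven (DFA) UTF-8 decoder.
# Assumption: table values match the utf8d table used by gumbo's utf8.c.

UTF8_ACCEPT = 0
UTF8_REJECT = 12

# Byte -> character class (the first 256 entries of the C table).
BYTE_CLASS = (
  [0] * 128 +                        # 0x00..0x7f: ASCII
  [1] * 16 + [9] * 16 +              # 0x80..0x9f: continuation bytes
  [7] * 32 +                         # 0xa0..0xbf: continuation bytes
  [8] * 2 + [2] * 30 +               # 0xc0..0xc1 invalid, 0xc2..0xdf two-byte leads
  [10] + [3] * 12 + [4] + [3] * 2 +  # 0xe0..0xef: three-byte leads
  [11] + [6] * 3 + [5] + [8] * 11    # 0xf0..0xf4 four-byte leads, rest invalid
).freeze

# (state + class) -> next state (the remaining 108 entries of the C table).
TRANSITIONS = [
   0, 12, 24, 36, 60, 96, 84, 12, 12, 12, 48, 72,
  12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
  12,  0, 12, 12, 12, 12, 12,  0, 12,  0, 12, 12,
  12, 24, 12, 12, 12, 12, 12, 24, 12, 24, 12, 12,
  12, 12, 12, 12, 12, 12, 12, 24, 12, 12, 12, 12,
  12, 24, 12, 12, 12, 12, 12, 12, 12, 24, 12, 12,
  12, 12, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12,
  12, 36, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12,
  12, 36, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
].freeze

# One DFA step: consume a byte, return the new [state, code_point].
def decode_step(state, code_point, byte)
  type = BYTE_CLASS[byte]
  code_point =
    if state == UTF8_ACCEPT
      (0xff >> type) & byte              # lead byte: mask off the length bits
    else
      (byte & 0x3f) | (code_point << 6)  # continuation: append 6 payload bits
    end
  [TRANSITIONS[state + type], code_point]
end

# Decode a whole string; returns its code points, or nil on malformed input.
def decode_utf8(bytes)
  state = UTF8_ACCEPT
  code_point = 0
  out = []
  bytes.each_byte do |byte|
    state, code_point = decode_step(state, code_point, byte)
    return nil if state == UTF8_REJECT
    out << code_point if state == UTF8_ACCEPT
  end
  state == UTF8_ACCEPT ? out : nil
end

p decode_utf8("héllo €")
```

Each input byte costs two table lookups plus a shift and a mask, which is consistent with the two ldrb loads in the disassembly above.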

@flavorjones changed the title from "investigate optimizing the HTML5 parser and serializer" to "explore further optimizing the HTML5 parser and serializer" on Dec 22, 2022

@flavorjones
Member Author

flavorjones commented Dec 22, 2022

@stevecheckoway although I can get small (1%-6%) speedups on libgumbo parsing with -flto -O3, serialization seems to usually be 17%-25% slower. I've tried variations including:

  • compiling libxml2+libxslt with and without -flto
  • O2 and O3
  • with and without the visibility declaration and compiler option

Maybe I'm making systematic errors doing this? I'll go back and try to reproduce these results.

I'm also surprised to see decode come up so reliably in the stack profiling output. I'm not sure what to do with that information, honestly.

@flavorjones
Member Author

I disassembled utf8.o on my machine, too, and it looks like decode is being inlined well (optimizations for the value of state, etc.). Is it possible this is a measurement artifact of how gperftools determines "where" it is when inlined instructions are interleaved?

I ran pprof and asked it to disassemble read_char, which gives counts per line. Note that in this analysis, decode() is on lines 76-86, so pay attention to those line numbers.

The whole function's analysis is here.
ROUTINE ====================== read_char
   190    190 samples (flat, cumulative) 18.3% of total
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.c
    10     10   110: static void read_char(Utf8Iterator* iter) {
    10     10      16b380: push   %rbp
     .      .      16b381: push   %rbx
     .      .      16b382: sub    $0x8,%rsp
     4      4   111: if (iter->_start >= iter->_end) {
     2      2      16b386: mov    (%rdi),%rbp
     1      1      16b389: mov    0x10(%rdi),%r11
     .      .      16b38d: cmp    %r11,%rbp
     .      .      16b390: jae    16b470 <read_char+0xf0>
     1      1      16b396: mov    %rbp,%rdx
     7      7   119: uint32_t state = UTF8_ACCEPT;
     7      7      16b399: xor    %eax,%eax
     .      .   118: uint32_t code_point = 0;
     .      .      16b39b: xor    %esi,%esi
     4      4    82: : (0xff >> type) & (byte);
     .      .      16b39d: mov    $0xff,%ebx
     1      1      16b3a2: lea    0x39b17(%rip),%r10        # 1a4ec0 <utf8d>
     3      3      16b3a9: jmp    16b3e0 <read_char+0x60>
     .      .      16b3ab: nopl   0x0(%rax,%rax,1)
     .      .    84: *state = utf8d[256 + *state + type];
     .      .      16b3b0: lea    0x100(%rcx,%rax,1),%eax
     .      .    81: ? (byte & 0x3fu) | (*codep << 6)
     .      .      16b3b7: and    $0x3f,%r8d
     .      .      16b3bb: shl    $0x6,%esi
     .      .    84: *state = utf8d[256 + *state + type];
     .      .      16b3be: movzbl (%r10,%rax,1),%eax
     .      .    82: : (0xff >> type) & (byte);
     .      .      16b3c3: or     %r8d,%esi
     .      .   122: if (state == UTF8_ACCEPT) {
     .      .      16b3c6: test   %eax,%eax
     .      .      16b3c8: je     16b40a <read_char+0x8a>
     .      .   151: } else if (state == UTF8_REJECT) {
     .      .      16b3ca: cmp    $0xc,%eax
     .      .      16b3cd: je     16b4b0 <read_char+0x130>
     .      .      16b3d3: add    $0x1,%rdx
     .      .      16b3d7: cmp    %rdx,%r11
     .      .      16b3da: je     16b490 <read_char+0x110>
     .      .   121: decode(&state, &code_point, (uint32_t)(unsigned char) (*c));
     .      .      16b3e0: movzbl (%rdx),%r9d
    19     19    77: uint32_t type = utf8d[byte];
    19     19      16b3e4: mov    %r9d,%ecx
     3      3   121: decode(&state, &code_point, (uint32_t)(unsigned char) (*c));
     3      3      16b3e7: mov    %r9d,%r8d
     3      3    77: uint32_t type = utf8d[byte];
     3      3      16b3ea: movzbl (%r10,%rcx,1),%ecx
    36     36    82: : (0xff >> type) & (byte);
    36     36      16b3ef: test   %eax,%eax
     .      .      16b3f1: jne    16b3b0 <read_char+0x30>
     .      .    84: *state = utf8d[256 + *state + type];
     .      .      16b3f3: lea    0x100(%rcx,%rax,1),%eax
    11     11    82: : (0xff >> type) & (byte);
    11     11      16b3fa: mov    %ebx,%esi
     1      1    84: *state = utf8d[256 + *state + type];
     1      1      16b3fc: movzbl (%r10,%rax,1),%eax
    47     47    82: : (0xff >> type) & (byte);
    45     45      16b401: sar    %cl,%esi
     2      2      16b403: and    %r9d,%esi
     1      1   122: if (state == UTF8_ACCEPT) {
     1      1      16b406: test   %eax,%eax
     .      .      16b408: jne    16b3ca <read_char+0x4a>
    10     10   123: iter->_width = c - iter->_start + 1;
    10     10      16b40a: mov    %rdx,%rax
     .      .      16b40d: sub    %rbp,%rax
     .      .      16b410: add    $0x1,%rax
     .      .      16b414: mov    %rax,0x20(%rdi)
     .      .   129: if (code_point == '\r') {
     .      .      16b418: cmp    $0xd,%esi
     .      .      16b41b: jne    16b4e0 <read_char+0x160>
     .      .   130: assert(iter->_width == 1);
     .      .      16b421: cmp    $0x1,%rax
     .      .      16b425: jne    16b530 <read_char+0x1b0>
     .      .   131: const char* next = c + 1;
     .      .      16b42b: lea    0x1(%rdx),%rax
     .      .   132: if (next < iter->_end && *next == '\n') {
     .      .      16b42f: cmp    %rax,%r11
     .      .      16b432: jbe    16b43e <read_char+0xbe>
     .      .      16b434: cmpb   $0xa,0x1(%rdx)
     .      .      16b438: je     16b51f <read_char+0x19f>
     .      .   141: iter->_current = code_point;
     .      .      16b43e: movl   $0xa,0x18(%rdi)
     .      .      16b445: mov    $0xa,%eax
     .      .      16b44a: mov    $0xa,%esi
-------------------- ...-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/ascii.h
     .      .    47: && (_gumbo_ascii_table[c] & GUMBO_ASCII_SPACE);
     .      .      16b44f: mov    0xa77f2(%rip),%rdx        # 212c48 <_gumbo_ascii_table@@Base+0x6db28>
     .      .      16b456: testb  $0x2,(%rdx,%rax,1)
     .      .      16b45a: jne    16b47f <read_char+0xff>
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.c
     .      .   147: && !(gumbo_ascii_isspace(code_point) || code_point == 0)) {
     .      .      16b45c: test   %esi,%esi
     .      .      16b45e: je     16b47f <read_char+0xff>
     .      .   168: }
     .      .      16b460: add    $0x8,%rsp
     .      .   148: add_error(iter, GUMBO_ERR_CONTROL_CHARACTER_IN_INPUT_STREAM);
     .      .      16b464: mov    $0x6,%esi
     .      .   168: }
     .      .      16b469: pop    %rbx
     .      .      16b46a: pop    %rbp
     .      .   148: add_error(iter, GUMBO_ERR_CONTROL_CHARACTER_IN_INPUT_STREAM);
     .      .      16b46b: jmpq   16b330 <add_error>
     .      .   113: iter->_current = -1;
     .      .      16b470: movl   $0xffffffff,0x18(%rdi)
     .      .   114: iter->_width = 0;
     .      .      16b477: movq   $0x0,0x20(%rdi)
     1      1   168: }
     1      1      16b47f: add    $0x8,%rsp
     .      .      16b483: pop    %rbx
     .      .      16b484: pop    %rbp
     .      .      16b485: retq   
     .      .      16b486: nopw   %cs:0x0(%rax,%rax,1)
     .      .   165: iter->_width = iter->_end - iter->_start;
     .      .      16b490: sub    %rbp,%r11
     .      .   166: iter->_current = kUtf8ReplacementChar;
     .      .      16b493: movl   $0xfffd,0x18(%rdi)
     .      .   167: add_error(iter, GUMBO_ERR_UTF8_TRUNCATED);
     .      .      16b49a: mov    $0x32,%esi
     .      .   165: iter->_width = iter->_end - iter->_start;
     .      .      16b49f: mov    %r11,0x20(%rdi)
     .      .   168: }
     .      .      16b4a3: add    $0x8,%rsp
     .      .      16b4a7: pop    %rbx
     .      .      16b4a8: pop    %rbp
     .      .   167: add_error(iter, GUMBO_ERR_UTF8_TRUNCATED);
     .      .      16b4a9: jmpq   16b330 <add_error>
     .      .      16b4ae: xchg   %ax,%ax
     .      .   154: iter->_width = c - iter->_start + (c == iter->_start);
     .      .      16b4b0: mov    %rdx,%rax
     .      .   155: iter->_current = kUtf8ReplacementChar;
     .      .      16b4b3: movl   $0xfffd,0x18(%rdi)
     .      .   156: add_error(iter, GUMBO_ERR_UTF8_INVALID);
     .      .      16b4ba: mov    $0x31,%esi
     .      .   154: iter->_width = c - iter->_start + (c == iter->_start);
     .      .      16b4bf: sub    %rbp,%rax
     .      .      16b4c2: cmp    %rdx,%rbp
     .      .      16b4c5: sete   %dl
     .      .      16b4c8: movzbl %dl,%edx
     .      .      16b4cb: add    %rdx,%rax
     .      .      16b4ce: mov    %rax,0x20(%rdi)
     .      .   168: }
     .      .      16b4d2: add    $0x8,%rsp
     .      .      16b4d6: pop    %rbx
     .      .      16b4d7: pop    %rbp
     .      .   156: add_error(iter, GUMBO_ERR_UTF8_INVALID);
     .      .      16b4d8: jmpq   16b330 <add_error>
     .      .      16b4dd: nopl   (%rax)
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.h
     .      .    67: return c >= 0xD800 && c <= 0xDFFF;
     .      .      16b4e0: lea    -0xd800(%rsi),%edx
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.c
     .      .   141: iter->_current = code_point;
     .      .      16b4e6: mov    %esi,0x18(%rdi)
     .      .      16b4e9: movslq %esi,%rax
    16     16   142: if (utf8_is_surrogate(code_point)) {
    16     16      16b4ec: cmp    $0x7ff,%edx
     .      .      16b4f2: ja     16b504 <read_char+0x184>
     .      .   168: }
     .      .      16b4f4: add    $0x8,%rsp
     .      .   143: add_error(iter, GUMBO_ERR_SURROGATE_IN_INPUT_STREAM);
     .      .      16b4f8: mov    $0x28,%esi
     .      .   168: }
     .      .      16b4fd: pop    %rbx
     .      .      16b4fe: pop    %rbp
     .      .   143: add_error(iter, GUMBO_ERR_SURROGATE_IN_INPUT_STREAM);
     .      .      16b4ff: jmpq   16b330 <add_error>
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.h
     1      1    73: (c >= 0xFDD0 && c <= 0xFDEF)
     1      1      16b504: lea    -0xfdd0(%rsi),%edx
     .      .    75: || ((c & 0xFFFF) == 0xFFFF);
     .      .      16b50a: cmp    $0x1f,%edx
     .      .      16b50d: ja     16b54f <read_char+0x1cf>
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.c
     .      .   168: }
     .      .      16b50f: add    $0x8,%rsp
     .      .   145: add_error(iter, GUMBO_ERR_NONCHARACTER_IN_INPUT_STREAM);
     .      .      16b513: mov    $0x24,%esi
     .      .   168: }
     .      .      16b518: pop    %rbx
     .      .      16b519: pop    %rbp
     .      .   145: add_error(iter, GUMBO_ERR_NONCHARACTER_IN_INPUT_STREAM);
     .      .      16b51a: jmpq   16b330 <add_error>
     .      .   134: ++iter->_start;
     .      .      16b51f: add    $0x1,%rbp
     .      .   137: ++iter->_pos.offset;
     .      .      16b523: addq   $0x1,0x38(%rdi)
     .      .   134: ++iter->_start;
     .      .      16b528: mov    %rbp,(%rdi)
     .      .   137: ++iter->_pos.offset;
     .      .      16b52b: jmpq   16b43e <read_char+0xbe>
     .      .      16b530: lea    0x39969(%rip),%rcx        # 1a4ea0 <__PRETTY_FUNCTION__.2651>
     .      .      16b537: mov    $0x82,%edx
     .      .      16b53c: lea    0x3992b(%rip),%rsi        # 1a4e6e <__PRETTY_FUNCTION__.3222+0x1e>
     .      .      16b543: lea    0x3992b(%rip),%rdi        # 1a4e75 <__PRETTY_FUNCTION__.3222+0x25>
     .      .      16b54a: callq  44c70 <__assert_fail@plt>
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.h
     4      4    75: || ((c & 0xFFFF) == 0xFFFF);
     .      .      16b54f: movzwl %si,%edx
     4      4      16b552: sub    $0xfffe,%edx
     .      .      16b558: cmp    $0x1,%edx
     .      .      16b55b: jbe    16b50f <read_char+0x18f>
     1      1    80: return ((unsigned int)c < 0x1Fu) || (c >= 0x7F && c <= 0x9F);
     1      1      16b55d: lea    -0x7f(%rsi),%edx
-------------------- ...x-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/utf8.c
    11     11   146: } else if (utf8_is_control(code_point)
     .      .      16b560: cmp    $0x20,%edx
     .      .      16b563: jbe    16b56e <read_char+0x1ee>
    11     11      16b565: cmp    $0x1e,%esi
     .      .      16b568: ja     16b47f <read_char+0xff>
-------------------- ...-gnu/ports/libgumbo/1.0.0-nokogiri/gumbo-parser/ascii.h
     .      .    47: && (_gumbo_ascii_table[c] & GUMBO_ASCII_SPACE);
     .      .      16b56e: test   $0xffffff80,%esi
     .      .      16b574: jne    16b460 <read_char+0xe0>
     .      .      16b57a: jmpq   16b44f <read_char+0xcf>
     .      .      16b57f:    nop

But here's the interesting bit that accounts for the high number of samples for "decode":

    19     19    77: uint32_t type = utf8d[byte];
    19     19      16b3e4: mov    %r9d,%ecx
     3      3   121: decode(&state, &code_point, (uint32_t)(unsigned char) (*c));
     3      3      16b3e7: mov    %r9d,%r8d
     3      3    77: uint32_t type = utf8d[byte];
     3      3      16b3ea: movzbl (%r10,%rcx,1),%ecx
    36     36    82: : (0xff >> type) & (byte);
    36     36      16b3ef: test   %eax,%eax
     .      .      16b3f1: jne    16b3b0 <read_char+0x30>
     .      .    84: *state = utf8d[256 + *state + type];
     .      .      16b3f3: lea    0x100(%rcx,%rax,1),%eax
    11     11    82: : (0xff >> type) & (byte);
    11     11      16b3fa: mov    %ebx,%esi
     1      1    84: *state = utf8d[256 + *state + type];
     1      1      16b3fc: movzbl (%r10,%rax,1),%eax
    47     47    82: : (0xff >> type) & (byte);
    45     45      16b401: sar    %cl,%esi
     2      2      16b403: and    %r9d,%esi

So it looks like the test instruction at 16b3ef and the sar instruction at 16b401 are the ones being sampled most often.
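To put a number on "most often", here is a rough tally of the per-instruction sample counts from the excerpt above. It is only a sketch: the counts are copied by hand from the listing, and the hash keys are just labels made up for this illustration.

```ruby
# Sample counts per instruction, copied by hand from the "decode" excerpt above.
# The keys (address + mnemonic) are illustrative labels, not pprof output.
samples = {
  "16b3e4 mov"    => 19,
  "16b3e7 mov"    => 3,
  "16b3ea movzbl" => 3,
  "16b3ef test"   => 36,
  "16b3f1 jne"    => 0,
  "16b3f3 lea"    => 0,
  "16b3fa mov"    => 11,
  "16b3fc movzbl" => 1,
  "16b401 sar"    => 45,
  "16b403 and"    => 2,
}

total = samples.values.sum
hot = samples.slice("16b3ef test", "16b401 sar")
hot_share = (100.0 * hot.values.sum / total).round(1)

# Print each instruction with its share of the samples, hottest first.
samples.sort_by { |_, n| -n }.each do |insn, n|
  puts format("%-14s %3d  %5.1f%%", insn, n, 100.0 * n / total)
end
puts "test + sar: #{hot.values.sum} of #{total} samples (#{hot_share}%)"
```

One caveat, stated as a guess rather than something the profile shows: sampling profilers often attribute a sample to the instruction after the one that actually stalled, so part of the test and sar counts may really belong to the movzbl table loads just before them.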

@ilyazub
Contributor

ilyazub commented Nov 15, 2023

Thanks for your work! I used your benchmark with Ruby 3.2.2 and 2.7.2, and added Nokolexbor to the benchmark.

Nokolexbor is 2-12 times faster when parsing and 2-6 times faster when serializing.
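As a rough cross-check of those ranges, the ratios can be recomputed from the i/s figures in the ruby 2.7.2 run further down in this comment. This is only a sketch: the numbers are copied by hand from the "Comparison" blocks, the helper names are made up for illustration, and because the inputs are already rounded the ratios can differ slightly from the ones benchmark-ips prints.

```ruby
# i/s figures copied by hand from the ruby 2.7.2 "Comparison" blocks below.
# Each row is [nokolexbor, html4, html5], keyed by the input length
# that the benchmark reports.
parse = {
  656       => [43944.8, 22141.7, 21048.8],
  70_095    => [1405.6, 450.4, 300.1],
  1_929_522 => [157.8, 37.9, 13.1],
}
serialize = {
  656       => [263887.5, 53260.0, 40303.4],
  70_095    => [3358.8, 1112.0, 918.9],
  1_929_522 => [425.1, 115.7, 107.2],
}

# Speedup of Nokolexbor over each Nokogiri parser (hypothetical helper).
def speedups(rows)
  rows.transform_values do |(lexbor, html4, html5)|
    { html4: (lexbor / html4).round(2), html5: (lexbor / html5).round(2) }
  end
end

# Smallest and largest speedup across all inputs and both parsers.
def speedup_range(rows)
  speedups(rows).values.flat_map(&:values).minmax
end

puts "parse:     %.2fx to %.2fx" % speedup_range(parse)
puts "serialize: %.2fx to %.2fx" % speedup_range(serialize)
```

On these figures the smallest serialization speedup is about 3x, so the lower bound quoted above is on the conservative side. The ruby 3.2.2 numbers can be substituted in the same way.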

@flavorjones What do you think about using Nokolexbor for the HTML processing in Nokogiri?

#! /usr/bin/env ruby
# coding: utf-8

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
  gem "nokolexbor"
  gem "benchmark-ips"
end

require "nokogiri"
require "nokolexbor"
require "benchmark/ips"

filenames = [
  "test/files/GH_1042.html", # 650b
  "test/files/tlm.html", # 70kb
  "big_shopping.html", # 1.9mb
]

inputs = filenames.map { |fn| File.read(fn) }

puts RUBY_DESCRIPTION

inputs.each do |input|
  len = input.length

  Benchmark.ips do |x|
    x.warmup = 0
    x.time = 10

    x.report("html5 parse #{len}") do
      Nokogiri::HTML5::Document.parse(input)
    end
    x.report("html4 parse #{len}") do
      Nokogiri::HTML4::Document.parse(input)
    end
    x.report("nokolexbor html5 parse #{len}") do
      Nokolexbor::HTML(input)
    end
    x.compare!
  end
end

puts "=========="

inputs.each do |input|
  len = input.length
  html4_doc = Nokogiri::HTML4::Document.parse(input)
  html5_doc = Nokogiri::HTML5::Document.parse(input)
  html5_doc_nokolexbor = Nokolexbor::HTML(input)

  Benchmark.ips do |x|
    x.warmup = 0
    x.time = 10

    x.report("html5 serlz #{len}") do
      html5_doc.to_html
    end
    x.report("html4 serlz #{len}") do
      html4_doc.to_html
    end
    x.report("html5 nokolexbor serlz #{len}") do
      html5_doc_nokolexbor.to_html
    end
    x.compare!
  end
end
ruby 2.7.2 benchmark
$ ruby bench.rb
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
Calculating -------------------------------------
     html5 parse 656     21.049k (±23.5%) i/s -    179.547k in   9.929976s
     html4 parse 656     22.142k (±22.3%) i/s -    189.923k in   9.926466s
nokolexbor html5 parse 656
                         43.945k (±21.3%) i/s -    296.049k in   9.900173s

Comparison:
nokolexbor html5 parse 656:    43944.8 i/s
     html4 parse 656:    22141.7 i/s - 1.98x  (± 0.00) slower
     html5 parse 656:    21048.8 i/s - 2.09x  (± 0.00) slower

Calculating -------------------------------------
   html5 parse 70095    300.102  (±18.7%) i/s -      2.684k in   9.997238s
   html4 parse 70095    450.409  (±22.6%) i/s -      3.978k in   9.997504s
nokolexbor html5 parse 70095
                          1.406k (±20.4%) i/s -     13.083k in   9.984839s

Comparison:
nokolexbor html5 parse 70095:     1405.6 i/s
   html4 parse 70095:      450.4 i/s - 3.12x  (± 0.00) slower
   html5 parse 70095:      300.1 i/s - 4.68x  (± 0.00) slower

Calculating -------------------------------------
 html5 parse 1929522     13.132  (± 7.6%) i/s -    131.000  in  10.075865s
 html4 parse 1929522     37.880  (±13.2%) i/s -    370.000  in  10.017928s
nokolexbor html5 parse 1929522
                        157.773  (± 9.5%) i/s -      1.561k in   9.999853s

Comparison:
nokolexbor html5 parse 1929522:      157.8 i/s
 html4 parse 1929522:       37.9 i/s - 4.17x  (± 0.00) slower
 html5 parse 1929522:       13.1 i/s - 12.01x  (± 0.00) slower

==========
Calculating -------------------------------------
     html5 serlz 656     40.303k (±17.2%) i/s -    373.898k in   9.891472s
     html4 serlz 656     53.260k (±18.3%) i/s -    484.973k in   9.844606s
html5 nokolexbor serlz 656
                        263.888k (±15.5%) i/s -      2.270M in   9.493963s

Comparison:
html5 nokolexbor serlz 656:   263887.5 i/s
     html4 serlz 656:    53260.0 i/s - 4.95x  (± 0.00) slower
     html5 serlz 656:    40303.4 i/s - 6.55x  (± 0.00) slower

Calculating -------------------------------------
   html5 serlz 70095    918.855  (±15.1%) i/s -      8.842k in   9.993063s
   html4 serlz 70095      1.112k (±13.3%) i/s -     10.828k in   9.992264s
html5 nokolexbor serlz 70095
                          3.359k (±14.9%) i/s -     32.417k in   9.985435s

Comparison:
html5 nokolexbor serlz 70095:     3358.8 i/s
   html4 serlz 70095:     1112.0 i/s - 3.02x  (± 0.00) slower
   html5 serlz 70095:      918.9 i/s - 3.66x  (± 0.00) slower

Calculating -------------------------------------
 html5 serlz 1929522    107.234  (±12.1%) i/s -      1.055k in  10.007869s
 html4 serlz 1929522    115.701  (±11.2%) i/s -      1.140k in   9.999178s
html5 nokolexbor serlz 1929522
                        425.103  (±19.8%) i/s -      4.042k in   9.994780s

Comparison:
html5 nokolexbor serlz 1929522:      425.1 i/s
 html4 serlz 1929522:      115.7 i/s - 3.67x  (± 0.00) slower
 html5 serlz 1929522:      107.2 i/s - 3.96x  (± 0.00) slower
ruby 3.2.2 benchmark
$ ruby ./bench.rb
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
Calculating -------------------------------------
     html5 parse 656     21.030k (±18.5%) i/s -    170.856k
     html4 parse 656     21.118k (±18.9%) i/s -    172.096k in   9.886192s
nokolexbor html5 parse 656
                         38.215k (±24.8%) i/s -    243.899k in   9.856369s

Comparison:
nokolexbor html5 parse 656:    38214.8 i/s
     html4 parse 656:    21118.4 i/s - 1.81x  slower
     html5 parse 656:    21029.8 i/s - 1.82x  slower

Calculating -------------------------------------
   html5 parse 70095    275.828  (±21.0%) i/s -      2.421k in   9.996074s
   html4 parse 70095    439.891  (±20.9%) i/s -      3.646k in   9.995517s
nokolexbor html5 parse 70095
                          1.467k (±18.5%) i/s -     13.797k in   9.983325s

Comparison:
nokolexbor html5 parse 70095:     1466.9 i/s
   html4 parse 70095:      439.9 i/s - 3.33x  slower
   html5 parse 70095:      275.8 i/s - 5.32x  slower

Calculating -------------------------------------
 html5 parse 1929522     12.321  (± 8.1%) i/s -    122.000  in  10.067774s
 html4 parse 1929522     36.420  (±19.2%) i/s -    351.000  in  10.018349s
nokolexbor html5 parse 1929522
                        146.070  (±15.1%) i/s -      1.423k in  10.001315s

Comparison:
nokolexbor html5 parse 1929522:      146.1 i/s
 html4 parse 1929522:       36.4 i/s - 4.01x  slower
 html5 parse 1929522:       12.3 i/s - 11.86x  slower

==========
Calculating -------------------------------------
     html5 serlz 656     39.037k (±22.6%) i/s -    335.023k in   9.824201s
     html4 serlz 656     52.522k (±21.3%) i/s -    452.027k in   9.742767s
html5 nokolexbor serlz 656
                        260.432k (±19.0%) i/s -      2.064M in   9.155473s

Comparison:
html5 nokolexbor serlz 656:   260432.1 i/s
     html4 serlz 656:    52521.9 i/s - 4.96x  slower
     html5 serlz 656:    39037.3 i/s - 6.67x  slower

Calculating -------------------------------------
   html5 serlz 70095    950.690  (±15.6%) i/s -      9.173k in   9.989867s
   html4 serlz 70095      1.049k (±15.6%) i/s -     10.090k in   9.988001s
html5 nokolexbor serlz 70095
                          3.464k (±16.9%) i/s -     32.979k in   9.976496s

Comparison:
html5 nokolexbor serlz 70095:     3464.2 i/s
   html4 serlz 70095:     1049.5 i/s - 3.30x  slower
   html5 serlz 70095:      950.7 i/s - 3.64x  slower

Calculating -------------------------------------
 html5 serlz 1929522    114.167  (± 9.6%) i/s -      1.130k in  10.002443s
 html4 serlz 1929522    112.654  (±12.4%) i/s -      1.107k in  10.006577s
html5 nokolexbor serlz 1929522
                        412.097  (±18.9%) i/s -      3.934k in   9.992725s

Comparison:
html5 nokolexbor serlz 1929522:      412.1 i/s
 html5 serlz 1929522:      114.2 i/s - 3.61x  slower
 html4 serlz 1929522:      112.7 i/s - 3.66x  slower

@flavorjones
Member Author

@ilyazub please open a new issue.

@flavorjones
Member Author

@ilyazub Seriously! I haven't looked hard at Nokolexbor, but there are some incompatibilities. Please open an issue if this is something you think warrants investigation and we can talk about it!

@ilyazub
Contributor

ilyazub commented Nov 28, 2023

@flavorjones Thank you for following up! ♥️

I created an issue: #3043

@flavorjones flavorjones added this to the v1.18.0 milestone Jul 3, 2024
@flavorjones flavorjones modified the milestones: v1.18.0, v1.19.0 Dec 16, 2024