Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Sanitize#transform_node! #183

Merged
merged 1 commit into from
Jul 24, 2018

Conversation

stanhu
Copy link
Contributor

@stanhu stanhu commented Jul 22, 2018

Since transform_node! may be called in a tight loop to process thousands
of items, we can optimize both memory and CPU performance by:

  1. Reusing the same config hash for each transformer
  2. Directly assigning values to hash instead of using merge!. Not only does
    merge! create a new hash, it is also 2.6x slower:
    https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hash-code

Here's a sample before:

allocated memory by location
-----------------------------------
  27387360  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sanitize-4.6.5/lib/sanitize.rb:220
  11410360  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sanitize-4.6.5/lib/sanitize.rb:224
   7780320  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:840
   6848468  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:576
   6816274  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:755
   4149504  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:698
   3882432  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:684

After:

allocated memory by location
-----------------------------------
  11410360  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sanitize-4.6.5/lib/sanitize.rb:232
   7780320  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:840
   6848067  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:576
   6816274  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:755
   4149504  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:698
   3883536  /Users/stanhu/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.3/lib/nokogiri/xml/node.rb:684

Since transform_node! may be called in a tight loop to process thousands
of items, we can optimize both memory and CPU performance by:

1. Reusing the same config hash for each transformer
2. Directly assigning values to hash instead of using merge!. Not only does
merge! create a new hash, it is also 2.6x slower:
https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hash-code
@stanhu stanhu force-pushed the sh-optimize-transform-node branch from 93e5196 to 5fb40a7 Compare July 22, 2018 22:33
@stanhu
Copy link
Contributor Author

stanhu commented Jul 23, 2018

Latest benchmarks on Ruby 2.4.4:

Calculating -------------------------------------
         Hash#merge!     1.388k i/100ms
            Hash#[]=     6.160k i/100ms
-------------------------------------------------
         Hash#merge!     14.123k (± 2.5%) i/s -     70.788k
            Hash#[]=     63.559k (± 2.9%) i/s -    320.320k

Comparison:
            Hash#[]=:    63558.8 i/s
         Hash#merge!:    14123.4 i/s - 4.50x slower

@rgrove rgrove merged commit 5fb40a7 into rgrove:master Jul 24, 2018
@rgrove
Copy link
Owner

rgrove commented Jul 24, 2018

Nice! Thanks. 👍

swills pushed a commit to swills/gitlabhq that referenced this pull request Aug 1, 2018
sanitize 4.6.6 has this optimization that will benefit Markdown rendering: rgrove/sanitize#183

nokogiri 1.4.4 has this memory leak fix: sparklemotion/nokogiri#1771
jameszhan pushed a commit to jameszhan/gitaly that referenced this pull request Nov 13, 2018
sanitize 4.6.6 has this optimization that will benefit Markdown rendering: rgrove/sanitize#183

nokogiri 1.4.4 has this memory leak fix: sparklemotion/nokogiri#1771

This mirrors changes for https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20795.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants