Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apparent memory leak when duping DocumentFragment #1063

Closed
bryanp opened this issue Mar 7, 2014 · 6 comments
Closed

Apparent memory leak when duping DocumentFragment #1063

bryanp opened this issue Mar 7, 2014 · 6 comments
Assignees
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Milestone

Comments

@bryanp
Copy link

bryanp commented Mar 7, 2014

Ruby 2.1, Nokogiri 1.6.1

Title explains the issue I'm seeing. Here's a simple case to reproduce:

puts `ps -o rss= -p #{$$}`.to_i
doc = Nokogiri::HTML.fragment('<div>foo<span>bar</span></div>')
10000.times do
  doc.dup
  GC.start
end
puts `ps -o rss= -p #{$$}`.to_i

Memory usage goes from 11.9KB to 18.7KB. The contents of the fragment do not seem to matter but the bigger the fragment the bigger the leak (ran it with a large fragment and it peaked at > 1GB).

Having trouble with valgrind on OSX so unable to provide that analysis. It should be clear there's a problem if you run it. Compare to this example, which does not continue to eat up memory (from 11.12KB to 11.16KB):

doc = Nokogiri::HTML::Document.parse('<div>foo<span>bar</span></div>')
puts `ps -o rss= -p #{$$}`.to_i
10000.times do
  doc.dup
  GC.start
end
puts `ps -o rss= -p #{$$}`.to_i

Happy to provide more info if necessary.

@wpaulson
Copy link

wpaulson commented May 6, 2014

Also seen on ruby 1.9.3p448 and nokogiri 1.6.0.
First program (fragment() version) prints
18220
18472

Second program (calls parse()) prints
18400
18420

Valgrind results:
Valgrind run on the programs above (valgrind --partial-loads-ok=yes --undef-value-errors=no ruby …) shows the following (Linux version 2.6.18-348.4.1.el5 RedHat)
First program above (calls fragment()):
==27741== Memcheck, a memory error detector
==27741== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==27741== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==27741== Command: ruby test.rb
==27741==
87136
91444
==27741==
==27741== HEAP SUMMARY:
==27741== in use at exit: 2,438,775 bytes in 64,678 blocks
==27741== total heap usage: 331,391 allocs, 266,713 frees, 55,918,664 bytes allocated
==27741==
==27741== LEAK SUMMARY:
==27741== definitely lost: 929,962 bytes in 12,530 blocks
==27741== indirectly lost: 946,631 bytes in 32,077 blocks
==27741== possibly lost: 179,997 bytes in 5,134 blocks
==27741== still reachable: 382,185 bytes in 14,937 blocks
==27741== suppressed: 0 bytes in 0 blocks
==27741== Rerun with --leak-check=full to see details of leaked memory
==27741==
==27741== For counts of detected and suppressed errors, rerun with: -v
==27741== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Successful program (calls parse()):
bundle exec valgrind --partial-loads-ok=yes --undef-value-errors=no ruby test.rb
==27587== Memcheck, a memory error detector
==27587== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==27587== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==27587== Command: ruby test.rb
==27587==
87504
90124
==27587==
==27587== HEAP SUMMARY:
==27587== in use at exit: 2,390,531 bytes in 64,672 blocks
==27587== total heap usage: 481,268 allocs, 416,596 frees, 71,213,180 bytes allocated
==27587==
==27587== LEAK SUMMARY:
==27587== definitely lost: 919,728 bytes in 12,530 blocks
==27587== indirectly lost: 949,732 bytes in 32,193 blocks
==27587== possibly lost: 155,914 bytes in 5,711 blocks
==27587== still reachable: 365,157 bytes in 14,238 blocks
==27587== suppressed: 0 bytes in 0 blocks
==27587== Rerun with --leak-check=full to see details of leaked memory
==27587==
==27587== For counts of detected and suppressed errors, rerun with: -v
==27587== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@flavorjones flavorjones self-assigned this Jan 27, 2017
@flavorjones flavorjones added this to the 1.7.1 milestone Jan 27, 2017
@flavorjones
Copy link
Member

Some notes:

  • verified this leak with Nokogiri 1.7.0.1 using libxml 2.9.4 with patches
  • leak does not occur when re-parsing fragments in a loop
  • leak does not occur when duping a full Document
  • leak does occur when duping a DocumentFragment

@flavorjones
Copy link
Member

OK, this is because DocumentFragment#dup inherits from Node, which is copying the node subtree and rooting it in the doc. These unparented nodes just hang around for the lifetime of the doc.

Suggestion: we could implement DocumentFragment#dup to copy the DocumentFragment's parent Document, and thus treat it more like a full-blown Document (which I think is what people generally expect).

@flavorjones flavorjones modified the milestones: 1.8.1, 1.8.0 Jun 5, 2017
@flavorjones flavorjones modified the milestones: 1.8.1, 1.8.2 Sep 19, 2017
flavorjones added a commit that referenced this issue Jan 29, 2018
@flavorjones flavorjones modified the milestones: 1.8.2, 1.8.3 Jan 29, 2018
@flavorjones
Copy link
Member

I've addressed this in MRI by copying nodes directly to the new Document. However, I'm having trouble with the java implementation. WIP is on branch 1063-flavorjones-dup-document-fragment.

Moved this into the 1.8.3 milestone, sorry for the delay.

@flavorjones flavorjones modified the milestones: 1.8.3, 2.0.0 Mar 23, 2018
@flavorjones flavorjones modified the milestones: 1.8.3, 1.8.4, 1.8.5 Jun 30, 2018
@flavorjones flavorjones modified the milestones: 1.8.5, next Nov 8, 2018
flavorjones added a commit that referenced this issue Dec 4, 2018
@flavorjones
Copy link
Member

OK - I've got a second branch up in which I punt on trying to implement this optimization in the JRuby implementaiton. branch is 1063-flavorjones-dup-document-fragment-attempt-2 and I'm going to poke @jvshahid about this decision in the morning.

@flavorjones
Copy link
Member

Chatted with @jvshahid and he agrees we should just address this issue in the libxml2/MRI code base and leave JRuby alone, at least for now. I've created #1832 to track this work for JRuby should eventually decide to do it.

Will ship a PR based on the above mentioned branch.

flavorjones added a commit that referenced this issue Dec 10, 2018
flavorjones added a commit that referenced this issue Dec 10, 2018
flavorjones added a commit that referenced this issue Dec 10, 2018
flavorjones added a commit that referenced this issue Dec 10, 2018
…ment-fragment-attempt-2

fix #1063: poor memory performance when `dup`ing DocumentFragment
flavorjones added a commit that referenced this issue Dec 18, 2018
... to only running when testing the libxml2 impl.

Related to #1846 and #1063.
@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Feb 2, 2020
flavorjones added a commit that referenced this issue Dec 12, 2024
Back in b92660e (#1834 fixing #1063) I omitted support in JRuby for
the "new_parent_document" argument to `Node#dup` because there was no
performance reason to implement it. So the test was skipped.

However, in 1e7d38a and other commits in #3117 (fixing #316), I
introduced a call to `initialize_copy_with_args` that passes the new
parent document as an argument on both CRuby and JRuby
implementations. Because the test was skipped, I didn't catch that
this broke on JRuby.

In particular this was a problem for Loofah which relies on
decorators, and even more particularly this broke the
`Loofah::TextBehavior` formatting concern for
`Loofah::*::DocumentFragment` objects.

Maybe we should be running downstream tests with JRuby, too? But that
feels like a big investment right now so I'll avoid scarring on the
first cut, and wait to see if it happens again.
flavorjones added a commit that referenced this issue Dec 12, 2024
Back in b92660e (#1834 fixing #1063) I omitted support in JRuby for
the "new_parent_document" argument to `Node#dup` because there was no
performance reason to implement it. So the test was skipped.

However, in 1e7d38a and other commits in #3117 (fixing #316), I
introduced a call to `initialize_copy_with_args` that passes the new
parent document as an argument on both CRuby and JRuby
implementations. Because the test was skipped, I didn't catch that
this broke on JRuby.

In particular this was a problem for Loofah which relies on
decorators, and even more particularly this broke the
`Loofah::TextBehavior` formatting concern for
`Loofah::*::DocumentFragment` objects.

Maybe we should be running downstream tests with JRuby, too? But that
feels like a big investment right now so I'll avoid scarring on the
first cut, and wait to see if it happens again.

(cherry picked from commit dda0be2)
flavorjones added a commit that referenced this issue Dec 12, 2024
**What problem is this PR intended to solve?**

Back in b92660e (#1834 fixing #1063) I omitted support in JRuby for the
"new_parent_document" argument to `Node#dup` because there was no
performance reason to implement it. So the test was skipped.

However, in 1e7d38a and other commits in #3117 (fixing #316), I
introduced a call to `initialize_copy_with_args` that passes the new
parent document as an argument on both CRuby and JRuby implementations.
Because the test was skipped, I didn't catch that this broke on JRuby.

In particular this was a problem for Loofah which relies on decorators,
and even more particularly this broke the `Loofah::TextBehavior`
formatting concern for `Loofah::*::DocumentFragment` objects.


**Have you included adequate test coverage?**

The skipped test is no longer skipped!

Maybe we should be running downstream tests with JRuby, too? But that
feels like a big investment right now so I'll avoid scarring on the
first cut, and wait to see if it happens again.


**Does this change affect the behavior of either the C or the Java
implementations?**

Brings the Java impl into agreement with the C impl.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

No branches or pull requests

3 participants