memory leaks #94

phrazer · 2014-09-09T22:26:07Z

When I try it with files > 100 KB, it almost always ends with this message:

FATAL ERROR: JS Allocation failed - process out of memory

It also runs for a long time (a few minutes) before showing this error. Would it be possible to process the diff with multiple child processes? Each child would process part of the HTML and use less memory, and we would utilize multiple CPUs (since servers nowadays have several processors).

Or some other memory handling to prevent leaks? Otherwise, great code, very useful!

eGavr · 2014-09-10T04:01:37Z

Actually, I've tried it with > 100kb files and it works.

But I suppose that your problem concerns the recursive algorithm which html-differ uses.
The recursion depth is too big, so the tool fails.

In version 1.0.0 I do not use any recursion, so I think that html-differ will work much faster and will not break in your case.

It would be great if you could give me an example of a file which currently fails.

eGavr · 2014-09-10T04:23:16Z

Or you can even help me and test the prototype of version 1.0.0 )

Check out the branch support/1.0.0 (be careful! I've broken the API) and test it on your files.

phrazer · 2014-09-10T08:47:56Z

I tried to install version 1.0.0, but it always says it's running 0.5 when you check with -v (version).

Also, it runs for a long time and then dies. What is the procedure to install version 1.0.0 correctly? I downloaded the branch support/1.0.0, went to that folder and executed "npm install ."

I can send you the files. Can you give me your email, or some other way to communicate? Best,

eGavr · 2014-09-10T08:55:59Z

Yep, version 1.0.0 is still in development.

You should clone the branch support/1.0.0 from GitHub, go to the cloned folder and run

npm install

bin/html-differ --help

my email:
eugeny.gavryuschin@yandex.ru

phrazer · 2014-09-10T09:11:46Z

Hmm, strange, it still doesn't work. And your email doesn't work either; it says no such user. Best,

eGavr · 2014-09-10T09:15:28Z

Send it here. I'll think about this problem.

job.egavr@yandex.ru

phrazer · 2014-09-10T09:55:22Z

I sent them. I think it has problems with bigger HTML files because of, like you said, the recursive algorithm. But with 1.0.0 it doesn't seem to be any different. Maybe we could split the HTML into a few slices and process them separately.

eGavr · 2014-09-10T10:29:29Z

Thank you! I'll figure it out as soon as possible!

phrazer · 2014-09-10T13:16:07Z

OK, thanks.

If I can help in any way, please tell me. Best regards,

eGavr · 2014-09-10T13:24:21Z

I've understood why this happens.

There are so many diffs in the given HTMLs that the module jsdiff (I use it as a dependency in html-differ) becomes really slow! Yes, it works, but the comparison takes so long that the operating system thinks the program has failed and stops it so that it does not disrupt the OS in general.

I've opened an issue in jsdiff ==> issue34

P.S. If you run html-differ on files > 100 KB that do not have so many diffs, the tool will not fail (even if you use version 0.5.0).

phrazer · 2014-09-10T13:34:48Z

Hmmm,

The problem is that a lot of smaller files (>40+50 kb) also fail. If jsdiff adds a maximum loop, we haven't solved the problem, because we don't show all diffs. The files I sent are real-life webpages, so they should make problems.

Maybe the solution would be to split the HTML into chunks and process these chunks in parallel. Just an idea. I'm more into PHP, so I don't have very good knowledge of Node.js.

Best regards,

eGavr · 2014-09-10T13:44:46Z

The problem is not the size of the HTMLs, but the number of diffs between them.

How can you split the HTML? What criteria would you use? If you split it, you can break it, and the comparison will not be fair.

The idea of the differ was to compare HTMLs that are not absolutely different.

Is it necessary to compare HTMLs if they are absolutely different and you can see the diffs without any tools?

eGavr · 2014-09-10T13:50:12Z

As a variant: jsdiff adds a maximum loop.

You will not see all diffs; fix the ones you see, or do whatever else you want, and run html-differ again.

eGavr · 2014-09-10T14:06:34Z

I've found the solution!
It seems to be the best one!

In the current implementation of html-differ, the module jsdiff compares texts by words (there are several modes: charDiff, wordDiff, lineDiff ...). If we call jsdiff in lineDiff mode, the number of diffs becomes smaller, so jsdiff does not fail.

Solution:

This mode of jsdiff will be configurable in html-differ through an option, so you can choose how to compare HTMLs (by chars, byWords, byLines).

Does this variant suit you?
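
To illustrate why the mode matters, here is a minimal sketch of how the same text yields fewer units to compare when it is split by lines rather than by words or chars. The `tokenize` helper and the mode names `chars`, `words`, `lines` are hypothetical and only for illustration; this is not jsdiff's or html-differ's actual code.

```javascript
// Hypothetical helper: split a text into the units a diff would compare.
// Fewer units generally means less work and fewer reported diffs.
function tokenize(text, mode) {
  switch (mode) {
    case "chars":
      return Array.from(text);                  // every character is a unit
    case "words":
      return text.split(/\s+/).filter(Boolean); // whitespace-separated units
    case "lines":
      return text.split("\n");                  // one unit per line
    default:
      throw new Error("unknown mode: " + mode);
  }
}

const sample = "<ul>\n  <li>one two</li>\n  <li>three four</li>\n</ul>";

for (const mode of ["chars", "words", "lines"]) {
  console.log(mode, tokenize(sample, mode).length);
}
```

The trade-off is granularity: a line-level comparison reports a whole line as changed even if only one word in it differs.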

phrazer · 2014-09-10T14:14:31Z

Hey,

Great 👍 Is the result still the same if you compare line by line instead of word by word? Because of classes and the other things you mentioned in the README?

How can I apply this patch? What file should I change to enable this, so I can test some more files? :)

eGavr · 2014-09-10T16:46:42Z

I am going to do this in version 1.0.0. There are some points concerning this option which need to be handled.

eGavr · 2014-09-10T18:36:44Z

If I compare texts by lines, the option ignoreWhitespaces will work in another way: it will ignore all whitespace except \ns. If more than two \ns occur in succession, they will turn into one \n.

For example, if you set the option which compares files by lines, ignoreWhitespaces will work in the following way:

<span>


</span>

turns to

<span>
</span>

Does this variant suit you?
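
One possible reading of this behaviour, as a minimal sketch: the function name `collapseBlankLines` is hypothetical, and trimming each line and collapsing any run of newlines (not only runs longer than two) are assumptions of the sketch, not html-differ's actual implementation.

```javascript
// Hypothetical sketch: keep line breaks as the only significant whitespace.
// Assumption: whitespace at the start and end of each line is dropped,
// and whitespace-only lines disappear, so runs of "\n" become a single "\n".
function collapseBlankLines(html) {
  return html
    .replace(/\r\n?/g, "\n")             // normalize Windows/old Mac line endings
    .split("\n")
    .map((line) => line.trim())          // ignore indentation and trailing spaces
    .filter((line) => line.length > 0)   // drop empty lines
    .join("\n");
}

console.log(JSON.stringify(collapseBlankLines("<span>\n\n\n</span>")));
```

Applying the same normalization to both inputs before a line-by-line comparison keeps the result symmetric.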

phrazer · 2014-09-10T19:13:25Z

Yep, that should do it, as long as both are changed the same way. Commit the changes ASAP, and I will test with a few different HTMLs.

I had another idea: what if we first check line by line, and if the lines are the same we go on; if not, we also check that line word by word. Maybe then you get the best of both worlds: speed and a great diff tool. Just an idea, I don't know if it's possible to make it.
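
A rough sketch of this two-pass idea, under stated assumptions: `diffTokens` is a small, naive LCS-based diff written only for this example (quadratic in time and memory, so it is not a fix for large inputs), and `hybridDiff` is a hypothetical name. Neither is part of jsdiff or html-differ.

```javascript
// Naive LCS-based diff over two arrays of tokens (illustration only).
function diffTokens(a, b) {
  const n = a.length, m = b.length;
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out = [];
  let i = 0, j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) { out.push({ type: "same", value: a[i] }); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { out.push({ type: "removed", value: a[i++] }); }
    else { out.push({ type: "added", value: b[j++] }); }
  }
  while (i < n) out.push({ type: "removed", value: a[i++] });
  while (j < m) out.push({ type: "added", value: b[j++] });
  return out;
}

// First pass by lines; only regions that differ get a second pass by words.
function hybridDiff(oldText, newText) {
  const result = [];
  let removed = [], added = [];
  const words = (lines) => lines.join("\n").split(/\s+/).filter(Boolean);
  const flush = () => {
    if (removed.length || added.length) {
      result.push(...diffTokens(words(removed), words(added)));
      removed = [];
      added = [];
    }
  };
  for (const part of diffTokens(oldText.split("\n"), newText.split("\n"))) {
    if (part.type === "same") { flush(); result.push(part); }
    else if (part.type === "removed") removed.push(part.value);
    else added.push(part.value);
  }
  flush();
  return result;
}

const before = "<div>\n<p>hello world</p>\n</div>";
const after = "<div>\n<p>hello there</p>\n</div>";
console.log(hybridDiff(before, after).filter((p) => p.type !== "same"));
```

Unchanged lines are settled cheaply in the first pass, so the word-level comparison only runs on the regions that actually differ.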

eGavr · 2014-09-10T19:28:02Z

Hmmm...

Not bad, not bad...

I'll think about it and fix it ASAP!

phrazer · 2014-09-10T20:40:25Z

Great,

Thanks for the fast reply. I will prepare different (sizes, tags, etc.) HTML example files and try to debug.

Best regards,

eGavr · 2014-09-10T20:44:59Z

I will implement a new option (my solution); it should solve your problem too.

And after that, in a minor version, I'll think about the enhancement that you have proposed, because I also need something working ASAP ) and after the release of 1.0.0, I will think about speed and quality enhancements.

Created an issue concerning your idea ==> change the comparison algorithm

phrazer · 2014-09-10T20:48:21Z

Yeah, I agree with you 👍 👍

eGavr · 2014-09-11T10:04:13Z

I cannot understand why you need a differ for tasks where you can find the diffs without any tools.
I tried to implement the idea of comparison by lines, but there are still so many diffs that jsdiff fails.

Also see the comments about your ideas concerning the new comparison algorithm ==> #95

phrazer · 2014-09-11T10:23:04Z

Because I really like your tool, and it was working great. I don't understand why you don't want it to work on bigger files too. Can you just add an option to choose between lines and words? I tried to do it myself but couldn't find which function calls diff.js. Thanks,

eGavr · 2014-09-11T10:29:35Z

It is not a problem to compare texts by lines!

I've implemented this option, but it will not help you.

Because, as I've already told you, even in this mode there are so many diffs that jsdiff fails.

When we compare by words, we receive more than 50000 diffs; that is too many for jsdiff.
And even when we compare by lines, we receive more than 6000 diffs, which is also too many for jsdiff.

phrazer · 2014-09-11T10:34:16Z

Hmm, I see. In my tests there is no problem up to 2000 - 3000, then it stops. Could we optimise something else?

eGavr · 2014-09-11T10:42:19Z

It seems not! It is entirely jsdiff's task (

eGavr added this to the 1.0.0 milestone Sep 10, 2014

eGavr added bug enhancement labels Sep 10, 2014

eGavr mentioned this issue Sep 10, 2014

Create an option diffBy #96

Open

eGavr removed this from the 1.0.0 milestone Sep 11, 2014

eGavr mentioned this issue Sep 11, 2014

Change the algorithm of comparison #95

Closed
