Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird performance behavior #370

Closed
pd4d10 opened this issue Mar 12, 2021 · 5 comments · Fixed by #371
Closed

Weird performance behavior #370

pd4d10 opened this issue Mar 12, 2021 · 5 comments · Fixed by #371

Comments

@pd4d10
Copy link

pd4d10 commented Mar 12, 2021

const Turndown = require("turndown");

console.time("2500");
new Turndown().turndown('<img src="1" />\n'.repeat(2500));
console.timeEnd("2500");

console.time("5000");
new Turndown().turndown('<img src="1" />\n'.repeat(5000));
console.timeEnd("5000");

console.time("10000");
new Turndown().turndown('<img src="1" />\n'.repeat(10000));
console.timeEnd("10000");

In my machine (Mac mini (M1, 2020)) the log is as follows:

2500: 505.093ms
5000: 1.891s
10000: 7.371s

The time spent does not seem to increase linearly. When the image tag count reaches about 10,000, it takes about 7s and caused serious performance problems.

@pavelhoral
Copy link
Collaborator

pavelhoral commented Mar 15, 2021

The root cause for this is the use of trailing whitespace regexp:

   ticks parent  name
  12183   93.6%  RegExp: \n*$
  12183  100.0%    /usr/bin/node
  12183  100.0%      /usr/bin/node
  11936   98.0%        LazyCompile: *<anonymous> /tmp/test/node_modules/turndown/lib/turndown.cjs.js:829:54
  11936  100.0%          /usr/bin/node
  11936  100.0%            LazyCompile: ~process /tmp/test/node_modules/turndown/lib/turndown.cjs.js:827:18
    210    1.7%        LazyCompile: *join /tmp/test/node_modules/turndown/lib/turndown.cjs.js:900:15
    209   99.5%          LazyCompile: ~<anonymous> /tmp/test/node_modules/turndown/lib/turndown.cjs.js:829:54
    209  100.0%            /usr/bin/nod

This regexp is called every time turndown appends new content to the overall output. And it makes sense why this is getting slower when the output gets longer - the regexp has to go through the whole output just to check few characters at the end of the string.

I tried quick and dirty patch just to test the performance impact:

image

With the following result (appending 20000 <img> elements just like in the original report):

image

I leave it to @martincizek or @domchristie to decide whether it makes sense to get rid of the regexp. I think we should replace it because just from the technical point of view it is quite ugly to match few characters at the end of potentially massive output with regexp that has to walk through every character (counterargument might be that this is not a standard use case)...

@martincizek
Copy link
Collaborator

@pavelhoral Can you please re-run your performance test with PR #371 applied?

@pavelhoral
Copy link
Collaborator

@martincizek Performance of your change is the same as with my quick patch. 👍

@martincizek
Copy link
Collaborator

@pd4d10 Now I get:

2500: 116.947ms
5000: 145.212ms
10000: 226.84ms

Until a new version is published to npm, you can test it by installing turndown as:

  "dependencies": {
    "turndown": "github:domchristie/turndown"
  }

@martincizek
Copy link
Collaborator

Until a new version is published to npm...

Fixed in 7.1.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants