Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Data of Hexo #3663

Closed
SukkaW opened this issue Aug 12, 2019 · 25 comments
Closed

Performance Data of Hexo #3663

SukkaW opened this issue Aug 12, 2019 · 25 comments

Comments

@SukkaW
Copy link
Member

SukkaW commented Aug 12, 2019

Latest Benchmark

It is the current baseline of Hexo

(Travis CI, Ubuntu 16, Node.js 10)

Hexo 3.2 4.0 c48f6a8
Load Plugin/Scripts/Database (Cold) 0.840s 1.03s 0.47s
Process Source (Cold) 12.18s 8.21s 8.18s
Render Files (Cold) 13.13s 9.50s 9.46s
Save Database (Cold) 0.68s 0.701s 0.68s
Total time (Cold) 26.83s 19.43s 18.80s
Memory Usage (Cold) 880.67MB 548.24MB 555.36MB
Load Plugin/Scripts/Database (Hot) 2.49s 2.00s 1.743s
Process Source (Hot) 0.78s 0.73s 0.86s
Render Files (Hot) 9.95s 8.47s 8.82s
Save Database (Hot) 0.68s 0.66s 0.68s
Total time (Hot) 13.90s 11.85s 11.08s
Memory Usage (Hot) 692.35MB 785.77MB 788.05MB

Past Benchmark

I have ran a performance test for Hexo using Travis CI with 300 posts(All posts have 1 category and 5 tags, and all markdown styles and large amounts of code blocks are included in each .md file)

All the css, js & images are deleted before hexo clean && hexo g to make sure every file generated is rendered HTML.

NodeJS 8 NodeJS 10 NodeJS 12
Enable Fragment Cache & Disable Highlight.js 26.5s 21s 21.5s
Disable Fragment Cache 35s 29s 29.5s
Enable highlight.js only 35s 27s 27.5s
Enable highlight.js & line_number 38s 30s 30s
Enable highlight.js & tab_replace 34.5s 27s 28s
Enable highlight.js, tab_replace & line_number 38s 30s 31s

Raw log can be seen here.

It seems that Disable Fragment Cache will make hexo 35% slower, Enable highlight.js will make hexo 45% slower.
And even with Fragment enabled and Highlight.js disabled, hexo can only render 33 HTML per second.


@tcrowe
Copy link
Contributor

tcrowe commented Aug 14, 2019

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@tcrowe Fragment Cache was introduced in #637 and the feature is depends on theme.
And Fragment Cache will be much more helpful if some time-consuming partial has been cached, see #1769

@ppoffice
Copy link
Contributor

ppoffice commented Aug 15, 2019

@SukkaW Please comment out this line, rerun your tests and see how results are

https://github.com/hexojs/hexo/blob/master/lib/plugins/filter/index.js#L14

spoiler alert cheerio and how Hexo generator spawn promises are responsible for the sluggish HTML generation and huge memory leak.

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@ppoffice Yeah, I do have received a notice that meta_generator has contributed to poor performance. I will add a test with meta_generator disabled and see what we can get.

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@ppoffice The result is out:

image

This is baseline

image

Set meta_generator: false in _config.yml

image

Comment out the filter.register('after_render:html', require('./meta_generator'));

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@ppoffice
Copy link
Contributor

ppoffice commented Aug 15, 2019

@SukkaW meta_generator config was introduced 9 days ago. You probably need to update your Hexo in your test to make the meta_generator: false work.

#3653

Edit
I think the cheerio.load part takes most of the time. So the config.meta_generator should be put here https://github.com/hexojs/hexo/blob/master/lib/plugins/filter/index.js#L14.
@curbengh @yoshinorin

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@jiangtj
Copy link
Member

jiangtj commented Aug 15, 2019

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

https://github.com/hexojs/hexo/blob/master/lib/plugins/filter/after_post_render/external_link.js#L8
Or like this, return before cheerio.load

I have created a PR about this.

Anyway, I totally disagree to use cheerio traversing all the HTML generated, just for add an meta_generator tag. There should be some better ways to do that.

@curbengh
Copy link
Contributor

curbengh commented Aug 15, 2019

Anyway, I totally disagree to use cheerio traversing all the HTML generated, just for add an meta_generator tag. There should be some better ways to do that.

open_graph approach?

This also means (with that approach) meta_generator is no longer widely adopted (across Hexo), as it depends how soon a theme can adopt it and how often users update their theme. I believe themes will adopt it eventually, albeit will take some time to reach users.

Ultimately, it depends how much we care about Wappalyzer statistics. Judging by the performance benchmark, maybe it's not worth it.


I'm very interested if commenting out the <%- open_graph() %> from theme can improve performance.


I'm preparing a PR to move meta_generator from filter to helper.

Edit: #3669


Another (less breaking) approach by using string.replace() instead

#3671

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@curbengh Currently the theme I used in performance test, hexo-theme-suka, generates Open Graph manually without using Hexo helper: https://github.com/SukkaW/hexo-theme-suka/blob/master/layout/_partial/head/open-graph.ejs

Maybe I would bring up a test with hexo-theme-landscape to test the performance of open-graph helper.

Also, cheerio is very slow... Especially when it is needed to let cheerio ran through all the hundreds of, maybe thousands of HTML files generated.
We should come up with a better idea to detect and add meta[generator] tag.

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@curbengh

I just bring up the benchmark for open_graph() helper.

Raw test log can be found here: https://travis-ci.com/theme-suka/performance-test/builds/123338750

Teat A

with hexo-theme-landscape and meta_generator set to false.

image

Teat B

with hexo-theme-landscape and meta_generator set to false, and remove open_graph() helper using:

sed -i "s|<%- open_graph({twitter_id: theme.twitter, fb_admins: theme.fb_admins, fb_app_id: theme.fb_app_id}) %>||g" themes/landscape/layout/_partial/head.ejs

image

It is 28% faster without open_graph() helper.

Update

open_graph() helper detect images in the posts by default, and it is the only usage of cheerio in open_graph() helper.

@curbengh
Copy link
Contributor

curbengh commented Aug 15, 2019

I wonder if cheerio@1.0.0rc.3 is faster. It's currently unusable for non-ASCII chars though, but it shouldn't affect the benchmarking if we disregard how the rendered file looks like.

If you need the rendered file to be valid, you can use "cheerio": "curbengh/cheerio#decode-test" which includes the fix. you may also need to enable decodeEntities. (Edit: decodeEntities is no longer relevant.)

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

@curbengh
For open_graph() helper I have an idea. Since cheerio is only used to list all the images in the post at open_graph(), we could use indexOf('<img') first to determine if the current content has image or not.
If the current post has no images, then cheerio is no need to load the whole post.

Edit

During the test case, although there are 300 posts used in the benchmark, but none of them includes images, which means even without images in the content, cheerio might still slow down the speed.

@curbengh
Copy link
Contributor

curbengh commented Aug 15, 2019

Since cheerio is only used to list all the images in the post at open_graph(), we could use indexOf('<img') first to determine if the current content has image or not.
If the current post has no images, then cheerio is no need to load the whole post.

I like the idea, though ES6 includes(substring) is preferred. Looking forward for your PR.

Edit: @SukkaW created #3670

which means even without images in the content, cheerio might still slow down the speed.

your proposed fix is supposed to fix it, yes?

@SukkaW
Copy link
Member Author

SukkaW commented Aug 15, 2019

A benchmark of load cheerio when needed:

Teat A

with hexo-theme-landscape and meta_generator set to false, and cheer io is always required.

image

Teat B

with hexo-theme-landscape and meta_generator set to false, and only require cheerio when content includes <img.

  if (!images.length && content) {
    images = images.slice();

    if (content.includes('<img')) {
      if (!cheerio) cheerio = require('cheerio');

      const $ = cheerio.load(content);

      $('img').each(function() {
        const src = $(this).attr('src');
        if (src) images.push(src);
      });
    }
  }

image

As those 300 posts includes no images, generate speed is now 17% faster. I will open a PR then.

Updates

I have updated hexo-many-posts and now 200 posts of all 300 will include images.

Then

Test A

image

Test B

image

Now it is only 4% faster.

@tcrowe
Copy link
Contributor

tcrowe commented Aug 15, 2019

@ppoffice Is it possible you can show me an example of this? how Hexo generator spawn promises

Also we hardly need cheerio for simple dom operations. Possibly we can just use string templates for the simple cases.

@ppoffice
Copy link
Contributor

@tcrowe #3665

@SukkaW
Copy link
Member Author

SukkaW commented Aug 16, 2019

@tcrowe I have listed the helpers & filters that use cheerio. Maybe we can replace them with regex.
#3663 (comment)

@SukkaW
Copy link
Member Author

SukkaW commented Aug 16, 2019

I'm also interested in the performance impact of external_link filter, so I bring up a benchmark.

Test A

Baseline, external_link is set to true.

image

Test B

external_link is set to false.

image

I can see only a little improvements on generating speed, less than 0.5%.

It might because external_link filter is registered at after_post_render, which means cheerio only needs to load post content. And as meta_generator filter registered at after_render:html, cheerio needs to load all the HTML files generated (usually double to three times the number of posts due to tags and categories), that's why it caused a serious performance impact.

@curbengh
Copy link
Contributor

It might because external_link filter is registered at after_post_render, which means cheerio only needs to load post content. And as meta_generator filter registered at after_render:html, cheerio needs to load all the HTML files generated (usually double to three times the number of posts due to tags and categories), that's why it caused a serious performance impact.

I just found after_post_render only works on post, not pages. So, it is appropriate for meta_generator to be after_render:html. external_link filter should also be after_render:html as well.

@SukkaW
Copy link
Member Author

SukkaW commented Aug 18, 2019

I have noticed that during those benchmarks, my hexo-theme-suka is always 10% slower than hexo-theme-landscape. And I have found that hexo-theme-suka uses toc() helper and hexo-theme-landscape doesn't. So I bring up a new test with hexo-theme-suka:

Test A

Baseline, with meta_generator commented out.

image

Test B

Disable the toc.

image

Bingo! toc() helper slowed down the generation by 24.6%.

Then I looked at the toc() helper and have some interesting discovery.

const $ = cheerio.load(str);
const headingsMaxDepth = Object.prototype.hasOwnProperty.call(options, 'max_depth') ? options.max_depth : 6;
const headingsSelector = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'].slice(0, headingsMaxDepth).join(',');
const headings = $(headingsSelector);
if (!headings.length) return '';

Most of the theme use toc() helper like this:

toc(page.content)

which means when a theme is using toc() helper, the cheerio will load all posts' content. In the benchmarks, toc() helper will load all 300 posts' content.

@curbengh
Copy link
Contributor

curbengh commented Aug 18, 2019

I created a separate issue (#3677) to track cheerio-related plugins. Maybe we could also create a separate issue to find out improper uses of Promise.


Replacement for highlight.js is already discussed in #1036 and it's part of the roadmap.

@stale
Copy link

stale bot commented Oct 17, 2019

This issue has been automatically marked as stale because lack of recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants