Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(open_graph): drop cheerio and use regex #3680

Merged
merged 5 commits into from
Aug 22, 2019

Conversation

SukkaW
Copy link
Member

@SukkaW SukkaW commented Aug 19, 2019

What does it do?

cheerio is slow and we only use that to parse and list img in post.content. This PR drop cheerio and use regex to do the job.

How to test

git clone -b BRANCH https://github.com/USER/hexo.git
cd hexo
npm install
npm test

Screenshots

913f5be

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 7.12 6.96 6.86
regex (This PR) 7.15 6.96 6.78
dom-parser (#3679) 7.2 6.93 6.82
conditional cheerio (#3670) 7.15 6.92 6.78

4f86dde

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 7.03 6.59 6.51
regex (This PR) 7.01 6.57 6.45
dom-parser (#3679) 6.98 6.56 6.41
conditional cheerio (#3670) 7.00 6.60 6.38

5c309eb

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 6.96 6.65 6.85
regex (This PR) 6.90 6.68 6.73
dom-parser (#3679) 6.88 6.67 6.73
conditional cheerio (#3670) 6.72 6.60 6.80

c11ded4

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 7.29 6.54 6.93
regex (This PR) 7.21 6.47 6.84
dom-parser (#3679) 7.16 6.50 6.87
conditional cheerio (#3670) 7.06 6.50 6.81

Pull request tasks

  • Add test cases for the changes.
  • Passed the CI test.

@coveralls
Copy link

coveralls commented Aug 19, 2019

Coverage Status

Coverage increased (+0.0008%) to 97.155% when pulling 3e0b0dc on SukkaW:og-img-regex into 49c7d4a on hexojs:master.

@SukkaW
Copy link
Member Author

SukkaW commented Aug 19, 2019

For regex, I have test those pattern with regex101 to match this post:

/<img.*?src="(.*?)".*?\/?>/gi

6491 steps, 9 matches, ~3ms: https://regex101.com/r/N1Rv9w/1

/<img.*?(?:>|\/>)/gi

8858 steps, 9 matches, ~5ms: https://regex101.com/r/N1Rv9w/2

/<img(.*?)>/gi

6527 steps, 9 matches, ~4ms: https://regex101.com/r/N1Rv9w/3

/<img.*?>/gi

4313 steps, 9 matched, ~3ms: https://regex101.com/r/N1Rv9w/4

/<img[^>]*src[="'\s]+[^.]*\/([^.]+)\.[^"']+["']?[^>]*>/gi

4627 steps, 9 matched, ~4ms: https://regex101.com/r/N1Rv9w/5

Currently in this PR I am using the first one.

Update

After 4f86dde I am now using the forth one.

Update

/<img [^>]*src=['"]([^'"]+)([^>]*>)/gi

4573 steps, 9 matched, ~4ms: https://regex101.com/r/N1Rv9w/7

@SukkaW
Copy link
Member Author

SukkaW commented Aug 19, 2019

I think we still need to dig into those PR: #3679 #3670 and this PR. They all increase performance, but not much.

segayuu
segayuu previously approved these changes Aug 21, 2019
Copy link
Contributor

@segayuu segayuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@curbengh
Copy link
Contributor

Since this PR only uses native API, naturally it supersedes #3679 #3670. This is unless there is possibility of some edge cases, which I currently don't foresee.

curbengh
curbengh previously approved these changes Aug 21, 2019
@SukkaW
Copy link
Member Author

SukkaW commented Aug 21, 2019

@segayuu @curbengh Should I add some test cases for <img> tags with empty src attribute?

@SukkaW SukkaW dismissed stale reviews from curbengh and segayuu via 3e0b0dc August 21, 2019 09:14
@SukkaW
Copy link
Member Author

SukkaW commented Aug 22, 2019

I have update the final benchmark for this PR & #3670

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 8.75 7.0 7.82
regex (This PR) 6.51 5.52 6.44
conditional cheerio (#3670) 7.68 6.49 7.51

The last benchmark shows this PR increase performance by nearly 20%.

I will merge this PR and close #3670 then.

@SukkaW SukkaW merged commit 6f16116 into hexojs:master Aug 22, 2019
@curbengh curbengh mentioned this pull request Aug 22, 2019
4 tasks
thom4parisot pushed a commit to thom4parisot/hexo that referenced this pull request Jan 17, 2020
* refactor(open_graph): drop cheerio and use regex

* feat(open_graph): faster regex pattern for img tag

* refactor(open_graph): eliminate unnecessary checks

* feat(open_graph): a new regex which use less steps

* test(open_graph): add img with empty src attribute
curbengh pushed a commit to curbengh/curbengh.github.io that referenced this pull request Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants