HTML API: Optimize low-level parsing details. #6890
Measured change in parsing time to count tags in HTML5 spec:
- from 1050ms to 930ms
- roughly 13% faster in the worst-case document
++$at;
continue;
}
if ( 1 !== strspn( $html, '!/?abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', $at + 1, 1 ) ) {
thanks @adamziel for pointing out to me that strspn() and strcspn() have the $length parameter!
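For readers unfamiliar with that parameter: strspn() accepts optional offset and length arguments, so a single byte can be tested against a character set without first extracting a substring. The following is only a minimal sketch of the idea. The helper name `looks_like_tag_opener()` is hypothetical and is not part of the HTML API.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): reports whether the byte
 * following the `<` at offset $at could begin a tag, a closing tag, or a
 * markup declaration such as a comment.
 */
function looks_like_tag_opener( string $html, int $at ): bool {
	// Examine at most one byte, starting at $at + 1, against the allowed set.
	return 1 === strspn(
		$html,
		'!/?abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
		$at + 1,
		1
	);
}

var_dump( looks_like_tag_opener( '<div>', 0 ) ); // bool(true)
var_dump( looks_like_tag_opener( '1 < 2', 2 ) ); // bool(false)
var_dump( looks_like_tag_opener( 'a <', 2 ) );   // bool(false)
```

Because the length is capped at 1, the call returns either 0 or 1 and never scans past the byte of interest, which is what makes it suitable for a hot loop.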
Introduces a number of micro-level optimizations in the Tag Processor to improve token-scanning performance. Should contain no functional changes.

Based on benchmarking against a list of the 100 most-visited websites, these changes result in an average improvement in the Tag Processor's tag-scanning performance of between 3.5% and 7.5%.

Developed in #6890
Discussed in https://core.trac.wordpress.org/ticket/61545

Follow-up to [55203].

See #61545.

git-svn-id: https://develop.svn.wordpress.org/trunk@58613 602fd350-edb4-49c9-b593-d223f7449a82
Introduces a number of micro-level optimizations in the Tag Processor to improve token-scanning performance. Should contain no functional changes.

Based on benchmarking against a list of the 100 most-visited websites, these changes result in an average improvement in the Tag Processor's tag-scanning performance of between 3.5% and 7.5%.

Developed in WordPress/wordpress-develop#6890
Discussed in https://core.trac.wordpress.org/ticket/61545

Follow-up to [55203].

See #61545.

Built from https://develop.svn.wordpress.org/trunk@58613

git-svn-id: http://core.svn.wordpress.org/trunk@58046 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Trac ticket: Core-61545
Status
Summary
Measured change in parsing time to count tags in the HTML5 spec (single-page.html): from 1050ms to 930ms, roughly 13% faster in the worst-case document.
Benchmarking
Based on the top100 set of URLs from https://github.com/ada-url/url-various-datasets, I ran a script to count the number of tags in each document, having previously downloaded all URLs. Of these, not all downloaded successfully and not all were HTML files.
82,406 HTML files were analyzed, representing pages from the top 100 most popular websites.
When counting all tags, trunk took between 310 seconds and 313 seconds across multiple test runs, measured from microtime() within the process parsing the HTML, and only measuring around the next_token() loop.

On this branch, the counting took between 293 seconds and 300 seconds, representing around a 5% real-world improvement in token parsing speed.
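The measurement approach described above (timing only the scanning loop, not file loading) can be sketched roughly as follows. This is not the script used for the benchmark. `count_tag_openers()` is a hypothetical stand-in so the example runs without WordPress; in the real benchmark that call would be a loop over next_token(). `benchmark_documents()` and the sample inputs are likewise assumptions made for illustration.

```php
<?php
/**
 * Hypothetical stand-in for the next_token() loop: counts each `<` that is
 * directly followed by an ASCII letter.
 */
function count_tag_openers( string $html ): int {
	$count = 0;
	$at    = 0;
	while ( false !== ( $at = strpos( $html, '<', $at ) ) ) {
		if ( 1 === strspn( $html, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', $at + 1, 1 ) ) {
			++$count;
		}
		++$at;
	}
	return $count;
}

/**
 * Times only the counting step for each document and reports totals,
 * including throughput in MB/s.
 */
function benchmark_documents( array $documents ): array {
	$bytes   = 0;
	$tags    = 0;
	$elapsed = 0.0;

	foreach ( $documents as $html ) {
		$bytes += strlen( $html );

		// Start the clock after the document is already in memory.
		$start    = microtime( true );
		$tags    += count_tag_openers( $html );
		$elapsed += microtime( true ) - $start;
	}

	return array(
		'tags'          => $tags,
		'seconds'       => $elapsed,
		'mb_per_second' => $elapsed > 0 ? ( $bytes / 1e6 ) / $elapsed : 0.0,
	);
}

$result = benchmark_documents(
	array(
		'<p>Hello <b>world</b></p>',
		'1 < 2 and <em>so on</em>',
	)
);
printf( "tags: %d, seconds: %.6f\n", $result['tags'], $result['seconds'] );
```

Keeping the timer inside the loop excludes disk and network time, so differences between two branches reflect parsing cost rather than I/O noise.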
For the top100 dataset, this histogram represents the relative parsing speed in MB/s for the branch against trunk.