Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of wp_navigation lookup. #36891

Merged
merged 3 commits into from
Nov 30, 2021
Merged

Conversation

spacedmonkey
Copy link
Member

@spacedmonkey spacedmonkey commented Nov 25, 2021

Description

Improve the performance of the lookup for wp_navigation posts. This improves performance in a number of way.

  • Removes search param as it can be expensive on large sites.
  • Add no_found_rows, as we don't care about the number of posts here.
  • Use WP_Query instead of get_posts should be avoided at all costs, as it is unfilterable / uncachable.
  • Do the check for has_blocks in php, as this check is cheap.
  • Query first 20, so we are more likely to find a navigation post this way.

How has this been tested?

Screenshots

Types of changes

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • I've tested my changes with keyboard and screen readers.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.
  • I've updated all React Native files affected by any refactorings/renamings in this PR (please manually search all *.native.js files for terms that need renaming or removal).

@spacedmonkey spacedmonkey added [Block] Navigation Affects the Navigation Block Backport to WP 6.7 Beta/RC Pull request that needs to be backported to the WordPress major release that's currently in beta labels Nov 25, 2021
@spacedmonkey spacedmonkey self-assigned this Nov 25, 2021
Copy link
Contributor

@getdave getdave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My implementation PR originally did something similar to what you are doing here but I got the inverse feedback from @noisysocks.

I now don't know which version is safest and fastest.

In reviewing this PR I'd say it's going to be safe as it's limited to 20 posts. We might argue that this limit might mean that there's a chance no posts with blocks will be found in the set but that's highly unlikely.

Thank you for reminding me about

  • Preferring WP_Query
  • no_found_rows - 👍

I'll defer to others for a final 👍 as PHP is not my forte but this seems like a nice improvement.

@spacedmonkey
Copy link
Member Author

Preferring WP_Query

get_posts should NEVER be used, it only in core for backwards compatibility reasons. What is worse, by default it passes suppress_filters to true, meaning it can't be filtered, meaning that it can be cached and filtered. Really bad for performance. There is even a lint for it in VIP coding standards.

My implementation PR originally did something similar to what you are doing here but I got the inverse feedback

From what I can see, your original implementation was on the money. The LIKE search, is EXTREMELY expensive from a database standard point. I have worked on sites with 500k+ articles and this search would have take over 15 seconds to return. A search like this in core should be avoided at all costs, as LIKE searches like will not scale.

@adamziel
Copy link
Contributor

adamziel commented Nov 26, 2021

@spacedmonkey I'm all for using WP_Query 💯 .

About the foreach part, I fear that moving the filtering from MySQL to PHP won't actually be faster. Do you have any benchmarks to share? Intuitively, I would expect MySQL to filter by wp_navigation first using the type_status_date index, and by LIKE second – effectively spending the additional time only the navigation posts. Also, while it's true that LIKE "%query%" is pretty slow, LIKE "query%" can be really fast. I wonder if we could use that instead.

@spacedmonkey
Copy link
Member Author

@adamziel Here is some profile information.

With like search 0.0766

Screenshot 2021-11-26 at 15 48 38

Returning 20 items. 0.0432

Screenshot 2021-11-26 at 15 46 49

@adamziel
Copy link
Contributor

adamziel commented Nov 29, 2021

@spacedmonkey this is profile for just the query using double %:

  • It doesn't include filtering in PHP.
  • Updating the query to skip the leading % in LIKE would provide tremendous performance gains as it would allow MySQL to use the index.

@mtias mtias added the [Type] Performance Related to performance efforts label Nov 29, 2021
@spacedmonkey
Copy link
Member Author

@adamziel This isn't really about PHP performance, is about MySQL performance, which I have shown here to be much better. Having working on large WordPress sites for years, I know for a fact, that WordPress searches, could make databases hangs adds more processing on database servers. This compare in php is extremely cheap and almost not worth profiling.

@adamziel
Copy link
Contributor

adamziel commented Nov 29, 2021

@spacedmonkey still, I believe that switching to LIKE "prefix%" will make the query just as fast as if it used a regular = comparison and remove the need to use a PHP loop. I can provide some data to back it up later today as currently I'm working on another issue.

@adamziel
Copy link
Contributor

adamziel commented Nov 29, 2021

More data as promised:

wp_posts doesn't have an index on post_content, so LIKE can't leverage it :(

| wp_posts | CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_author` bigint(20) unsigned NOT NULL DEFAULT 0,
  `post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_excerpt` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_status` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'publish',
  `comment_status` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'open',
  `ping_status` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'open',
  `post_password` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `post_name` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `to_ping` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `pinged` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content_filtered` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_parent` bigint(20) unsigned NOT NULL DEFAULT 0,
  `guid` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `menu_order` int(11) NOT NULL DEFAULT 0,
  `post_type` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'post',
  `post_mime_type` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `comment_count` bigint(20) NOT NULL DEFAULT 0,
  PRIMARY KEY (`ID`),
  KEY `post_name` (`post_name`(191)),
  KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
  KEY `post_parent` (`post_parent`),
  KEY `post_author` (`post_author`)
) ENGINE=InnoDB AUTO_INCREMENT=320917 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |

To measure the speed of different ways of filtering, I did a benchmark with 196044 rows in wp_posts.

All wp_navigation posts

First I tried with about 2kb of text in each row and all posts being of type wp_navigation:

No filtering by wp_content:

MariaDB [wordpress]> select benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND ((wp_posts.post_status = 'publish')) LIMIT 1 ));
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND ((wp_posts.post_status = 'publish')) LIMIT 1 )) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                     0 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.709 sec)

Trailing % as in LIKE '<!-- wp:navi%'

MariaDB [wordpress]> select benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 ));
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 )) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                           0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.989 sec)

Leading AND trailing % as in LIKE '%<!-- wp:navi%'

MariaDB [wordpress]> select benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '%<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 ));
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '%<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 )) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                            0 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (1.031 sec)

Both LIKE variants are pretty slow compared to a simple scan. I suppose that looking at only the first 10 characters doesn't make as much of a difference in this setup.

This is an unlikely use-case though. Why would a site have 200k navigation posts?

A 1000 wp_navigation posts

To test a more realistic scenario, I updated all but 1000 posts to have post_type=post. The results are now all similar.

No filtering by wp_content:

MariaDB [wordpress]> select benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND ((wp_posts.post_status = 'publish')) LIMIT 1 ));
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND ((wp_posts.post_status = 'publish')) LIMIT 1 )) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                     0 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.641 sec)

Trailing % as in LIKE '<!-- wp:navi%'

MariaDB [wordpress]> select benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 ));
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 )) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                           0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.647 sec)

Leading AND trailing % as in LIKE '%<!-- wp:navi%'

MariaDB [wordpress]> select benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '%<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 ));
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| benchmark(100000000, (SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type = 'wp_navigation' AND post_content LIKE '%<!-- wp:navi%' AND ((wp_posts.post_status = 'publish')) LIMIT 1 )) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                            0 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.657 sec)

The speed is about the same with a few ms of a difference only observable after 100 million calls. I attribute the narrowness of the difference to MariaDB optimizer choosing to filter by post_type before filtering by post_content.

Summary

In this use-case case, LIKE '%<!-- wp:%' and LIKE '<!-- wp:%' are comparable, and any slowdown is only noticeable when filtering many thousand wp_navigation posts. I can't think of a reason why any site would end up with more than a 1000 navigation posts, other than having a multisite, which would hopefully filter the posts by site first anyway – I didn't actually check that.

Both solutions would likely work just fine here. Another idea would be a subquery with LIMIT 20 and a WHERE added on top of that, but that's pretty hard to achieve with WP_Query. Either way, I'm not going to block this PR on filtering.

'order' => 'ASC',
'orderby' => 'name',
'post_status' => 'publish',
'posts_per_page' => 20, // Try the first 20 posts.
Copy link
Contributor

@adamziel adamziel Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to filter by post_content != ''. Is there a need to fetch more than 1? Do we ever expect to see a non-empty wp_navigation post that has no blocks in it? CC @getdave @noisysocks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't expect it but I guess it could happen...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spacedmonkey spacedmonkey merged commit d995804 into trunk Nov 30, 2021
@spacedmonkey spacedmonkey deleted the fix/get_posts_usage branch November 30, 2021 10:24
@github-actions github-actions bot added this to the Gutenberg 12.1 milestone Nov 30, 2021
@noisysocks noisysocks removed the Backport to WP 6.7 Beta/RC Pull request that needs to be backported to the WordPress major release that's currently in beta label Dec 6, 2021
noisysocks pushed a commit that referenced this pull request Dec 6, 2021
* Use WP_Query

* Only published posts.

* Fix lint error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Block] Navigation Affects the Navigation Block [Type] Performance Related to performance efforts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants