Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEXT-34070 - Improve seo url replacer #3579

Closed
wants to merge 1 commit into from

Conversation

akf-bw
Copy link
Contributor

@akf-bw akf-bw commented Feb 22, 2024

1. Why is this change necessary?

  • In a production environment, the seo_url table can grow very quickly
  • If the seo_url table contains a large amount of data combined with extensive navigation, it may take a while for the database query to find the required data
  • Therefore it would be great if we could improve the database call

Before:
image
After:
image
(seo_url entries: ~380.000)

2. What does this change do, exactly?

  • Changed the createSeoMapping method in SeoUrlPlaceholderHandler to use subqueries

3. Describe each step to reproduce the issue or behaviour.

  • Fill the seo_url table with a lot of seo url's
  • Open the storefront

4. Please link to the relevant issues (if any).

/

5. Checklist

  • I have rebased my changes to remove merge conflicts
  • I have written tests and verified that they fail without my change
  • I have created a changelog file with all necessary information about my changes
  • I have written or adjusted the documentation according to my changes
  • This change has comments for package types, values, functions, and non-obvious lines of code
  • I have read the contribution requirements and fulfil them.

@akf-bw akf-bw force-pushed the seo-url-replacer-improvement branch from 28c5ae3 to 9016740 Compare February 22, 2024 09:46
@akf-bw akf-bw changed the title NEXT-00000 - Improved seo url replacer NEXT-00000 - Improve seo url replacer Feb 22, 2024
@akf-bw akf-bw marked this pull request as draft February 22, 2024 09:58
@akf-bw akf-bw force-pushed the seo-url-replacer-improvement branch from 9016740 to b05d613 Compare February 22, 2024 13:20
@akf-bw akf-bw marked this pull request as ready for review February 22, 2024 13:20
@akf-bw akf-bw requested a review from shyim February 22, 2024 13:32
@akf-bw akf-bw force-pushed the seo-url-replacer-improvement branch 3 times, most recently from c9666ff to 08055ba Compare February 22, 2024 14:03
@akf-bw akf-bw force-pushed the seo-url-replacer-improvement branch from 08055ba to 924d190 Compare February 26, 2024 07:01
@vienthuong
Copy link
Contributor

Hi @akf-bw
Thanks for your contribution. The solution looks very good so far. I have a general question:
What could the overhead be in the original query? the path_info is already indexed. My guess is the problem might be there's no limit set for the original query? Could you try to set the limit for the two queries:

  1. LIMIT = count($mapping) when sales_channel_id = :contextSalesChannelId
  2. LIMIT = count($missing) when sales_channel IS null and $missing is not empty (as in your approach)

@akf-bw
Copy link
Contributor Author

akf-bw commented Feb 26, 2024

Hi @akf-bw Thanks for your contribution. The solution looks very good so far. I have a general question: What could the overhead be in the original query? the path_info is already indexed. My guess is the problem might be there's no limit set for the original query?

There was no limit in the original query. However, it was also not possible to add a limit in the original query because that query retrieved filled salesChannelId's and empty salesChannelIds at the same time, so there are multiple lines for a path_info.
Since we only need a single line for “path_info” in the next foreach, it is much better to use the two queries instead.

In my local test environment, the original query returned a list of approximately 110,000 rows per request.
The new two queries only return rows for the maximum size of the $missing array.

Could you try to set the limit for the two queries:

  1. LIMIT = count($mapping) when sales_channel_id = :contextSalesChannelId
  2. LIMIT = count($missing) when sales_channel IS null and $missing is not empty (as in your approach)

I don't need to add an additional limit to the new queries because the $mapping variable already defines how many rows are possible.

@akf-bw akf-bw force-pushed the seo-url-replacer-improvement branch from 924d190 to f7d4e2d Compare February 26, 2024 15:28
@vienthuong
Copy link
Contributor

vienthuong commented Feb 27, 2024

110,000 rows per request

Yeah I mean to set the limit in the original query and call it twice with different salesChannelId parameters (similar to your solution but we don't need the subqueries). But your sub-query solution would also be good.

Could you add the unit test to cover your changes?

@akf-bw
Copy link
Contributor Author

akf-bw commented Feb 27, 2024

Yeah I mean to set the limit in the original query and call it twice with different salesChannelId parameters (similar to your solution but we don't need the subqueries). But your sub-query solution would also be good

I already tested this. For this to work you also need to add "GROUP BY path_info" so that the limit can work correctly. This grouping is extremely slow and took about twice as long as the old query, so this wasn't a solution.

Could you add the unit test to cover your changes?

I'll take a look at this.

@akf-bw
Copy link
Contributor Author

akf-bw commented Feb 27, 2024

@vienthuong There is already the testSeoReplacementSalesChannelDefaultAndOverride method in SeoUrlPlaceholderHandlerTest which should cover the full behavior of the changed method.
In addition, there are already many other test classes that work with the SeoUrlPlaceholderHandler and all pass in the unit tests.

@akf-bw
Copy link
Contributor Author

akf-bw commented Feb 27, 2024

@vienthuong I again tested with the query params. If we include the params directly like this, we could save up to 1-5ms per createSeoMapping call:
image
Should I change the method to this?

@vienthuong
Copy link
Contributor

@vienthuong I again tested with the query params. If we include the params directly like this, we could save up to 1-5ms per createSeoMapping call: image Should I change the method to this?

In this particular case, the performance difference is not significant and using Uuid::fromHexToBytes would be safer. We will also evaluate it if it would be a good idea and worth the effort to use concatted string instead of the hex2bin method as we have used it everywhere

@akf-bw akf-bw force-pushed the seo-url-replacer-improvement branch from f7d4e2d to 6f85ed5 Compare February 27, 2024 11:38
@akf-bw
Copy link
Contributor Author

akf-bw commented Feb 27, 2024

@vienthuong I just removed one of the foreach loops since we can move the behavior into the loop beforehand.

@vienthuong
Copy link
Contributor

Hi @akf-bw, looks very good so far.
One last thing since I'm curious about your concatted-string solution vs Uuid::fromHexToBytes, could you do a little benchmark with your particular case with your 380k (or more) seo_urls?

@ghost
Copy link

ghost commented Feb 28, 2024

Hello,

thank you for creating this pull request.
I have opened an issue on our Issue Tracker for you. See the issue link: https://issues.shopware.com/issues/NEXT-34070

Please use this issue to track the state of your pull request.

@vienthuong
Copy link
Contributor

We did evaluate the performance gains by using/not using Uuid::fromHexToBytes and it's insignificant unless you have millions of UUIDs. So for now we will keep using the method to have safer/standard UUIDs in the query.

@akf-bw akf-bw changed the title NEXT-00000 - Improve seo url replacer NEXT-34070 - Improve seo url replacer Feb 29, 2024
@akf-bw akf-bw deleted the seo-url-replacer-improvement branch March 6, 2024 06:27
@tamvt
Copy link
Contributor

tamvt commented Mar 6, 2024

Your PR has been merged. Thank you for your contribution 🎉 💙

@tamvt tamvt added Accepted and removed Scheduled labels Mar 6, 2024
@En0Ma1259
Copy link
Member

I've tried this change and now the query is slower ...

Mappings:
20: ~2ms ⇾ ~6ms
~100: ~10ms ⇾ ~45ms
~600: ~40ms ⇾ ~260ms

Shopware: 6.5.8.6
PHP: 8.2
MySQL: 8.0.35
Profiler: Blackfire
seo_url entry count: ~200,000

@vienthuong
Copy link
Contributor

Hi @akf-bw could you also benchmark your instance with the latest source code?

shopwareBot pushed a commit that referenced this pull request Mar 6, 2024
@akf-bw
Copy link
Contributor Author

akf-bw commented Mar 7, 2024

@tamvt @vienthuong Could you please revert the PR changes in the trunk branch for now. I'll have to check what caused this different behavior in my environment.
As soon as I know more and have found a better/different optimization I will open a new PR.

shopwareBot pushed a commit that referenced this pull request Mar 11, 2024
@mitelg mitelg added the pr/accepted Shopware accepted the PR. label Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/accepted Shopware accepted the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants