
Elastic Search 5 Indexing Performance Issue #20966

Closed
DarthFly opened this issue Feb 4, 2019 · 11 comments
Labels
  • Component: Elasticsearch
  • Fixed in 2.4.x (The issue has been fixed in 2.4-develop branch)
  • Issue: Clear Description (Gate 2 Passed. Manual verification of the issue description passed)
  • Issue: Confirmed (Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed)
  • Issue: Format is valid (Gate 1 Passed. Automatic verification of issue format passed)
  • Issue: Ready for Work (Gate 4. Acknowledged. Issue is added to backlog and ready for development)
  • Reproduced on 2.2.x (The issue has been reproduced on latest 2.2 release)
  • Reproduced on 2.3.x (The issue has been reproduced on latest 2.3 release)


@DarthFly

DarthFly commented Feb 4, 2019

Preconditions (*)

  1. Magento 2.3 with Elasticsearch 5 configured.
  2. Ability to debug the ProductDataMapper.php file: https://github.com/magento/magento2/blob/2.3-develop/app/code/Magento/Elasticsearch/Model/Adapter/BatchDataMapper/ProductDataMapper.php

Steps to reproduce (*)

  1. Create a searchable product attribute with a lot of values. The most common case is brands; we had around 300 values.
  2. Install a lot of products that use values of this attribute. In our case it was ~180k, but for debugging you may use sample data.
  3. Run the reindex: bin/magento indexer:reindex catalogsearch_fulltext (or trigger it in a way that lets you debug it).

Expected result (*)

  1. The reindex runs fine and takes a reasonable amount of time.

Actual result (*)

  1. (In our case, on a development PC) with Elastic 2.3 the reindex takes ~40 minutes to complete. With Elastic 5 it completed in 8 hours.

Issue comes from this method.

private function getValuesLabels(Attribute $attribute, array $attributeValues): array
{
    $attributeLabels = [];
    foreach ($attribute->getOptions() as $option) {
        if (\in_array($option->getValue(), $attributeValues)) {
            $attributeLabels[] = $option->getLabel();
        }
    }
    return $attributeLabels;
}

For each product, Magento runs this code, passing in an array of attribute value ids. It is used for both multiple-select and single-select attributes, so $attributeValues may look like [123, 456] (simplified; the real value looks a little different), but for brands it mostly contains one value.
$attribute->getOptions() returns an array with 300+ options, and each option is compared against the $attributeValues array in order to retrieve the label for this specific attribute.

Even if one search takes only milliseconds (well, mostly hundreds of milliseconds), multiply that by the number of products and you will see performance drop by 5 hours here.

There are 2 quick ways to speed up this foreach: cache the attribute values by id somewhere and check whether each value inside $attributeValues exists as a key, or cache the values returned by the function (this will require joining the array). The class is used for every product, so the pre-cached value will be reused correctly without recalculating it for each product.
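
A minimal sketch of the first option, assuming a per-attribute map from option value to label that is built once and then reused for every product. With roughly 300 options and ~180k products, this replaces about 300 in_array() calls per product with one keyed lookup per value. Option, Attribute and ValueLabelResolver are hypothetical stand-ins written only so the sketch runs on its own; they are not the Magento classes, and this is not necessarily how the eventual fix was implemented.

```php
<?php
// Hypothetical stand-in for an attribute option (not a Magento class).
final class Option
{
    private $value;
    private $label;

    public function __construct(string $value, string $label)
    {
        $this->value = $value;
        $this->label = $label;
    }

    public function getValue(): string { return $this->value; }
    public function getLabel(): string { return $this->label; }
}

// Hypothetical stand-in for an attribute; counts getOptions() calls
// so the effect of the cache is visible.
final class Attribute
{
    public $getOptionsCalls = 0;
    private $id;
    private $options;

    public function __construct(int $id, array $options)
    {
        $this->id = $id;
        $this->options = $options;
    }

    public function getId(): int { return $this->id; }

    public function getOptions(): array
    {
        $this->getOptionsCalls++;
        return $this->options;
    }
}

final class ValueLabelResolver
{
    // attribute id => [option value => label], filled lazily.
    private $labelsByAttribute = [];

    public function getValuesLabels(Attribute $attribute, array $attributeValues): array
    {
        $attributeId = $attribute->getId();
        if (!isset($this->labelsByAttribute[$attributeId])) {
            // Walk the full option list only once per attribute.
            $map = [];
            foreach ($attribute->getOptions() as $option) {
                $map[(string) $option->getValue()] = $option->getLabel();
            }
            $this->labelsByAttribute[$attributeId] = $map;
        }

        // One keyed lookup per requested value instead of scanning all options.
        $labels = [];
        foreach ($attributeValues as $value) {
            $key = (string) $value;
            if (isset($this->labelsByAttribute[$attributeId][$key])) {
                $labels[] = $this->labelsByAttribute[$attributeId][$key];
            }
        }
        return $labels;
    }
}

// Usage: two "products" share one resolver, so the options are read once.
$brand = new Attribute(83, [new Option('123', 'Acme'), new Option('456', 'Globex')]);
$resolver = new ValueLabelResolver();
$first = $resolver->getValuesLabels($brand, [456]);
$second = $resolver->getValuesLabels($brand, ['123', '999']);
```

Two behaviours differ from the original method and would need checking before use: labels come back in the order of $attributeValues rather than option order, and values are matched by string key instead of the loose comparison that in_array() performs by default.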

@magento-engcom-team magento-engcom-team added the Issue: Format is valid label Feb 4, 2019
@magento-engcom-team
Contributor

Hi @DarthFly. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • Summary of the issue
  • Information on your environment
  • Steps to reproduce
  • Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento-engcom-team give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@DarthFly do you confirm that you were able to reproduce the issue on a vanilla Magento instance following the steps to reproduce?

  • yes
  • no

@ghost ghost self-assigned this Feb 4, 2019
@magento-engcom-team
Contributor

magento-engcom-team commented Feb 4, 2019

Hi @engcom-backlog-nazar. Thank you for working on this issue.
In order to make sure that the issue has enough information and is ready for development, please read and check the following instructions: 👇

  • 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).

    Details: If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please edit the issue description if needed, until the label Issue: Format is valid appears.

  • 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • 4. Verify that the issue is reproducible on 2.3-develop branch

    Details:
    - Add the comment @magento-engcom-team give me 2.3-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

  • 5. Verify that the issue is reproducible on 2.2-develop branch.

    Details:
    - Add the comment @magento-engcom-team give me 2.2-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x

  • 6. Add label Issue: Confirmed once verification is complete.

  • 7. Make sure that automatic system confirms that report has been added to the backlog.

@ghost

ghost commented Feb 4, 2019

Hi @DarthFly, thank you for your report. Which version of Elastic do you use?

@ghost ghost added the Issue: Clear Description and Component: Elasticsearch labels Feb 4, 2019
@DarthFly
Author

DarthFly commented Feb 4, 2019

@engcom-backlog-nazar The latest Elastic 5, and the one installed on cloud. It doesn't really matter, as the problem is inside Magento code, not in how data is transferred to and from Elastic.

@ghost

ghost commented Feb 4, 2019

@DarthFly in your case it seems not :) but really, which version is it, 5.2.x?
Only 5.2.x is supported.

@ghost ghost added the Reproduced on 2.3.x, Reproduced on 2.2.x and Issue: Confirmed labels Feb 4, 2019
@magento-engcom-team magento-engcom-team added the Issue: Ready for Work label Feb 4, 2019
@magento-engcom-team
Contributor

✅ Confirmed by @engcom-backlog-nazar
Thank you for verifying the issue. Based on the provided information, internal tickets MAGETWO-98054 and MAGETWO-98055 were created.

Issue Available: @engcom-backlog-nazar, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

@magento-engcom-team magento-engcom-team unassigned ghost Feb 4, 2019
@DarthFly
Author

DarthFly commented Feb 4, 2019

In any case, it was elasticsearch-5.2.2.

@ghost

ghost commented Feb 4, 2019

@DarthFly thanks, now it's clear :)

@GovindaSharma GovindaSharma self-assigned this Feb 4, 2019
@magento-engcom-team
Contributor

Hi @GovindaSharma. Thank you for working on this issue.
Looks like this issue is already verified and confirmed. But if you want to validate it one more time, please go through the following instructions:

  • 1. Add/Edit Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • 2. Verify that the issue is reproducible on 2.3-develop branch

    Details:
    - Add the comment @magento-engcom-team give me 2.3-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

  • 3. Verify that the issue is reproducible on 2.2-develop branch.

    Details:
    - Add the comment @magento-engcom-team give me 2.2-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x

  • 4. If the issue is not relevant or is not reproducible any more, feel free to close it.

@jeffminor

This is a HUGE issue for us as well. We have 2 store views with 480,xxx skus. Indexing of catalogsearch_fulltext for elasticsearch 5.2.2 went from approximately 60 minutes on 2.2.5 (enterprise edition) to 89+ hours on 2.3. We cannot go to production with this. How could this have been released (especially for enterprise customers) with this large an issue in it?

@VladimirZaets
Contributor

Hi @DarthFly. Thank you for your report.
The issue has been fixed in #25452 by @behnamshayani in 2.4-develop branch
Related commit(s):

The fix will be available with the upcoming 2.4.0 release.

@VladimirZaets VladimirZaets added the Fixed in 2.4.x label Jan 8, 2020
This was referenced May 9, 2022

6 participants