
perf(Elasticsearch): optimize memory consumption during reindex. #2327

Merged
merged 8 commits into vendure-ecommerce:master on Feb 6, 2024

Conversation

monrostar
Contributor

@monrostar monrostar commented Jul 31, 2023

Added two new options to limit the amount of data that can be held in memory during the reindex job.

reindexProductsChunkSize - the chunk size of products processed in each loop iteration when indexing products. Default value: 2500.

reindexBulkOperationSizeLimit - index operations are performed in bulk, with each bulk operation containing a number of individual index operations. This option sets the maximum number of operations held in the memory buffer before a bulk operation is executed. Default value: 3000.
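
For reference, a sketch of how the new options would be set in the plugin configuration (the host and port values are placeholders, and the options are assumed here to be top-level ElasticsearchPlugin init options):

```ts
import { ElasticsearchPlugin } from '@vendure/elasticsearch-plugin';

// Added to the `plugins` array of the VendureConfig. Values are illustrative.
const searchPlugin = ElasticsearchPlugin.init({
    host: 'http://localhost', // placeholder
    port: 9200, // placeholder
    // Number of products processed per loop iteration during reindexing.
    reindexProductsChunkSize: 2500,
    // Maximum number of index operations buffered in memory before a bulk
    // operation is executed.
    reindexBulkOperationSizeLimit: 3000,
});
```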

Refs: #1770

BREAKING CHANGE: This update optimizes memory usage when indexing large amounts of data, in particular products with many variants. Memory consumption can be increased or decreased via the new limits, and both processing speed and memory consumption depend on the values set in the new options. The batchSize option was removed because it was not used anywhere else.

…eindex.

> Products limit chunk size for each loop iteration when indexing products.
`reindexProductsChunkSize` - default value is 500

> Index operations are performed in bulk, with each bulk operation containing a number of individual index operations. This option sets the maximum number of operations in the memory buffer before a bulk operation is executed.
`reindexBulkOperationSizeLimit` - default value is 3000

Refs: vendure-ecommerce#1770
@netlify
netlify bot commented Jul 31, 2023

Deploy Preview for effervescent-donut-4977b2 failed.

🔨 Latest commit: 77c10f2
🔍 Latest deploy log: https://app.netlify.com/sites/effervescent-donut-4977b2/deploys/65b932bcd5e3b00009afe5c6

@monrostar changed the title from "perf: optimize memory consumption during reindex." to "perf(Elasticsearch): optimize memory consumption during reindex." on Aug 1, 2023
@monrostar
Contributor Author

monrostar commented Aug 1, 2023

@michaelbromley
I think a check like this can be added wherever a list is used to generate operations:

if (operations.length >= this.options.reindexBulkOperationSizeLimit)

For example, here we can have the same problem, and we can fix it by applying the same limit when building the operations (a sketch follows the snippet below):

    private async updateProductsInternal(ctx: MutableRequestContext, productIds: ID[]) {
        const operations = await this.updateProductsOperations(ctx, productIds); // potential problem with big productIds list
        await this.executeBulkOperations(operations);
    }
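
A minimal sketch of that chunked-flush idea, using hypothetical type and callback names rather than the PR's actual code: the buffer of operations is flushed whenever it reaches reindexBulkOperationSizeLimit, so memory stays bounded regardless of how many product IDs are passed in.

```ts
// Sketch only: a generic chunked-flush helper. `BulkOperation` and the
// callback names are illustrative, not the types used in the plugin.
type BulkOperation = Record<string, unknown>;

async function buildAndExecuteInChunks<T>(
    items: T[],
    sizeLimit: number, // e.g. this.options.reindexBulkOperationSizeLimit
    buildOperations: (item: T) => Promise<BulkOperation[]>,
    executeBulkOperations: (ops: BulkOperation[]) => Promise<void>,
): Promise<void> {
    let buffer: BulkOperation[] = [];
    for (const item of items) {
        buffer.push(...(await buildOperations(item)));
        // Flush as soon as the in-memory buffer reaches the configured limit.
        if (buffer.length >= sizeLimit) {
            await executeBulkOperations(buffer);
            buffer = [];
        }
    }
    // Flush whatever is left over after the loop.
    if (buffer.length > 0) {
        await executeBulkOperations(buffer);
    }
}
```

updateProductsInternal could then delegate to such a helper, building operations per product ID instead of for the whole productIds list at once.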

@michaelbromley
Member

@monrostar thanks for the work on this so far! Regarding the last comment - do you plan to implement that additional optimization in this PR?

@monrostar
Contributor Author

monrostar commented Aug 6, 2023

@monrostar thanks for the work on this so far! Regarding the last comment - do you plan to implement that additional optimization in this PR?

@michaelbromley
Yes, I checked document generation during reindex, and instead of 50 hours it took 200 minutes to generate 18,249,972 documents.
I will try to make this update as soon as possible.


@monrostar
Contributor Author

@michaelbromley Added optimization for all related operations


const REINDEX_CHUNK_SIZE = 2500;
const REINDEX_OPERATION_CHUNK_SIZE = 3000;
import { asyncObservable } from "@vendure/core/src";
Member

This import is causing issues due to importing from /src. Add the asyncObservable import back into the group of @vendure/core imports and hopefully this resolves the failing tests.
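
In other words, a sketch (the exact surrounding import list is not shown in the diff, so the other imported names here are illustrative):

```ts
// Problematic: importing from the package's /src path breaks the build.
// import { asyncObservable } from '@vendure/core/src';

// Fixed: import asyncObservable from the public package entry point, grouped
// with the other @vendure/core imports.
import { asyncObservable, ID, Logger } from '@vendure/core';
```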

Contributor Author

Yes, I didn't notice the error; I've already fixed it.

@monrostar
Contributor Author

@michaelbromley I need to fix some code; I'm probably missing something when copying from our codebase. I will fix the e2e tests tomorrow.

@michaelbromley
Member

@monrostar Did you ever get a chance to finish this? Do you need any help?

@monrostar
Contributor Author

Yes, I need help. I had a problem starting the project and was never able to get back to fixing this. I'll describe my problem in more detail later. I'm still around, just have a lot of work.

@giosueDelgado
Collaborator

We are having issues with reindexing! If you can finish this, we can test it ASAP, @monrostar, and give you fast feedback.

@monrostar
Contributor Author

We are having issues with reindexing! If you can finish this, we can test it ASAP, @monrostar, and give you fast feedback.

Ok, I will try to edit the PR within 2 days

@giosueDelgado
Collaborator

FYI @LorenzoRottigni

@giosueDelgado
Collaborator

We are having issues with reindexing! If you can finish this, we can test it ASAP, @monrostar, and give you fast feedback.

Ok, I will try to edit the PR within 2 days

Cool! Thanks. If you need help or some support, tell us!

@monrostar
Contributor Author

We are having issues with reindexing! If you can finish this, we can test it ASAP, @monrostar, and give you fast feedback.

Ok, I will try to edit the PR within 2 days

Cool! Thanks. If you need help or some support, tell us!

Sorry, I wasn't able to work on the PR this week. Unfortunately, I'm sick and just don't have enough time.

I'm going to need your help to finish this. The code I sent is basically the same code we are using right now in our project; we just need to find a bug, because I transferred the code without testing and we have ESLint set up differently.

In general, it is working code; we only need to debug it, because operations are split into separate queries to save memory and CPU.
If you have time, please try to test this code; otherwise, I won't be able to make the changes for a couple of days, once I'm feeling better.

@giosueDelgado
Collaborator

Ok, no problem. I can start checking this on Wednesday and will give you updates.

@giosueDelgado
Collaborator

@monrostar @LorenzoRottigni and I will try to check this today; we'll give you updates as soon as we have them!

@giosueDelgado
Collaborator

giosueDelgado commented Dec 3, 2023

@monrostar I found a bug in the query you are using when there are relations on custom fields.
I cannot push to your repository to make it work; in the meantime I created a local copy and got it working.
This is my first time working on someone else's PR - how can I contribute in this case?

In my opinion, the Elasticsearch plugin's query to fetch the data (products and variants) could be made extendable in the config, to allow a more performant way to manage the fetch based on the custom data of the installation. What do you think? @michaelbromley

@michaelbromley
Member

I'm not totally sure whether you can directly contribute to @monrostar's PR - maybe you can if he gives you write access to his fork.

In my opinion, the Elasticsearch plugin's query to fetch the data (products and variants) could be made extendable in the config, to allow a more performant way to manage the fetch

Maybe this can be a separate issue + PR?

@giosueDelgado
Collaborator

I'm not totally sure whether you can directly contribute to @monrostar's PR - maybe you can if he gives you write access to his fork.

In my opinion, the Elasticsearch plugin's query to fetch the data (products and variants) could be made extendable in the config, to allow a more performant way to manage the fetch

Maybe this can be a separate issue + PR?

Yes, in my local copy I already did all of this. In @monrostar's PR I will just fix the query as soon as he can give me permission; after that I'll open a new issue + PR to allow a custom query based on specific needs.

I've just developed all the changes and also added some logging and some general improvements to the Elasticsearch plugin. I hope to contribute them as soon as possible, because they bring a very important optimization and the possibility to customize! :)

@monrostar
Contributor Author

monrostar commented Dec 4, 2023

I'm not totally sure whether you can directly contribute to @monrostar's PR - maybe you can if he gives you write access to his fork.

In my opinion, the Elasticsearch plugin's query to fetch the data (products and variants) could be made extendable in the config, to allow a more performant way to manage the fetch

Maybe this can be a separate issue + PR?

Yes, in my local copy I already did all of this. In @monrostar's PR I will just fix the query as soon as he can give me permission; after that I'll open a new issue + PR to allow a custom query based on specific needs.

I've just developed all the changes and also added some logging and some general improvements to the Elasticsearch plugin. I hope to contribute them as soon as possible, because they bring a very important optimization and the possibility to customize! :)

I can give you full access; that's not an issue if you've already done this :)
I appreciate your input on this PR, thank you so much for helping out. I've already sent the invite.

@monrostar
Contributor Author

I'm not totally sure whether you can directly contribute to @monrostar's PR - maybe you can if he gives you write access to his fork.

In my opinion, the Elasticsearch plugin's query to fetch the data (products and variants) could be made extendable in the config, to allow a more performant way to manage the fetch

Maybe this can be a separate issue + PR?

In ES, we need to rewrite the index completely, because the mapping is not good for large amounts of data. It is necessary to rewrite the structure and also to split Product/ProductVariant into different indexes, because they are responsible for different search levels, and aggregation over such a large amount of data is a very costly procedure.

We can take a more sophisticated approach. We are working on this improvement now, because we couldn't handle an index size of 200 million documents.

@giosueDelgado
Collaborator

giosueDelgado commented Dec 5, 2023

Agree with you @monrostar, but as you know Elasticsearch is not open source any more, so I think (for our case) it is better to migrate to another open-source solution.
For now, because the plugin was ready to use, we started with Elasticsearch.
If we need to create a more performant plugin, I'd prefer to do it with an open-source project. What do you think?

I think we could build a plugin based on Typesense, Solr, MeiliSearch (or the open-source fork of Elasticsearch, "OpenSearch"). Let me know what you think.

@monrostar
Contributor Author

Agree with you @monrostar, but as you know Elasticsearch is not open source any more, so I think (for our case) it is better to migrate to another open-source solution. For now, because the plugin was ready to use, we started with Elasticsearch. If we need to create a more performant plugin, I'd prefer to do it with an open-source project. What do you think?

I think we could build a plugin based on Typesense, Solr, MeiliSearch (or the open-source fork of Elasticsearch, "OpenSearch"). Let me know what you think.

Such choices depend on your business objectives and capabilities, so the decision doesn't always have to be made based on whether something is open source or not. We use ES in production on AWS with 5 million product variants, and we decided to use ES because it guarantees the performance and speed we need.

@giosueDelgado
Collaborator

Oh, sure! I just think that, with the same performance guaranteed, I'd prefer an open-source project, no?

@monrostar
Contributor Author

Oh, sure! I just think that, with the same performance guaranteed, I'd prefer an open-source project, no?

Yes, I support that. But I prefer to choose services that are time-tested and where there is no need to experiment. That's my choice though :) Not everyone has the time, or a business that can afford such a luxury :)

@giosueDelgado
Collaborator

@monrostar I wrote to you on Discord because I have a permission issue on the PR (I can clone and I made the small change, but I get an error on push).


Let me know when you can give me the permission so I can close this issue.

@giosueDelgado
Collaborator

@michaelbromley can you check this PR? I checked the changes and they are very limited, but the tests don't work even when I run them from the first commit of this PR. Can you give me some suggestions?

Thanks

@michaelbromley michaelbromley merged commit d84011e into vendure-ecommerce:master Feb 6, 2024
0 of 4 checks passed
@monrostar
Contributor Author

Wow, really :) Thanks a lot to our community for helping to finish this PR :) Too bad I didn't have the energy and time to do it myself...

@michaelbromley
Member

Because this branch was very old, I had to do quite a bit of manual merging to get it compatible with the current master branch.

The only major problem was that the relationLoadStrategy: 'query', which was added in 2 places in the indexing logic, had to be removed. The reason is that it is buggy and breaks when used in combination with nested entity relations, which is how Vendure implements custom field relations. This is why almost all e2e tests were failing.

This has the downside of undoing some of the memory-pressure improvements, but is sadly necessary at this time until the TypeORM implementation is fixed.
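
For context, relationLoadStrategy is a TypeORM find option; the removed code would have looked roughly like the sketch below (the repository and relation names are illustrative placeholders, not the actual indexer code):

```ts
import { In } from 'typeorm';

// Illustrative only: variantRepository and the relation names are placeholders.
const variants = await variantRepository.find({
    where: { product: { id: In(productIds) } },
    relations: { translations: true, product: true },
    // Removed before merging: relationLoadStrategy: 'query' loads each relation
    // with a separate SELECT (lower memory pressure than one large JOIN), but it
    // is buggy when combined with nested entity relations, which is how Vendure
    // implements custom field relations, hence the failing e2e tests.
    // relationLoadStrategy: 'query',
});
```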
