-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(Elasticsearch): optimize memory consumption during reindex. #2327
Conversation
…eindex. > Products limit chunk size for each loop iteration when indexing products. `reindexProductsChunkSize` - default value is 500 > Index operations are performed in bulk, with each bulk operation containing a number of individualindex operations. This option sets the maximum number of operations in the memory buffer before abulk operation is executed. `reindexBulkOperationSizeLimit` - default value is 3000 Refs: vendure-ecommerce#1770
❌ Deploy Preview for effervescent-donut-4977b2 failed.
|
@michaelbromley if (operations.length >= this.options.reindexBulkOperationSizeLimit) For example. Here we can have the same problem and we can fix it if we'll use limits for loops private async updateProductsInternal(ctx: MutableRequestContext, productIds: ID[]) {
const operations = await this.updateProductsOperations(ctx, productIds); // potential problem with big productIds list
await this.executeBulkOperations(operations);
} |
@monrostar thanks for the work on this so far! Regarding the last comment - do you plan to implement that additional optimization in this PR? |
@michaelbromley |
@michaelbromley Added optimization for all related operations |
|
||
const REINDEX_CHUNK_SIZE = 2500; | ||
const REINDEX_OPERATION_CHUNK_SIZE = 3000; | ||
import { asyncObservable } from "@vendure/core/src"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import is causing issues due to importing from /src
. Add the asyncObservable
import back into the group of @vendure/core
imports and hopefully this resolves the failing tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, didn't notice the error, I already fixed
@michaelbromley I need to fix some code, probably I missing something when coping with our code. I will fix e2e tomorrow |
@monrostar Did you ever get a chance to finish this? Do you need any help? |
Yes, I need help, I had a problem starting the project and I was never able to get back to fix this problem. I'll describe my problem in more detail later. I'm still in touch, just a lot of work |
We are having issue on reindex! If you can finish we can test ASAP @monrostar and give you a fast feedback |
Ok, I will try to edit the PR within 2 days |
FYI @LorenzoRottigni |
Cool! Thanks if you need help or some support tell us! |
Sorry, I wasn't able to work on PR this week. Unfortunately, I'm sick. I just don't have enough time I'm gonna need your help to finish this. This code I sent is basically the same code we are using right now in our project, we just need to find a bug because I transferred this code without testing and we have Eslint set up differently. In general, it is absolutely working code, we only need to debug because operations are split into separate queries to save memory and CPU. |
Ok, no problem, I can start to check this Wednesday, I give you updates |
@monrostar Me and @LorenzoRottigni try to check this today we give you updates as soon as we have! |
@monrostar I found a bug in the query you are using if there are relation on customfields. In my opinion the elasticsearch plugin the query to fetch the data (products and variant) can be extened in the config to allow a more performant way to manage the fetch based on the custom data of the installation what do you think? @michaelbromley |
I'm not totally sure whether you can directly contribute to @monrostar's PR - maybe you can if he gives you write access to his fork.
Maybe this can be a separate issue + PR? |
Yes, in my local I just did all things, in the @monrostar pr I will just fix the query as soon as he can give to me the permission, after that I open a new issue + pr for allow custom query based on the specific needs. I just develop all changes and also add some logging and some general improvements on the elasticsearch plugin I hope to contribute as soon as possible because it take a very important optimization and possibility to customize! :) |
I can give you all access, this is not an issue if you already done this :) |
In ES, we need to rewrite the index completely because mapping is not good for large amounts of data. It is necessary to rewrite the structure and also to split Product/ProductVariant into different indexes because they are responsible for different search levels and aggregation over such a large amount of data is a very costly procedure. We can take a more sophisticated approach. We are working on this improvement now, because we couldn't handle an index size of 200Million documents |
Agree with you @monrostar but as you know the elasticsearch is not open source any more so I think (for our case) is better migrate to other solution opensource. I think we can build some plugin based on typesense, Solr, MeiliSearch, (or the opensource fork of elasticsearch "OpenSearch") let me know what you think |
Such solutions depend on your business objectives and capabilities, so this choice does not always have to be made because it is opensource or not. We use ES in production on AWS and we have 5 million product variants, so we decided to use ES because it guarantees the performance and speed we need. |
Oh sure for this! I think on same performance guaranteed I prefer open source project, no? |
Yes, I support it. But I prefer to choose services that are time-tested and where there is no need to experiment. But that's my choice :) Not always have the time and business that can afford such a luxury :) |
@monrostar I write to you on discord because I have a permission issue on the PR (I can clone, I made the little change but and error on push) Let me know when you can give to me the permission so I can close this issue |
@michaelbromley can you check this pr? I check the changes and are very limited, the test not work also if I run from the start of the commit of this pr can you give me some suggestion? Thanks |
Wow, really :) Thanks a lot to our community for helping to finish this PR :) too bad I didn't have the energy and time to do it... |
Because this branch was very old, I had to do quite a bit of manual merging to get it compatible with the current master branch. The only major problem was that the This has the downside of undoing some of the memory-pressure improvements, but is sadly necessary at this time until the TypeORM implementation is fixed. |
Added two new variables to limit the amount of data that can be stored in memory during the reindex job
reindexProductsChunkSize
- default value is 2500reindexBulkOperationSizeLimit
- default value is 3000Refs: #1770
BREAKING CHANGE: This update optimizes memory usage for large amounts of data and applies to products with many variants. With the help of the memory storage limit, memory consumption can be increased or decreased. The processing speed and memory consumption depends on the limits set in the new variables.
And batchSize was removed because it was a variable that was not used elsewhere