Bulk indexer is inefficient and slow #528
Comments
We are about to implement some benchmarks and dive into profiling, which is always a good thing to learn. I'll share some examples, or maybe some don'ts, as soon as we have identified possible bottlenecks.
Closing this issue. I'll be happy to reopen if you come back with code and/or examples!
We did evaluate our code, and this officially makes me an idiot now! While reviewing the code that led to this ticket, I overlooked that the FlushInterval was set to 1 second, which made the loop take forever. We benchmarked your lib against olivere's and guess what: it outperformed it. The CPU profiling showed ~28% fewer memory allocations, and it ran **~25% faster**. No more words then, sorry for wasting your time!!!
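For reference, a hedged config sketch of the flush settings involved (assuming the client's `esutil.NewBulkIndexer` and `BulkIndexerConfig`; the `es` client variable and the concrete values are placeholders, not recommendations). Flushing primarily by size instead of on a very short timer avoids the 1-second flush loop described above:

```go
// Sketch, not a drop-in snippet: es is an *elasticsearch.Client.
bi, err := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
	Client:        es,
	Index:         "products",          // placeholder index name
	NumWorkers:    4,                   // parallel flush workers
	FlushBytes:    5 * 1024 * 1024,     // flush when the buffer reaches ~5 MB...
	FlushInterval: 30 * time.Second,    // ...not every second
})
if err != nil {
	log.Fatalf("creating bulk indexer: %s", err)
}
```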
While it may be a nice idea for some codebases to use structs with JSON conversion, it is not the best idea when adding a couple of hundred thousand entries.
In my case, to give you a rough idea: importing ~350,000 entries (from CSV files) with ~10-20 short fields and no extensive analyzer applied, into 5 different indices, takes >7m on a recent Linux Dell Precision 5550. I'll spare you the bulk size, flush and other details because they don't change much about the basic problem.
I used olivere's elastic client, which is structured internally to also accept plain strings for bulk requests (see the Source method call here). With this in place I was able to build the bulk request body rows with some fmt.Sprintf() calls, avoiding struct initialization and, with it, CPU & memory allocation. With that handling the import took ~6 secs. Before that I used olivere's method with struct handling and it took ~12 secs.
Yes, building a string which is valid JSON may be inconvenient in some situations, but a decent programmer should be able to achieve this. And we are talking about bulk inserts, which are probably used in situations where you need speed. When I see the code in your bulk_indexer worker run and dive into the meta-line, I am pretty stunned at how difficult it can be to create something wrapped in braces and joined with commas.
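For illustration, a minimal sketch of such a meta-line builder (the function name and structure are hypothetical, not the library's internals): the action line really is just key/value pairs joined with commas and wrapped in braces.

```go
package main

import (
	"fmt"
	"strings"
)

// metaLine assembles a bulk action line such as
// {"index":{"_index":"products","_id":"42"}} by joining the
// optional fields with commas and wrapping them in braces.
func metaLine(action, index, docID string) string {
	fields := []string{fmt.Sprintf(`"_index":%q`, index)}
	if docID != "" {
		fields = append(fields, fmt.Sprintf(`"_id":%q`, docID))
	}
	return fmt.Sprintf(`{%q:{%s}}`, action, strings.Join(fields, ","))
}

func main() {
	fmt.Println(metaLine("index", "products", "42"))
	// {"index":{"_index":"products","_id":"42"}}
	fmt.Println(metaLine("create", "logs", ""))
	// {"create":{"_index":"logs"}}
}
```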
To wrap it up, your bulk indexer is pretty much useless unless you are inserting only 100 entries or you opt into throwing hardware at the problem (which will probably NOT work). I am not an ultra-experienced Go dev, so I may have missed some low-level methods in here. My advice would be to look at olivere's code structures and adopt some of his bulk ideas, also because your open bulk-related tickets don't look promising. Unfortunately he decided not to continue his lib, so one may need to fork it and make it work with ES 8+.