Elasticsearch backup performance improvement #61

Open
andreatera opened this issue Jan 20, 2022 · 7 comments

Comments

@andreatera

We are currently using backman v1.30.2 to back up data from our Elasticsearch service in the internal iAPC.

Backman takes almost 30 minutes to back up 23 MB, and it has now been running for almost 5 hours to back up 600 MB and is still not done.

That's not workable: we have Elasticsearch instances with 4-5 GB of data, still growing. How long will it take to back those up? Days?
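
As a rough back-of-the-envelope check on the "days?" question, the sketch below extrapolates linearly from the one completed data point reported above (23 MB in about 30 minutes). It assumes throughput stays constant as the data grows, which the still-running 600 MB backup is consistent with but does not confirm. The function name is purely illustrative.

```python
def estimated_backup_days(size_mb: float,
                          sample_mb: float = 23.0,
                          sample_minutes: float = 30.0) -> float:
    """Linearly extrapolate backup duration (in days) from one observed sample."""
    minutes_per_mb = sample_minutes / sample_mb
    return size_mb * minutes_per_mb / (60 * 24)


if __name__ == "__main__":
    for size_gb in (4, 5):
        days = estimated_backup_days(size_gb * 1000)  # assumes 1 GB = 1000 MB
        print(f"{size_gb} GB: roughly {days:.1f} days")
```

Under that constant-rate assumption, 4-5 GB would indeed land in the range of several days rather than hours.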

@denysvitali

I've created searchdump (Swisscom-internal only for now, open source soon), which basically solves the performance issue of elasticsearch-dump and is implemented in Go, so the integration with backman should be pretty easy. @JamesClonk and I have already chatted quickly about it :)

@andreatera
Author

That's very good news!
Is there any plan for when you'll integrate it into backman, so we can track it and give it a try when it's ready?

@denysvitali

I'll work on some backman features today, but sadly I haven't planned any time yet to make searchdump open source or to integrate it with backman.

@denysvitali

FYI: searchdump is now public :)

@buffalonan

Hello @denysvitali, we are very interested in this feature, since for a couple of years we have had big ES instances that we cannot back up with Backman. Any news regarding its integration into Backman? Thanks

@apaulino42

apaulino42 commented Nov 26, 2024

Hello,

@JamesClonk, @denysvitali, we are also very interested.
I'm not sure if the integration of searchdump is still on the table, but if not, perhaps we could consider adding two new parameters to elasticdump: 'limit' and 'searchBody'. This way, we could reasonably increase the limit and use the 'searchBody' parameter to filter the documents we want to back up.

For example, to back up the documents from the past month, we use the command:
elasticdump [...] --searchBody='{"query":{"range":{"@timestamp":{"gte":"now-1M/M","lte":"now/M"}}}}' --limit 200
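
For anyone who would rather generate these extra arguments than hand-write the JSON, here is a minimal Python sketch. It only builds the two flags shown in the command above; the `[...]` part is left out on purpose, and the helper name `build_extra_args` is made up for illustration.

```python
import json
import shlex


def build_extra_args(gte: str = "now-1M/M",
                     lte: str = "now/M",
                     limit: int = 200) -> list[str]:
    """Return the --searchBody and --limit arguments for a @timestamp range query."""
    search_body = {"query": {"range": {"@timestamp": {"gte": gte, "lte": lte}}}}
    # Compact separators keep the JSON on one line without extra spaces.
    body_json = json.dumps(search_body, separators=(",", ":"))
    return [f"--searchBody={body_json}", "--limit", str(limit)]


if __name__ == "__main__":
    # shlex.join quotes the JSON so it survives a POSIX shell unchanged.
    print(shlex.join(build_extra_args()))
```

Building the query as a dictionary and serializing it avoids the quoting mistakes that are easy to make when editing nested JSON inside a shell string.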

@apaulino42

apaulino42 commented Nov 26, 2024

I'll respond to myself and to others that might have similar issues :).
I checked the code of backman and noticed the parameter backup_options that I totally missed...
You can actually use it to provide additional parameters to elasticdump like the ones I mentioned above.
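
As a sketch of what that could look like, the Python snippet below prints a service entry that carries those flags in `backup_options`. The service name `my-elasticsearch`, the surrounding `services` key, and the assumption that `backup_options` takes a list of strings are all illustrative guesses, so please check them against the backman documentation for your version.

```python
import json


def service_config_with_options(service_name: str, options: list[str]) -> dict:
    """Build a hypothetical backman config fragment for one service."""
    # Assumption: backup_options is a list of strings handed to elasticdump as-is.
    return {"services": {service_name: {"backup_options": options}}}


if __name__ == "__main__":
    search_body = {"query": {"range": {"@timestamp": {"gte": "now-1M/M", "lte": "now/M"}}}}
    options = [
        "--searchBody=" + json.dumps(search_body, separators=(",", ":")),
        "--limit",
        "200",
    ]
    # "my-elasticsearch" is a placeholder service name, not a real instance.
    print(json.dumps(service_config_with_options("my-elasticsearch", options), indent=2))
```

Letting `json.dumps` produce the fragment takes care of escaping the inner quotes of the search body, which is the fiddly part when writing such a config by hand.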
