Vault is unable to restore a large snapshot #24245
Comments
To add: if you switch to using curl instead of the vault client to restore a (now) 31 GB snapshot, you need a machine with more than 64 GB of memory, because otherwise the OOM killer will get you. I'm concerned this is a problem for Consul and Nomad snapshots as well, and it puts me ill at ease about restoring after a disastrous outage.
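The exact curl invocation is not shown in this thread, so the following is only a sketch of what a curl-based forced restore could look like, assuming the raft snapshot-force API endpoint and the standard VAULT_ADDR and VAULT_TOKEN environment variables. The function name and the DRY_RUN switch are hypothetical. It uses --upload-file, which streams the file from disk, whereas --data-binary @file makes curl read the whole file into memory first. Whether that accounts for the memory use reported above is not established here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# restore_snapshot_via_api FILE
# Streams FILE to the forced-restore endpoint with curl.
# DRY_RUN=1 (hypothetical switch for this sketch) prints the request
# instead of sending it; the token is never printed.
restore_snapshot_via_api() {
  local snapshot="$1"
  # Assumed default address; override with VAULT_ADDR.
  local addr="${VAULT_ADDR:-http://127.0.0.1:8200}"
  local url="${addr}/v1/sys/storage/raft/snapshot-force"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s (upload: %s)\n' "$url" "$snapshot"
    return 0
  fi

  curl --fail --silent --show-error \
    --request POST \
    --header "X-Vault-Token: ${VAULT_TOKEN:-}" \
    --upload-file "$snapshot" \
    "$url"
}
```

Calling `DRY_RUN=1 restore_snapshot_via_api big.snap` shows the request that would be sent without touching the server.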
Set up a fresh server with Vault, 128 GB of RAM, and a reasonably freshly taken snapshot; this is the result of attempting to restore said snapshot with curl:
The vault log (with level debug) shows only the following:
What do I do? I'm now running our supposed "vault cluster" on a single node, that I can back up to snapshots, but I apparently cannot restore said snapshots. I'd like a solution...
Welp. Solution found: increasing the http_read_timeout on the listener did the trick. I do still feel the default timeout on this is too low for production use; I'm quite sure I'm not the only one with large snapshots to restore. Anyway, I'll close this, but it may be worth documenting this somewhere (i.e. large snapshots -> increase the HTTP read timeout).
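For anyone landing here later, a minimal sketch of the change described above: a listener stanza with http_read_timeout raised. The address, the "1h" value, and the CONFIG_FRAGMENT path are illustrative assumptions, not values from this thread; pick a timeout that covers how long your own snapshot takes to upload.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output path for this sketch; a real deployment would edit
# the listener stanza in its existing Vault server configuration file.
CONFIG_FRAGMENT="${CONFIG_FRAGMENT:-$(mktemp)}"

# The address and the "1h" value are illustrative assumptions.
# TLS settings are omitted to keep the fragment short.
cat > "$CONFIG_FRAGMENT" <<'EOF'
listener "tcp" {
  address = "127.0.0.1:8200"
  http_read_timeout = "1h"
}
EOF

echo "wrote listener fragment to $CONFIG_FRAGMENT"
```

The server has to be restarted (or the listener otherwise reloaded) for a changed listener stanza to take effect.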
Describe the bug
Vault is seemingly unable to restore a 29 GB snapshot
To Reproduce
Get yourself a nice big snapshot, attempt to restore it to a newly initialized cluster (using -force), watch the errors
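As a rough sketch of the reproduction step, the forced restore with the Vault CLI looks roughly like the following. The function name, the snapshot file name, and the DRY_RUN switch are hypothetical and only there so the command can be inspected without a running cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# restore_snapshot_cli FILE
# Runs the CLI restore with -force against the cluster in VAULT_ADDR.
# DRY_RUN=1 (hypothetical switch for this sketch) prints the command
# instead of running it.
restore_snapshot_cli() {
  local snapshot="$1"
  local cmd=(vault operator raft snapshot restore -force "$snapshot")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 restore_snapshot_cli big.snap` prints the command that would be run.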
Expected behavior
A snapshot to be restored
Environment:
Vault server version (from vault status): 1.15.2
Vault CLI version (from vault version): 1.15.2
Vault server configuration file(s):
Additional context
I can't replicate the exact error message at the moment due to being in the middle of an attempt at recovery using some filthy methods, but:
attempt #1: "could not read request body"
then increased the vault client timeout by
export VAULT_CLIENT_TIMEOUT=86400s
attempt #2: "could not read request body"
then set max_request_duration and max_request_size in the vault listener config
attempt #3..n: "broken pipe"
This tells me that the vault client is attempting to dump the entire 29 GB to the vault server in one sitting, and the vault server is obviously not liking this very much.
It's mildly annoying that the "official" backup and restore method isn't actually working to restore the backup I made...