Sporadic error when saving snapshots: unexpected EOF #6893
This happens to me on 1.6.2 on Arm64. It was also happening on a different cluster that's on amd64 before I upgraded that cluster to 1.6.2, now all is quiet on that front. Errors from the logs of the member which requested the snapshot:
The cluster leader's logs look normal:
Something additional that's interesting: for the failed snapshots, every single one has exactly the same size. Example:
All the numbers around 85K are good snapshots; the rest are failures. This does not seem to happen on the cluster leader, just the follower nodes.
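One way to check whether failed snapshots share a size is to tally the saved files by exact byte count. The sketch below is only an illustration: `snapshot_size_counts` is a hypothetical helper, not part of Consul, and it assumes the snapshots were saved as `*.snap` files in a single directory.

```shell
#!/bin/sh
# Hypothetical helper, not part of Consul: tally *.snap files in a directory
# by exact byte size, so a size that recurs across failures stands out.
snapshot_size_counts() {
    dir="${1:-.}"
    for f in "$dir"/*.snap; do
        [ -f "$f" ] || continue          # skip when the glob matches nothing
        wc -c < "$f" | tr -d ' '         # print the size in bytes, no padding
    done | sort -n | uniq -c | awk '{ print $2, $1 }'   # one "size count" pair per line
}
```

Running `snapshot_size_counts /consul/data` would print one line per distinct size, followed by how many files have that size. A size with a high count that is far from the usual snapshot size would be the one to look at.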
Having the same issue on 1.7.1. Here's the output of the same command run 3 times consecutively:
[root@server server]# CONSUL_HTTP_TOKEN=my-token ./bin/consul snapshot save consul.snapshot
Error verifying snapshot file: failed to read snapshot file: failed to read or write snapshot data: unexpected EOF
[root@server server]# CONSUL_HTTP_TOKEN=my-token ./bin/consul snapshot save consul.snapshot
Error saving snapshot: Unexpected response code: 500 (failed to decode response: read tcp 192.168.0.1:60444->192.168.0.2:8300: read: connection reset by peer)
[root@server server]# CONSUL_HTTP_TOKEN=my-token ./bin/consul snapshot save consul.snapshot
Saved and verified snapshot to index 23436477
It also happens on
Hi all! Just wanted to check whether anyone has seen this issue when trying to create snapshots since the release of v1.7.3, which includes the following pull request that addresses this error.
Hi all! Closing this, as I was unable to repro this behavior with v1.9.4 when running a watch command that saved a Consul snapshot every 3 seconds for 20 minutes. But do feel free to re-open if you see this behavior in any of the more recent Consul releases.
For what it's worth this is occurring in k8s 1.19 with the consul-helm chart. Consul 1.9.3
I'm seeing very similar behavior while running the snapshot in agent mode under Consul Enterprise 1.9.3
We are running Consul 1.8.0 as pods deployed with the Helm chart, and we got the same error while performing a snapshot save and restore.
Is there a fix for this yet? Also, as a workaround, can anyone verify whether taking the snapshot from the leader solves the problem? @ChipV223 please re-open this issue since this is not fixed.
Thanks for re-opening this @ChipV223! This seems like a strange and persistent issue. I mentioned it to a few of the other engineers, and the recent activity here being focused on Consul in k8s leads us to believe it could be a PV issue... Also, @benumbed noted that:
This makes us wonder whether the issue only occurs at this exact size (or at a different but consistent size) for others. So my questions to @shomeprasanjit, @wsams, and @drewby08 would be:
@Amier3: Thanks for following up. Here are your answers:
Are you using Consul in containers? And if so, how are you all handling PVs?
Ans) Yes, we are running Consul as pods from a standard Helm chart. We are using gp3 EBS volumes as persistent storage, controlled by a StatefulSet.
Are all the failed snapshots you all are seeing the exact same size in bytes?
Ans) Not sure exactly. It was probably in MBs, but definitely not in bytes.
When filing a bug, please include the following headings if possible. Any example text in this template can be deleted.
Overview of the Issue
I keep getting random errors when creating backups of our Consul.
$ consul snapshot save /consul/data/20191206120000-1.snap
Error verifying snapshot file: failed to read snapshot file: failed to read or write snapshot data: unexpected EOF
Reproduction Steps
Just keep creating snapshots and you can see that it sometimes fails.
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068184
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068185
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068186
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068186
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068188
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068188
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068188
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068188
$ consul snapshot save /consul/data/20191206120000-1.snap
Error verifying snapshot file: failed to read snapshot file: failed to read or write snapshot data: unexpected EOF
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068194
$ consul snapshot save /consul/data/20191206120000-1.snap
Saved and verified snapshot to index 1068195
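The manual repetition above can be scripted. The following is a minimal sketch, not an official tool: `repro_snapshots` and the `SNAPSHOT_CMD` override are hypothetical names introduced here, and unlike the transcript above it writes each attempt to its own file (`repro-N.snap`, also an assumption) so the sizes of individual attempts can be compared afterwards.

```shell
#!/bin/sh
# Hypothetical repro helper: run a snapshot command N times and report how
# many attempts failed, plus the size in bytes of each resulting file.
# SNAPSHOT_CMD defaults to the command from this report; override it to
# substitute a different command (for example a stub when testing the loop).
repro_snapshots() {
    attempts="${1:-20}"
    out_dir="${2:-.}"
    failures=0
    i=1
    while [ "$i" -le "$attempts" ]; do
        file="$out_dir/repro-$i.snap"
        if ${SNAPSHOT_CMD:-consul snapshot save} "$file" >/dev/null 2>&1; then
            status=ok
        else
            status=FAILED
            failures=$((failures + 1))
        fi
        # Record the size of whatever file, if any, the attempt left behind.
        size=$( [ -f "$file" ] && wc -c < "$file" | tr -d ' ' || echo 0 )
        echo "$i $status $size"
        i=$((i + 1))
    done
    echo "failures: $failures/$attempts"
}
```

For instance, `repro_snapshots 50 /consul/data` would make 50 attempts and print one `attempt status size` line per attempt, followed by a failure count. Since the error is sporadic, a run with zero failures does not show that the problem is absent.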
Consul info for both Client and Server
$ consul version
Consul v1.6.1
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Client info
Server info
Operating system and Environment details
OS, Architecture, and any other information you can provide about the environment.
Log Fragments
Include appropriate Client or Server log fragments. If the log is longer than a few dozen lines, please include the URL to the gist of the log instead of posting it in the issue. Use -log-level=TRACE on the client and server to capture the maximum log detail.