consul snapshot save fails to verify snapshot #4452
Comments
@arnoldyahad Are there any logs emitted by the Consul agent you are connecting to in order to take the snapshot (presumably the agent running on the machine where you ran the
I am also getting this error.
FWIW, this seems to be an error in verifying the snapshot, not in the snapshot itself. Verifying the checksums of the resulting file seems OK.
Looks like #4738 is the same issue.
We likely need to give this the same treatment as #4892 (unconfirmed, though).
I have a cluster running 1.3.0 that constantly exhibits this problem. I am no Go expert, but I did some digging. If I unpack the snapshot file with gtar, the meta.json file ends with a newline. However, it seems that when Consul calculates the checksum while inspecting the file internally, the data it gets from tar.NewReader does not include that newline. The checksum in SHA256SUMS is correct for the file with the newline. Extracting and then repacking the snapshot files with gtar did not fix the problem, but I found that I can repair a broken snapshot by doing this:
This repacked archive now passes the inspect check. The original meta.json (in the broken file) was exactly 513 bytes long.
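One plausible mechanism consistent with the 513-byte observation: Go's encoding/json Decoder reads its input in chunks (its first read appears to request 512 bytes) and stops as soon as it has a complete value. If meta.json is hashed through an io.TeeReader while it is being decoded, a trailing newline that falls just past that first read is never pulled through the tee, so it never reaches the hash. The sketch below reproduces that pattern in isolation; `hashWhileDecoding` and `paddedMeta` are hypothetical helpers, and this is an assumption about the snapshot reader, not Consul's code.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// hashWhileDecoding decodes JSON through a TeeReader that feeds SHA-256 and
// never reads past the end of the JSON value. It returns the digest of the
// bytes the decoder actually pulled from the input.
func hashWhileDecoding(data []byte) (string, error) {
	h := sha256.New()
	tee := io.TeeReader(bytes.NewReader(data), h)
	var meta map[string]interface{}
	if err := json.NewDecoder(tee).Decode(&meta); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// sha256Hex hashes the complete file, as a tool like sha256sum would.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// paddedMeta builds a JSON object of exactly size bytes followed by "\n".
// The "pad" field is hypothetical; it only controls the length.
func paddedMeta(size int) []byte {
	obj := `{"pad":"` + strings.Repeat("x", size-10) + `"}`
	return []byte(obj + "\n")
}

func main() {
	for _, size := range []int{300, 512} {
		data := paddedMeta(size)
		got, err := hashWhileDecoding(data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d bytes on disk, hash matches: %v\n", len(data), got == sha256Hex(data))
	}
	// Prints:
	// 301 bytes on disk, hash matches: true
	// 513 bytes on disk, hash matches: false
}
```

A short file is read in one go, newline included, so its hash is correct; only a file whose JSON object ends exactly at a read boundary loses the newline, which would explain why the failure depends on the size of meta.json.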
I did some more digging and think I have found a fix. The following patch forces the JSON decode function to read the final newline as well. An alternative would have been to omit the newline when saving the file, but this approach has the advantage that it can also read snapshots that were taken, and reported as broken, before the patch.
Hello, I tried to reproduce this issue, but I never ran into it. Could someone provide me with a snapshot that fails and the Consul version that cannot load that snapshot? I checked what @maf23 said, but that didn't help me. As far as I understood, the issue is that
Here is a zip file with two snapshots I made of an empty Consul cluster. 'good.snap' works just fine, but 'longer.snap' I modified deliberately to demonstrate the issue. With my patch above, Consul accepts both files (easily tested with a snapshot inspect), while an out-of-the-box consul binary refuses to read longer.snap. The only change I made was to add some extra spacing to meta.json and update the checksum in SHA256SUMS (so that it matched the modified meta.json).
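A test file like longer.snap can be prepared along the same lines: pad meta.json with whitespace until the JSON object is exactly 512 bytes, then regenerate its SHA256SUMS line. The sketch assumes the sha256sum layout for SHA256SUMS; `padToSize`, `sumsLine` and the sample meta.json content are hypothetical, and repacking the archive is left to tar, as in the thread.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// padToSize pads a JSON object with spaces before its closing brace so the
// object is exactly size bytes, then appends a newline. Whitespace before
// "}" is valid JSON, so the decoded content does not change.
func padToSize(obj []byte, size int) ([]byte, error) {
	obj = bytes.TrimSpace(obj)
	if len(obj) == 0 || obj[len(obj)-1] != '}' {
		return nil, fmt.Errorf("input is not a JSON object")
	}
	if len(obj) > size {
		return nil, fmt.Errorf("object is already %d bytes, larger than %d", len(obj), size)
	}
	out := make([]byte, 0, size+1)
	out = append(out, obj[:len(obj)-1]...)
	out = append(out, bytes.Repeat([]byte(" "), size-len(obj))...)
	out = append(out, '}', '\n')
	return out, nil
}

// sumsLine formats one SHA256SUMS entry, assuming the sha256sum layout
// "<hex digest>  <file name>".
func sumsLine(name string, data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) + "  " + name + "\n"
}

func main() {
	// Hypothetical meta.json content; a real one comes from an extracted snapshot.
	meta, err := padToSize([]byte(`{"ID":"example","Index":1}`+"\n"), 512)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(meta)) // 513
	fmt.Print(sumsLine("meta.json", meta))
}
```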
Thank you @maf23! I was able to reproduce it and opened a PR!
* snapshot: read meta.json correctly. Fixes #4452.
Overview of the Issue
Trying to run consul snapshot save <file_name> from
Reproduction Steps
Steps to reproduce this issue, e.g.:
1. Just a Consul cluster on 1.2.1
Consul info for both Client and Server
consul info
agent:
check_monitors = 0
check_ttls = 3
checks = 3
services = 4
build:
prerelease =
revision = 39f93f0
version = 1.2.1
consul:
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = 10.200.4.70:8300
server = true
raft:
applied_index = 375494809
commit_index = 375494809
fsm_pending = 0
last_contact = 0
last_log_index = 375494809
last_log_term = 68349
last_snapshot_index = 375491206
last_snapshot_term = 68349
latest_configuration = [{Suffrage:Voter ID:673e0ff3-815a-4c56-0b5b-1ccf17d7a7d1 Address:10.200.1.154:8300} {Suffrage:Voter ID:082aa7eb-3692-0d7b-b4fe-7efd1ab5b42e Address:10.200.4.70:8300} {Suffrage:Voter ID:1b0d7168-4d6c-0b73-b8e9-c805223a6428 Address:10.200.10.157:8300}]
latest_configuration_index = 374999275
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 68349
runtime:
arch = amd64
cpu_count = 8
goroutines = 15765
max_procs = 4
os = linux
version = go1.10.1
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 4155
failed = 0
health_score = 0
intent_queue = 0
left = 289
member_time = 1462570
members = 1456
query_queue = 0
query_time = 245
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 3321
members = 3
query_queue = 0
query_time = 1