consul snapshot save fails to verify snapshot #4452

Closed
arnoldyahad opened this issue Jul 26, 2018 · 10 comments
Labels: type/bug Feature does not function as expected

@arnoldyahad

Overview of the Issue

trying to do consul snapshot save <file_name> from

Reproduction Steps

1. Just a Consul cluster on 1.2.1

Consul info for both Client and Server

consul info
agent:
check_monitors = 0
check_ttls = 3
checks = 3
services = 4
build:
prerelease =
revision = 39f93f0
version = 1.2.1
consul:
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = 10.200.4.70:8300
server = true
raft:
applied_index = 375494809
commit_index = 375494809
fsm_pending = 0
last_contact = 0
last_log_index = 375494809
last_log_term = 68349
last_snapshot_index = 375491206
last_snapshot_term = 68349
latest_configuration = [{Suffrage:Voter ID:673e0ff3-815a-4c56-0b5b-1ccf17d7a7d1 Address:10.200.1.154:8300} {Suffrage:Voter ID:082aa7eb-3692-0d7b-b4fe-7efd1ab5b42e Address:10.200.4.70:8300} {Suffrage:Voter ID:1b0d7168-4d6c-0b73-b8e9-c805223a6428 Address:10.200.10.157:8300}]
latest_configuration_index = 374999275
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 68349
runtime:
arch = amd64
cpu_count = 8
goroutines = 15765
max_procs = 4
os = linux
version = go1.10.1
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 4155
failed = 0
health_score = 0
intent_queue = 0
left = 289
member_time = 1462570
members = 1456
query_queue = 0
query_time = 245
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 3321
members = 3
query_queue = 0
query_time = 1


</details>

The error I'm getting is:
:~# consul snapshot save consul-us-$(date +%Y-%m-%d).snap
Error verifying snapshot file: failed to read snapshot file: failed checking integrity of snapshot: hash check failed for "meta.json"


<details>
  <summary>Server info</summary>


### Operating system and Environment details
ubuntu14.04

@mkeeler
Member

mkeeler commented Jul 26, 2018

@arnoldyahad Are there any logs emitted by the Consul agent you are connecting to when taking the snapshot (presumably the agent running on the machine where you ran the consul snapshot save command)?

pearkes added the type/question and waiting-reply labels Jul 26, 2018
@connaryscott

connaryscott commented Jul 31, 2018

I am also getting this error,

[centos@ip-10-162-100-5 ~]$ consul snapshot save -datacenter "${REGION}" -token "${CONSUL_TOKEN}" -stale /tmp/mysnap.tgz2
Error verifying snapshot file: failed to read snapshot file: failed checking integrity of snapshot: hash check failed for "meta.json"
[centos@ip-10-162-100-5 ~]$ consul version
Consul v1.0.2+ent
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

@rikwasmus

FWIW, this seems to be an error in verifying the snapshot, not in the snapshot itself. The checksums of the resulting file verify fine.

# consul --version
Consul 1.0.7-dev
# /usr/sbin/consul snapshot save -stale -token=$(/bin/cat /etc/consul.d/config.json | /usr/bin/jq -r '.acl_master_token') /tmp/snapshot.tar
Error verifying snapshot file: failed to read snapshot file: failed checking integrity of snapshot: hash check failed for "meta.json"
Logs (trace level): 
   2018/08/01 00:23:35 [DEBUG] http: Request GET /v1/snapshot (2.618758905s) from=127.0.0.1:44361
   2018/08/01 00:23:35 [DEBUG] http: Request GET /v1/agent/self (4.045734ms) from=127.0.0.1:44364
.. no errors
/tmp# tar xvf snapshot.tar 
meta.json
state.bin
SHA256SUMS
/tmp# cat SHA256SUMS 
1af13876043c159de314cf973a61a8c5f8a1c5d93299d70f51d4d67478d9a122  meta.json
113544edf112537c7ae53e12a385b242de6968cacd44bc76485a915b9f3d2fe4  state.bin
/tmp# sha256sum meta.json 
1af13876043c159de314cf973a61a8c5f8a1c5d93299d70f51d4d67478d9a122  meta.json
/tmp# sha256sum state.bin 
113544edf112537c7ae53e12a385b242de6968cacd44bc76485a915b9f3d2fe4  state.bin
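
The manual check above can also be sketched in Go. This is only an illustration, not Consul's own verification code: `buildArchive`, `tamper`, and `verifyArchive` are hypothetical helpers, and the archive is built in memory so the example is self-contained. The key point is that every byte of each entry is hashed before comparing against SHA256SUMS.

```go
package main

import (
	"archive/tar"
	"bufio"
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// buildArchive writes the given files plus a matching SHA256SUMS entry into an
// in-memory tar archive. Hypothetical helper; write errors are ignored because
// the destination is a bytes.Buffer.
func buildArchive(files map[string]string) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	var sums strings.Builder
	add := func(name, body string) {
		tw.WriteHeader(&tar.Header{Name: name, Mode: 0600, Size: int64(len(body))})
		tw.Write([]byte(body))
	}
	for name, body := range files {
		sum := sha256.Sum256([]byte(body))
		fmt.Fprintf(&sums, "%s  %s\n", hex.EncodeToString(sum[:]), name)
		add(name, body)
	}
	add("SHA256SUMS", sums.String())
	tw.Close()
	return buf.Bytes()
}

// tamper replaces the first occurrence of old with new. Both strings must have
// the same length so the sizes recorded in the tar headers stay valid.
func tamper(archive []byte, old, new string) []byte {
	return bytes.Replace(archive, []byte(old), []byte(new), 1)
}

// verifyArchive hashes every byte of each entry and compares the result with
// SHA256SUMS. It returns the names whose sums do not match.
func verifyArchive(archive []byte) ([]string, error) {
	got := map[string]string{}
	var sumsFile []byte
	tr := tar.NewReader(bytes.NewReader(archive))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(tr) // read the entry to its end, newline included
		if err != nil {
			return nil, err
		}
		if hdr.Name == "SHA256SUMS" {
			sumsFile = data
			continue
		}
		sum := sha256.Sum256(data)
		got[hdr.Name] = hex.EncodeToString(sum[:])
	}
	var bad []string
	sc := bufio.NewScanner(bytes.NewReader(sumsFile))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && got[fields[1]] != fields[0] {
			bad = append(bad, fields[1])
		}
	}
	return bad, sc.Err()
}

func main() {
	files := map[string]string{"meta.json": "{\"Index\":1}\n", "state.bin": "state"}
	bad, err := verifyArchive(buildArchive(files))
	fmt.Println("mismatched entries:", bad, "error:", err)
}
```

Because each entry is read in full with `io.ReadAll`, a trailing newline in meta.json is always part of the hashed data, which is what `sha256sum` does as well.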

pearkes added the needs-investigation label and removed the waiting-reply label Aug 21, 2018
@pearkes
Contributor

pearkes commented Oct 26, 2018

Looks like #4738 is the same issue.

pearkes added this to the Upcoming milestone Nov 2, 2018
@pearkes
Contributor

pearkes commented Nov 2, 2018

We likely need to give this the same treatment as #4892 (unconfirmed though)

@maf23

maf23 commented Nov 27, 2018

I have a cluster running 1.3.0 which constantly exhibits this problem. I am no Go expert, but I did some digging. If I unpack the snapshot file with gtar, the meta.json file ends with a newline. But it seems that when Consul calculates the checksum while inspecting the file internally, the data it reads through tar.NewReader does not include the newline.

The checksum in SHA256SUMS is correct for the file with the newline.

Extracting and then repacking the snapshot files with gtar did not fix the problem, but I found that I can repair a broken snapshot by doing this:

  1. Unpack the snapshot with gtar
  2. Edit the extracted meta.json file and add a blank character (keep the json valid)
  3. Calculate the sha256 checksum for the modified meta.json and update SHA256SUMS
  4. Pack the files into a snapshot with gtar

This repacked archive now passes the inspect test.

The original meta.json (in the broken file) was exactly 513 bytes long.
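
A minimal Go sketch of how a 513-byte file could trip this up, assuming the checksum is computed from the bytes that `json.Decoder` pulls through an `io.TeeReader` (the surrounding Consul code is not shown in this thread, so that is an assumption). It also relies on an implementation detail of `encoding/json`, namely that the decoder's first read asks for 512 bytes. `metaOfSize` and its `Pad` field are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// metaOfSize returns a JSON object padded so that the object plus its trailing
// newline is exactly total bytes long. The "Pad" field is hypothetical.
func metaOfSize(total int) []byte {
	const overhead = len(`{"Pad":""}`) + 1 // object syntax plus the newline
	return []byte(`{"Pad":"` + strings.Repeat("x", total-overhead) + `"}` + "\n")
}

// hashMatchesAfterDecode reports whether the bytes consumed by json.Decoder
// (captured with a TeeReader) hash to the same value as the whole input.
func hashMatchesAfterDecode(data []byte) (bool, error) {
	h := sha256.New()
	var v map[string]string
	dec := json.NewDecoder(io.TeeReader(bytes.NewReader(data), h))
	if err := dec.Decode(&v); err != nil {
		return false, err
	}
	want := sha256.Sum256(data)
	return bytes.Equal(h.Sum(nil), want[:]), nil
}

func main() {
	for _, size := range []int{100, 513} {
		ok, err := hashMatchesAfterDecode(metaOfSize(size))
		fmt.Printf("size=%d hash matches=%v err=%v\n", size, ok, err)
	}
}
```

With a 513-byte input the JSON object fills the first 512-byte read exactly, so the decoder has a complete value and never asks for the final newline. Shorter inputs fit in one read, newline included, which would explain why most snapshots verify fine.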

@maf23

maf23 commented Nov 29, 2018

I did some more digging and think I have found a fix. The following patch forces the JSON decode function to read the final newline as well. An alternative solution would have been to not include the newline when saving the file, but this solution has the advantage that it can also read snapshots which were taken, and reported broken, before the patch.

diff --git a/snapshot/archive.go b/snapshot/archive.go
index b6db126a8..eafdbe990 100644
--- a/snapshot/archive.go
+++ b/snapshot/archive.go
@@ -197,6 +197,8 @@ func read(in io.Reader, metadata *raft.SnapshotMeta, snap io.Writer) error {
                        if err := dec.Decode(&metadata); err != nil {
                                return fmt.Errorf("failed to decode snapshot metadata: %v", err)
                        }
+                       // Read any additional bytes
+                       dec.More()
 
                case "state.bin":
                        if _, err := io.Copy(io.MultiWriter(snap, snapHash), archive); err != nil {
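
The effect of the added `dec.More()` call can be checked in isolation with a small Go sketch. It assumes, as the patch context suggests, that the hash is fed by the bytes the decoder consumes; `decodeAndHash` is a hypothetical helper, and `iotest.OneByteReader` is used only so the decoder cannot buffer past the closing brace by accident.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"io"
	"testing/iotest"
)

// decodeAndHash decodes one JSON value through a TeeReader and reports whether
// the hash of the bytes consumed equals the hash of the whole input. When
// callMore is true it calls dec.More() after Decode, as the patch does.
func decodeAndHash(data []byte, callMore bool) (bool, error) {
	h := sha256.New()
	// One byte per Read: the decoder stops exactly at the closing brace.
	src := iotest.OneByteReader(bytes.NewReader(data))
	dec := json.NewDecoder(io.TeeReader(src, h))
	var v map[string]interface{}
	if err := dec.Decode(&v); err != nil {
		return false, err
	}
	if callMore {
		// More skips whitespace until the next token or EOF, which pulls the
		// trailing newline through the TeeReader and into the hash.
		dec.More()
	}
	want := sha256.Sum256(data)
	return bytes.Equal(h.Sum(nil), want[:]), nil
}

func main() {
	meta := []byte("{\"Index\":42}\n") // stand-in for meta.json, not real data
	without, _ := decodeAndHash(meta, false)
	with, _ := decodeAndHash(meta, true)
	fmt.Println("hash matches without More:", without)
	fmt.Println("hash matches with More:   ", with)
}
```

One caveat of this approach: `More` only skips whitespace, so it covers trailing newlines and spaces but would not drain arbitrary trailing bytes. Copying the rest of the entry into the hash after decoding would be a more general alternative.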

pearkes added the type/bug label and removed the needs-investigation and type/question labels Dec 7, 2018
pearkes modified the milestones: Upcoming, 1.4.1 Dec 7, 2018
@hanshasselberg
Member

hanshasselberg commented Jan 4, 2019

Hello, I tried to reproduce this issue, but I never ran into it. Could someone provide me with a snapshot that fails and the Consul version that cannot load that snapshot?

I checked what @maf23 said, but that didn't help me.

As far as I understand, the issue is that meta.json ends with a newline and the hash in SHA256SUMS is correct, but verification still fails.

@maf23

maf23 commented Jan 4, 2019

Here is a zipfile with two snapshots I made of an empty Consul cluster. One, 'good.snap', works just fine; the other, 'longer.snap', I modified deliberately to demonstrate the issue. With my patch above, Consul accepts both files (easily tested by doing a snapshot inspect), while an out-of-the-box Consul binary refuses to read longer.snap.

snapshots.zip

The only change I made was to add some extra spacing to meta.json and update the checksum in SHA256SUMS (so it matched the modified meta.json).

@hanshasselberg
Member

Thank you @maf23! I was able to reproduce and made a PR!

hanshasselberg added a commit that referenced this issue Jan 8, 2019
* snapshot: read meta.json correctly.

Fixes #4452.