Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: fix usage of gzip.Reader to better detect corrupt snapshots during save/restore #7697

Merged
merged 2 commits into from
Apr 24, 2020

Conversation

rboyer
Copy link
Member

@rboyer rboyer commented Apr 23, 2020

When generating a snapshot using consul snapshot save the server prepares the snapshot as a gzip compressed tar archive which is streamed over an HTTP response straight to disk on the CLI side. Immediately after it receives this data it runs an internal verification to sanity check that the snapshot that was produced is parseable. It does this by:

  1. opening the file as a gzip.Reader
  2. reading from that via a tar archive reader
  3. iterating over the contents and generating checksums
  4. comparing those checksums to those present in the archive
  5. 👍

Sometimes "network things happen" and you get your responses from the server unexpectedly truncated early (like #6893). This should be expected when streaming a large snapshot, and the CLI should detect it as #6893 does ... sometimes.

Unfortunately if you do targeted removal of a few bytes from the end of the snapshot files they still validate.

There's a small catch to how to safely use a gzip.Reader and we aren't doing that today:

Gzip files store a length and checksum of the uncompressed data. The Reader will return an ErrChecksum when Read reaches the end of the uncompressed data if it does not have the expected length or checksum. Clients should treat data returned by Read as tentative until they receive the io.EOF marking the end of the data. link

We don't explicitly attempt to drain the gzip.Reader after walking the tar archive portion to see if the rest of the gzip envelope is present in its entirety. If you do actually read that stream until EOF you can properly fail when even a single byte is lost from the end of the gzip envelope.

This PR fixes that, and adds tests to cover early truncation situations like this.

@rboyer rboyer added the type/bug Feature does not function as expected label Apr 23, 2020
@rboyer rboyer requested a review from a team April 23, 2020 20:34
@rboyer rboyer self-assigned this Apr 23, 2020
@rboyer
Copy link
Member Author

rboyer commented Apr 23, 2020

I think given how we nest a required SHA256SUMS tar entry inside of the compressed section, and verify that each of the meta.json and state.bin files match the checksums, and also require them to even have checksums in the SHA256SUMS entry I don't think the missing gzip envelope checksums and length checks matter for correctness in practice.

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! The tests make the problem very clear.

Just one question about the new StartTestServer helper

// StartTestServer fires up a web server on a random unused port to serve the
// given handler body. The address it is listening on is returned. When the
// test case terminates the server will be stopped via cleanup functions.
func StartTestServer(t testing.T, handler http.Handler) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very similar to https://golang.org/pkg/net/http/httptest/#NewServer.

Would httptest.Server work here? If not I think the reason would be helpful to include in the docstring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly httptest.Server only thinks a port is free if it's not bound. Consul tests frequently reserve ports via sdk/freeport so you can have one part of the test try to use a port and know nothing is listening. If you simply assumed unbound ports were free you'd end up with test cross-talk and weirdness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating comment when i do the merge


if err := concludeGzipRead(decomp); err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor style note, you could remove the if and return err in both cases:

err := concludeGzipRead(decomp)
return &metdata, err

I believe as a convention non-err return values are considered to be not usable if the err return is non-nil, unless explicitly documented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree with what you're saying, but I try to avoid leaving loaded footguns lying around if it's fairly cheap to do so, so prefer to just not hand back non-zero return values when an error occurs to prevent being misleading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Feature does not function as expected
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants