Skip to content

Conversation

@mberhault
Copy link
Contributor

Encryption at rest: part 1 of N.

Add preamble format for on-disk rocksdb instances:

  • new field in the --store flag: format=preamble (empty defaults to classic)
  • new on-disk version and added format field
  • proto definition of preamble
  • plaintext (no-op) block cipher stream

The on-disk version gets set to 2 iff we use the preamble format. This allows:

  • old binaries to keep reading classic format files created using new binaries
  • old binaries to fail with version check on preamble formats
  • new binaries to handle preamble/classic properly

Eventually, we may want to always write version 2. We may need to give it some time to give enough people to upgrade their binaries.

A few todos to resolve before merging this:

  • testing at the C++ level
  • is my message length correct? protos use little-endian, it's a little silly to write the length in network byte order
  • better error codes would be good. No built-in errorf in rocksdb::Status

marc added 2 commits November 17, 2017 09:17
Encryption-at-rest work: part 1.

This adds support for per-store "preamble" format.
Specified through the store flag with `format=preamble`.
The default value is `classic`.

Setting the preamble format causes a higher `COCKROACHDB_VERSION` value
as well as the `format=preamble` field to be set.
For now, stores created without `format=preamble` will not increase the
`COCKROACHDB_VERSION` value to allow downgrade.

This allows:
* old binaries to fail if preamble is enabled.
* new binaries to understand the new version but fail if not specified
on an existing preamble-formatted store
* stores in classic format can progress to future versions

The preamble format is currently plaintext only, encryption to be added
in a future PR.
@mberhault mberhault requested a review from a team November 17, 2017 19:07
@mberhault mberhault requested a review from a team as a code owner November 17, 2017 19:07
@mberhault mberhault requested a review from a team November 17, 2017 19:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable


// PlaintextCipherStream implements BlockAccessCipherStream with
// no-op encrypt/decrypt operations.
class PlaintextCipherStream final : public rocksdb::BlockAccessCipherStream {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot how this works. How do I get a concrete class without overriding the pure protected methods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, isn't the point that you have to override the pure protected methods to get a concrete class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I know. I suppose my complaint is about those protected methods being required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the encryption support in RocksDB that landed seems pretty specialized for the use case it was extracted from.

// special severity value DEFAULT instead.
pf.Lookup(logflags.LogToStderrName).NoOptDefVal = log.Severity_DEFAULT.String()

// Security flags.
Copy link
Contributor Author

@mberhault mberhault Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this in passing, that comment has nothing to do here.

@mberhault mberhault requested a review from bdarnell November 17, 2017 19:11
@mberhault mberhault mentioned this pull request Nov 17, 2017
29 tasks
@mberhault mberhault force-pushed the marc/add_preamble_format branch 2 times, most recently from 3feb890 to ee20868 Compare November 17, 2017 21:28
@mberhault mberhault force-pushed the marc/add_preamble_format branch from ee20868 to c8bb7cf Compare November 17, 2017 21:35
@bdarnell
Copy link
Contributor

LGTM, aside from doubts about whether we should be using this EncryptedEnv vs building our own.


Reviewed 8 of 10 files at r1, 6 of 7 files at r2, 2 of 4 files at r3, 1 of 1 files at r4, 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.


c-deps/libroach/db.cc, line 1705 at r5 (raw file):

  PreambleHandler* preamble = nullptr;
  if (db_opts.use_preamble) {
    // The caller makes sure we're not an in-memory DB.

What's wrong with using the preamble format in memory?


c-deps/libroach/preamble.cc, line 40 at r5 (raw file):

  // Determine the serialized length and size of the length prefix.
  // TODO(mberhault): protobuf encoding is little-endian, so this is a little weird.

Weird in that you're writing this big-endian while protobuf uses little-endian? I don't think that's too bad; I'd rather use big-endian here and not even know what protobuf uses internally.


c-deps/libroach/preamble.cc, line 42 at r5 (raw file):

  // TODO(mberhault): protobuf encoding is little-endian, so this is a little weird.
  assert(byte_size < UINT16_MAX);
  uint16_t message_size = htons(byte_size);

s/message_size/encoded_size/g


c-deps/libroach/preamble.cc, line 46 at r5 (raw file):

  if ((byte_size + num_length_bytes) > prefixLength ) {
    // TODO(mberhault): is NoSpace the right thing? Nothing really fits.

I'd go with Corruption, same as below.


c-deps/libroach/preamble.h, line 24 at r5 (raw file):

// WARNING: changing this will result in incompatible on-disk format.
// The preamble length must fit in a uint16_t.
const static size_t defaultPreambleLength = 4096;

s/default/k/

This is also required to be a multiple of the page size, right? Mention that in the comment.

Unless you need them for tests, both constants in this header probably belong in the .cc file instead.


c-deps/libroach/preamble.h, line 48 at r5 (raw file):

// Blocksize for the plaintext cipher stream.
// TODO(mberhault): we can pick anything we like. What's good?
const static size_t plaintextBlockSize = 4096;

It looks like every write gets padded to a multiple of the block size, so setting it too high would waste space (but setting it too low wastes CPU to iterate over the data to do nothing in small chunks).

I'd set this to 16 for now to match the block size of the AES cipher version. In the long run, we should probably refactor the EncryptionProvider interface so it can return something other than a BlockAccessCipherStream that can be an efficient no-op.


c-deps/libroach/preamble.h, line 76 at r5 (raw file):

  // Length of data is equal to BlockSize();
  virtual rocksdb::Status EncryptBlock(uint64_t blockIndex, char *data, char* scratch) override {
		return rocksdb::Status::OK();

Don't mix tabs and spaces. We use two-space indents in our c++ code.


pkg/cli/cliflags/flags.go, line 448 at r5 (raw file):

</PRE>
The "format" field specifies the underlying file format for the store.
Allowed values are "classic" (default if omitted), or "preamble". Preamble

One day the preamble format will be the default and then the "classic" behind whatever comes next. Let's just give these numbers instead of names.


pkg/storage/engine/version.go, line 26 at r5 (raw file):

type storageVersion int
type formatVersion int

This two-level versioning scheme is confusing and needs more documentation. Or maybe two version numbers is the wrong approach - is the "format" a sequence of versions or is it a set of features that may or may not be present?

It's really unfortunate that EncryptedEnv doesn't give us any feasible migration path so we'll have to keep supporting the old format forever. It ought to be possible to make an Env that lets us encode the format somehow (maybe in the filename?) instead of an all-or-nothing cutover. Then we could treat this like a normal version upgrade and eventually get all files written with the preamble format.


pkg/storage/engine/version_test.go, line 84 at r5 (raw file):

// TestOldVersionFormat verifies that new version format (not numbers, but actual json fields)
// parses properly when using the older version data structures.

What do you mean? Just that the new Format field is ignored by old code?


Comments from Reviewable

@mberhault
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.


c-deps/libroach/db.cc, line 1705 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What's wrong with using the preamble format in memory?

I haven't actually checked that it works properly, but there's no reason to right now.
If we decide to make the preamble the default format, then it may be simpler. I don't think we'll do encryption on it though


c-deps/libroach/preamble.cc, line 40 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Weird in that you're writing this big-endian while protobuf uses little-endian? I don't think that's too bad; I'd rather use big-endian here and not even know what protobuf uses internally.

Ok, removing the TODO.


c-deps/libroach/preamble.cc, line 42 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/message_size/encoded_size/g

Done.


c-deps/libroach/preamble.cc, line 46 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'd go with Corruption, same as below.

Done.


c-deps/libroach/preamble.h, line 24 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/default/k/

This is also required to be a multiple of the page size, right? Mention that in the comment.

Unless you need them for tests, both constants in this header probably belong in the .cc file instead.

Renamed, added "page size" command (not actually a requirement, just preferred).
Moved both constants to .cc and moved entire PlaintextCipherStream.


c-deps/libroach/preamble.h, line 48 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It looks like every write gets padded to a multiple of the block size, so setting it too high would waste space (but setting it too low wastes CPU to iterate over the data to do nothing in small chunks).

I'd set this to 16 for now to match the block size of the AES cipher version. In the long run, we should probably refactor the EncryptionProvider interface so it can return something other than a BlockAccessCipherStream that can be an efficient no-op.

Done.
By implementing as a BlockAccessCipherStream, we already avoid a lot of overhead. We'll need to benchmark for 1) good default size, and 2) whether to strip out more of the env_encryption wrapper.


c-deps/libroach/preamble.h, line 76 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Don't mix tabs and spaces. We use two-space indents in our c++ code.

Done. I was copy/pasting. I miss gofmt


pkg/cli/cliflags/flags.go, line 448 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

One day the preamble format will be the default and then the "classic" behind whatever comes next. Let's just give these numbers instead of names.

We can certainly use numbers instead, but this feels more user-friendly. The docs will have to mention the need for the preamble format, skipping a level of indirection ("which number is preamble again?") seems convenient.


pkg/storage/engine/version.go, line 26 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This two-level versioning scheme is confusing and needs more documentation. Or maybe two version numbers is the wrong approach - is the "format" a sequence of versions or is it a set of features that may or may not be present?

It's really unfortunate that EncryptedEnv doesn't give us any feasible migration path so we'll have to keep supporting the old format forever. It ought to be possible to make an Env that lets us encode the format somehow (maybe in the filename?) instead of an all-or-nothing cutover. Then we could treat this like a normal version upgrade and eventually get all files written with the preamble format.

The format is a "feature" of the storage. We still want to be able to have "version 2 with preamble" and "version 2 without preamble'.
The part that gets tricky here is that we don't want to wreck old binaries running against a "version 2 without preamble".

As for a different env, I mentioned this is the RFC. We could perform our own mapping of filename -> format, but that seems a bit fragile. The one major advantage of that solution is that we could use the classic format for plaintext.
However, that still would not solve the version issue, you would need to make sure you can handle the preamble format, requiring a version bump.


pkg/storage/engine/version_test.go, line 84 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What do you mean? Just that the new Format field is ignored by old code?

Yup. Technically the json parsing code does not fail on unknown fields, but that's about to become an option. If someone turns that on, I want the test to fail.


Comments from Reviewable

@mberhault
Copy link
Contributor Author

I've added a "custom env" section in the rfc under "Rationale and alternatives".


Review status: 18 of 20 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.


Comments from Reviewable

@mberhault
Copy link
Contributor Author

Closing in favor of the switching env. See updated RFC.

@mberhault mberhault closed this Nov 28, 2017
mberhault pushed a commit to mberhault/cockroach that referenced this pull request Nov 29, 2017
This is to be contrasted to the preamble method in cockroachdb#20124.

This method is discussed in the `Custom env for encryption state`
section of the
[Encryption RFC](cockroachdb#19785)

When encryption is enabled, use a switching env that can redirect
each Env method to one of:
* base env for plaintext (same format as currently, no overhead)
* encrypted env (with or without preamble) for encrypted files

The switching env will hold the list of encrypted files to know which
env to pick.
mberhault pushed a commit to mberhault/cockroach that referenced this pull request Nov 30, 2017
This is to be contrasted to the preamble method in cockroachdb#20124.

This method is discussed in the `Custom env for encryption state`
section of the
[Encryption RFC](cockroachdb#19785)

When encryption is enabled, use a switching env that can redirect
each Env method to one of:
* base env for plaintext (same format as currently, no overhead)
* encrypted env (with or without preamble) for encrypted files

The switching env will hold the list of encrypted files to know which
env to pick.
@mberhault mberhault deleted the marc/add_preamble_format branch January 26, 2018 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants