Skip to content

Conversation

@windchan7
Copy link
Contributor

writeFileSyncing() now will be able to write encrypted content to RocksDB's
env. This commit is part of use encryption for all local disk usage (non-logs).

Issue: #19783.
Release note: None

@windchan7 windchan7 added the do-not-merge bors won't merge a PR with this label. label May 3, 2018
@windchan7 windchan7 requested a review from a team May 3, 2018 14:55
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@windchan7 windchan7 force-pushed the encryption branch 2 times, most recently from e8a95cc to 63e1735 Compare May 3, 2018 15:57
@benesch
Copy link
Contributor

benesch commented May 3, 2018

Woohoo, this is looking really good. The one piece of feedback so far—I haven't reviewed in detail yet—is to extract the AppendFile and CloseFile methods into a separate object. Roughly:

type Engine interface {
  OpenFile() (DBFile, error)
}

type DBFile interface {
  AppendFile(contents []byte) error
  CloseFile() error
}

type rocksdbFile struct {
  rdb *RocksDB
}

func (f *rocksdbFile) AppendFile(contents []byte) error {
  return C.DBEnvAppendFile(f.rdb, contents)
}

(This example is missing a lot of specifics, of course.)

That way you only need to add one method, OpenFile, to the Engine interface. Unless there's a reason you didn't take this approach to begin with?

@windchan7 windchan7 force-pushed the encryption branch 4 times, most recently from bf5085c to 9df1c34 Compare May 3, 2018 19:28
@bdarnell
Copy link
Contributor

bdarnell commented May 3, 2018

Reviewed 17 of 17 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.


pkg/storage/syncing_write.go, line 110 at r1 (raw file):

		}
		if err == nil && sync {
			err = f.Sync()

This call to Sync was important for smoothing out performance in bulk IO tests. We need to keep that around.

@dt What's the best way to test that we don't introduce performance regressions here?


pkg/storage/engine/engine.go, line 36 at r1 (raw file):

// #include <stdlib.h>
// #include <libroach.h>
import "C"

We try not to duplicate all these headers in so many places. Also, C types are sometimes surprising in public interfaces. I think it would be best to define a type in rocksdb.go where the C stuff is already imported (especially with Nikhil's suggestion to put append and close on a new type).


Comments from Reviewable

@windchan7 windchan7 force-pushed the encryption branch 9 times, most recently from 1bf2c23 to a037835 Compare May 8, 2018 19:01
@windchan7
Copy link
Contributor Author

windchan7 commented May 10, 2018

@mberhault , Hey Marc, do you mind taking a look at this PR and let me know if I'm on the right track? Thank you.

Copy link
Contributor

@mberhault mberhault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I think a small benchmark comparing regular vs through-libroach writes would be good (one using an in-mem env, and one using actual disk).

std::string rest = ToString(path);
int i = 0;
int next;
while(rest.length() != 0 && (next = rest.find(delimiter)) != -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure to run make c-deps-fmt, or clang-format directly, or add clang-format to your editor save hook for .cc and .h files.


// EnvOpenFile opens a new file in the given engine.
DBStatus DBImpl::EnvOpenFile(DBSlice path, rocksdb::WritableFile** file) {
rocksdb::Status s;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we tend to use the full word status for clarity.

DBStatus DBImpl::EnvAppendFile(rocksdb::WritableFile** file, DBSlice contents) {
rocksdb::Status s;
s = (*file)->Append(ToSlice(contents));
if (!s.ok()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all the if (!s.ok()) { return ToDBStatus(s); } return kSuccess; calls, you can just return ToDBStatus(s).

@dt
Copy link
Contributor

dt commented May 13, 2018

Review status: 3 of 18 files reviewed at latest revision, 5 unresolved discussions.


pkg/storage/syncing_write.go, line 110 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This call to Sync was important for smoothing out performance in bulk IO tests. We need to keep that around.

@dt What's the best way to test that we don't introduce performance regressions here?

looks like it was re-added back so I'm 👍 here, but to answer w.r.t. testing:
the effect it had was harder to spot in isolated benchmarks, but was more apparent when running production tests, particularly on clouds the had less than ideal disk IO. When not syncing incrementally, we'd see more frequent slow liveness heartbeats, sometimes leading to cluster instability. TL;DR: If we were worried about ti, I'd run >1gb RESTORE (or IMPORT) on a cluster running with iffy disks and watch the latency of some minimal workload or for liveness errors.


Comments from Reviewable

@windchan7 windchan7 removed the do-not-merge bors won't merge a PR with this label. label May 13, 2018
@bdarnell
Copy link
Contributor

Reviewed 16 of 16 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions.


c-deps/libroach/engine.h, line 46 at r2 (raw file):

  virtual DBStatus EnvWriteFile(DBSlice path, DBSlice contents) = 0;
  virtual DBStatus EnvOpenFile(DBSlice path, rocksdb::WritableFile** file) = 0;
  virtual DBStatus EnvAppendFile(rocksdb::WritableFile** file, DBSlice contents) = 0;

EnvOpenFile needs a double-pointer, but I think the rest of these should just be a single pointer.


c-deps/libroach/engine.cc, line 233 at r2 (raw file):

  // Recursively create the directory of the given file.
  std::string delimiter = "/";

Not true on windows. Does rocksdb have a function we could use that does the right thing here (either to parse a string into directories or take a string and create multiple directories at once)?


pkg/storage/client_raft_test.go, line 1142 at r2 (raw file):

	defer mtc.Stop()

	for i := 0; i < 3; i++ {

Why is this added to this test?


pkg/storage/client_replica_gc_test.go, line 75 at r2 (raw file):

	defer mtc.Stop()

	for i := 0; i < numStores; i++ {

And here?


pkg/storage/replica_sideload_test.go, line 111 at r2 (raw file):

	st := cluster.MakeTestingClusterSettings()

	cache := engine.NewRocksDBCache(1 << 20)

It would be nice to factor out some of the new repetition in this file.


pkg/storage/engine/rocksdb.go, line 2623 at r2 (raw file):

func (r *RocksDB) OpenFile(filename string) (DBFile, error) {
	var file C.DBWritableFile
	/*

Uncomment or remove this before merging.


Comments from Reviewable

@windchan7
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 11 unresolved discussions.


c-deps/libroach/engine.cc, line 233 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Not true on windows. Does rocksdb have a function we could use that does the right thing here (either to parse a string into directories or take a string and create multiple directories at once)?

I didn't find any method on the RocksDB side that does the job. I also asked Marc and we agreed to write the recursive directory creation method on our own.

Do you have any suggestions on how to work around it, Ben? I feel like adding support for both Windows and Linux could cause a mess.


pkg/storage/client_raft_test.go, line 1142 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why is this added to this test?

It's because if not, the test will use InMem RocksDB, which does not have access to the disk.


pkg/storage/client_replica_gc_test.go, line 75 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

And here?

Same reason as above.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 11 unresolved discussions.


c-deps/libroach/engine.cc, line 233 at r2 (raw file):

Previously, windchan7 (Victor Chen) wrote…

I didn't find any method on the RocksDB side that does the job. I also asked Marc and we agreed to write the recursive directory creation method on our own.

Do you have any suggestions on how to work around it, Ben? I feel like adding support for both Windows and Linux could cause a mess.

I saw that you had some commented-out code to create the directory from Go. That would be the simplest thing; is there some reason we can't use it?


pkg/storage/client_raft_test.go, line 1142 at r2 (raw file):

Previously, windchan7 (Victor Chen) wrote…

It's because if not, the test will use InMem RocksDB, which does not have access to the disk.

Why does it need access to the disk now but it didn't before? If we can't mock everything out in memory, I'd rather change multiTestContext so that all the tests use on-disk rocksdb instead of having a few tests work differently.


Comments from Reviewable

@windchan7
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 11 unresolved discussions.


c-deps/libroach/engine.cc, line 233 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I saw that you had some commented-out code to create the directory from Go. That would be the simplest thing; is there some reason we can't use it?

It's mainly because the os.Mkdir from go only works for on disk RocksDB. For inMem RocksDB, it won't notice there's a directory on disk already. Instead, it will throw a directory does not exist error, or something like that.


pkg/storage/client_raft_test.go, line 1142 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why does it need access to the disk now but it didn't before? If we can't mock everything out in memory, I'd rather change multiTestContext so that all the tests use on-disk rocksdb instead of having a few tests work differently.

Before the changes, writeFileSyncing always writes to disk regardless its inMem or on disk RocksDB. However, since we want everything to be written to RocksDB's env, inMem env won't be able to write to disk so the checks in the tests will fail if we still use inMem RocksDB.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 11 unresolved discussions.


c-deps/libroach/engine.cc, line 233 at r2 (raw file):

Previously, windchan7 (Victor Chen) wrote…

It's mainly because the os.Mkdir from go only works for on disk RocksDB. For inMem RocksDB, it won't notice there's a directory on disk already. Instead, it will throw a directory does not exist error, or something like that.

What do you mean it won't notice there's a directory on disk already? Are the Go and C++ sides looking at the same path?


pkg/storage/client_raft_test.go, line 1142 at r2 (raw file):

Previously, windchan7 (Victor Chen) wrote…

Before the changes, writeFileSyncing always writes to disk regardless its inMem or on disk RocksDB. However, since we want everything to be written to RocksDB's env, inMem env won't be able to write to disk so the checks in the tests will fail if we still use inMem RocksDB.

Why don't other tests in this file need this too? I don't like having a couple of tests with a 20-line copy-pasted block of boilerplate. We should either support these methods on InMem RocksDB engines or move multiTestContext to use on-disk rocksdb in all cases.


Comments from Reviewable

@windchan7
Copy link
Contributor Author

windchan7 commented May 15, 2018

Commenting out TestReplicaLazyLoad for now because of this issue:
#25430

Will uncomment it when the issue is resolved before merging.

@windchan7
Copy link
Contributor Author

Review status: 2 of 18 files reviewed at latest revision, 11 unresolved discussions.


c-deps/libroach/engine.h, line 46 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

EnvOpenFile needs a double-pointer, but I think the rest of these should just be a single pointer.

Done.


c-deps/libroach/engine.cc, line 228 at r2 (raw file):

Previously, mberhault (marc) wrote…

nit: we tend to use the full word status for clarity.

Done.


c-deps/libroach/engine.cc, line 233 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What do you mean it won't notice there's a directory on disk already? Are the Go and C++ sides looking at the same path?

Done.


c-deps/libroach/engine.cc, line 238 at r2 (raw file):

Previously, mberhault (marc) wrote…

make sure to run make c-deps-fmt, or clang-format directly, or add clang-format to your editor save hook for .cc and .h files.

Done.


c-deps/libroach/engine.cc, line 269 at r2 (raw file):

Previously, mberhault (marc) wrote…

for all the if (!s.ok()) { return ToDBStatus(s); } return kSuccess; calls, you can just return ToDBStatus(s).

Changed at necessary places.


pkg/storage/client_raft_test.go, line 1142 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why don't other tests in this file need this too? I don't like having a couple of tests with a 20-line copy-pasted block of boilerplate. We should either support these methods on InMem RocksDB engines or move multiTestContext to use on-disk rocksdb in all cases.

I made all multiTestContext using on disk engines. Thanks for the suggestion.


pkg/storage/replica_sideload_test.go, line 111 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It would be nice to factor out some of the new repetition in this file.

Done.


pkg/storage/engine/engine.go, line 36 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We try not to duplicate all these headers in so many places. Also, C types are sometimes surprising in public interfaces. I think it would be best to define a type in rocksdb.go where the C stuff is already imported (especially with Nikhil's suggestion to put append and close on a new type).

Done.


pkg/storage/engine/rocksdb.go, line 2623 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Uncomment or remove this before merging.

Done.


Comments from Reviewable

@windchan7
Copy link
Contributor Author

@mberhault Is it ready to go? I'm holding off merging it until the flaky test get resolved. But feel free to comment on the rest of it.

const rocksdb::EnvOptions soptions;
rocksdb::unique_ptr<rocksdb::WritableFile> rocksdb_file;

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed now.

if (!status.ok()) {
return ToDBStatus(status);
}
delete file;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file object still needs to be deleted on error from Close, or we'll be leaking memory.

@windchan7
Copy link
Contributor Author

@mberhault Updated. Thanks a lot!

@windchan7 windchan7 force-pushed the encryption branch 5 times, most recently from 4f6bbd1 to 4add11a Compare May 17, 2018 22:39
nvb added a commit to nvb/cockroach that referenced this pull request May 17, 2018
Fixes cockroachdb#25430.

Before this change, the `RaftTickInterval` in the test was so low that
node liveness only had a few milliseconds to perform updates. This
caused the test to be flaky, especially in cockroachdb#25281, which is slowing
down all tests by changing from in-memory stores to on-disk stores. By
bumping up the `RaftTickInterval` by an order of magnitude in the test,
we give node liveness much more time to perform updates.

Release note: None
craig bot pushed a commit that referenced this pull request May 18, 2018
25642: storage: bump RaftTickInterval in TestReplicaLazyLoad r=nvanbenschoten a=nvanbenschoten

Fixes #25430.

Before this change, the `RaftTickInterval` in the test was so low that
node liveness only had a few milliseconds to perform updates. This
caused the test to be flaky, especially in #25281, which is slowing
down all tests by changing from in-memory stores to on-disk stores. By
bumping up the `RaftTickInterval` by an order of magnitude in the test,
we give node liveness much more time to perform updates.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
`writeFileSyncing()` now will be able to write encrypted content to RocksDB's
env. This commit is part of `use encryption for all local disk usage (non-logs)`.

Issue: cockroachdb#19783.
Release note: None
@windchan7
Copy link
Contributor Author

@mberhault Now the flaky test is resolved.

Copy link
Contributor

@mberhault mberhault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@windchan7
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request May 18, 2018
25281: encryption: add support for encryption to `writeFileSyncing`. r=windchan7 a=windchan7

`writeFileSyncing()` now will be able to write encrypted content to RocksDB's
env. This commit is part of `use encryption for all local disk usage (non-logs)`.

Issue: #19783.
Release note: None

Co-authored-by: Victor Chen <victor@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented May 18, 2018

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants