Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encryption with Change Management #7020

Open
wants to merge 93 commits into
base: main
Choose a base branch
from

Conversation

matthewvon
Copy link
Contributor

@matthewvon matthewvon commented Jun 23, 2020

Encryption at rest is often an enterprise requirement. This PR provides an implementation that uses OpenSSLs AES 256 CTL encryption.

The code is written in such a way that the programmer can easily switch to a different library and/or encryption cipher. And the switch can happen later without requiring the end user to dump their entire database then reload under the new library / cipher. The encryption library is intentionally loaded at runtime, not compiled into rocksdb. This allows for situations such as HeartBleed where users want a quick update of security code without waiting for a new product release. They simply load the new library and restart the database.

The code also allows for encryption keys to change. There must be a "current key" for writing and reading. There can also be previous keys for reading older .sst files. This implies that the end user can migrate from older keys either by simply waiting for compactions to naturally replace older keys, or by forcing a compaction of all files. The latter might take time, but the database is still operational.

@matthewvon
Copy link
Contributor Author

I am staring at this error and it really looks like a compiler bug:

https://travis-ci.org/github/facebook/rocksdb/jobs/701431752

Open to making any fixes that I am missing.

@matthewvon
Copy link
Contributor Author

matthewvon commented Jun 23, 2020

@ajkr @mrambacher @yiwu-arbug As promised, here is the second encryption PR that exploits inheritance from #6830. Mark has already pointed out that part of the library_loader is already present Env. I will adjust to use it Wednesday.


bool Valid() const { return valid_; };
const Sha1Description_t& key_desc() const { return key_desc_; };
const AesCtrKey_t& key() const { return key_; };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hit this all the time with gcc 4.8, which is why we added the build. Ctor param shadows this member function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger Thank you for taking time to explain. I can easily adapt the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is complete and passing tests now.

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding the EncryptedEnv interface and the continuing work. I learn a lot of crypto basics from the original EncryptedEnv implementation.

I have some questions. They are probably more related to the original EncryptedEnv, but since those logic are reused, I'd take the chance for discussion

  1. We are concatenating 8 bytes nonce and 8 bytes counter to form a 16 bytes openssl IV. This is incompatible with openssl format, where openssl use the whole 16 bytes as a 128-bits big endian integer counter. Do we consider making the format compatible with openssl, so potentially we can use other openssl compatible tools to encrypt/decrypt file? (of cause there's also the file prefix needs to be strip beforehand).
  2. We are encrypting (and decrypting) data block by block (the 16 bytes AES block), each time initiating openssl cipher again. Do we consider encrypting data in bulk. Say if rocksdb want to write 1MB into a file, we call EVP_EncryptInit_ex, EVP_EncryptUpdate and EVP_EncryptFinal set of functions only once? (actually maybe 3 times to take care of unaligned blocks at the begining and at the end). This depends on item 1. I don't have benchmark and not sure how much benefit it is, but at least it will save a lot of function calls.
  3. It is undocumented, but with CTR mode openssl seems to support encrypting in-place (use the same address for encryption input and output). Have we consider using the fact to reduce number of memcopy?

And some minor comments. Thanks.

include/rocksdb/env_encrypt2.h Outdated Show resolved Hide resolved
include/rocksdb/env_encrypt2.h Outdated Show resolved Hide resolved
include/rocksdb/env_encrypt2.h Outdated Show resolved Hide resolved
// Base class / interface
// expectation is to derive one class for unux and one for Windows
//
class LibraryLoader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also have static link option?

Copy link
Contributor Author

@matthewvon matthewvon Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two ways to achieve static link. First is a conditional compile. Second is to exploit PosixEnv::LoadLibrary()'s ability to pull symbols from either external library or running executable. I prefer the latter. It creates one code path for both scenarios. It would also allow the user to ship with a "current" version and then override at runtime if an update version comes out with a critical fix. The feature would require an option setting. If you like this idea, would you suggest which options structure is most appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might still need a conditional compile to make sure the symbols exist in the executable. I am guessing that a link against .a instead of .so would not pull in the symbols/functions needed unless there is "code calling them". any thoughts here are welcome

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with dylib loading, but it remind me Jemalloc allow to add prefixes to its methods so to not conflict with, say, libc. But if OpenSSL don't support that, maybe we just conditional compile to choose whether static link to dynamic link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openssl does not directly support prefixes with dylib loading. I can make both static+dynamic or dynamic only work via compile time flag. If static is not correctly identified as missing at compile time, the compile fails. A couple of recent test passes are suggesting the dynamic load might be having dependency problems in your test infrastructure (I have added some debug code to figure this out. Might have to make test for openssl more exhaustive in build_detect_platform). I am continuing to work this issue.


EVP_CIPHER_CTX *EVP_CIPHER_CTX_new(void) const { return cipher_new_(); };

void EVP_CIPHER_CTX_free(EVP_CIPHER_CTX *ctx) { cipher_free_(ctx); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EVP_CIPHER_CTX_free does not exist in openssl 1.0.2. Can we add a version check for openssl?

Copy link
Contributor Author

@matthewvon matthewvon Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. And it ticked me off. The code already compensates. Please see (and the 3 lines following):

ctx_free_ = (EVP_MD_CTX_free_t)functions_["EVP_MD_CTX_free"];

The code simply maps the ENV_CIPHER_CTX_destroy into the cipher_free_ pointer since parameters are same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenSSL 1.0.2 doesn't seems to provide EVP_CIPHER_CTX_destroy as well. With 1.0.2 the ctx can only be created on stack. https://www.openssl.org/docs/man1.0.2/man3/EVP_EncryptInit_ex.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. That would actually simplify the current code. Using thread_local to get around recoding a bunch of stuff. Keeps one context per thread ... which equates to one context per read or write call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pulled the source code to 1.0.1u. It has functions _new, _free, and _cleanup. I have pulled the source code to 1.0.2.u. It has the functions _new, _free, and _cleanup. Going to strongly suggest a documentation problem on their website. My code is using _new, _free, and _reset. ENV_CIPHER_CTX_reset() is a name change for _cleanup. Source code uses the two equivalently inside _free and _init. I believe the real action item here is to create code that will pull _cleanup as an alternative entry point to _reset.

I was happily looking at maybe using a stack based copy of ENV_CIPHER_CTX. There is no destructor and there are internal allocations that need clean up. _new & _free might be safer in the long run, though calling _cleanup/_reset when done with stack struct accomplishes the same. Looking at other alternatives too based upon review of ctrl128.c reference you posted elsewhere in this PR.

if (sizeof(marker) == marker_slice.size() &&
marker_slice.starts_with(Marker)) {
// code_version currently unused
uint8_t code_version = (uint8_t)marker_slice[7];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we verify code_version is indeed 0, or return error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for ReadSeqEncryptionPrefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops ... will fix.

env/env_encrypt2.cc Outdated Show resolved Hide resolved
@matthewvon
Copy link
Contributor Author

@yiwu-arbug Per your item one about 16 versus 32 byte IV. The NIST standard and samples use 16 byte IV AES CTR 256. The code is currently verified against that standard. I have two sources that say 16 byte/128bit is proper IV for AES 256:

The encrypt/decrypt samples from the second document are part of env_encrypt2_test.cc.

Do you have a reference that I can review? There is nothing to prevent a coder from implementing a different cipher with different IV size. In fact, the purpose of env_encrypt2's design is to enable and encourage those changes and differences.

@matthewvon
Copy link
Contributor Author

matthewvon commented Jun 25, 2020

@yiwu-arbug Can you point me to source code or such for item three's update in place?

update: I believe this confirms your "update in place" for software AES implementation.

https://github.com/openssl/openssl/blob/master/crypto/aes/aes_core.c

Searching now for confirmation when hardware AES used (I know it is likely true, but feel obligated to verify).

update 2: it looks to me that the memmove within EncryptedWritableFile::Append() is related to getting data for encryption aligned with offsets within files. Once the addresses and offsets are aligned on proper boundaries, encryption/decryption is an "update in place". The shifting is about alignment.

update 3: ok, I understand now how to eliminate the need for alignment of buffers and offsets for CTR mode. I do not know what other derived code might be depending upon the alignment. Therefore, I am going to copy those routines from EncryptedEnv into EnctypedEnv2 for modification. This should also simplify reuse of the EVP_CIPHER_CTX object.

update 4: hold the phone. the memmove is because we are handed a const buffer from the caller. we do not know the impact of encrypt in place to the caller. so the memmove is to create a copy that we then encrypt in place. the current EncryptionEnv is clean. The alignment stuff is purely a side benefit ... and likely required for use_direct_io.

@matthewvon
Copy link
Contributor Author

@mrambacher I have converted code to PosixEnv::LoadLibrary(). That revealed a bug in env_posix.cc where constants used for library suffixes within std::string might not have initialized by the time LoadLibrary is called. Switched those three constants to static const char * so that compiler initializes, not loader.

@matthewvon
Copy link
Contributor Author

As of this moment, all checks have again passed. At least two of the recent failures were infrastructure related ... did not check 178cf31.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code should also be made to work with CMake and on Windows (if possible).

Comment on lines 38 to 39
thread_local static std::unique_ptr<EVP_CIPHER_CTX, void (*)(EVP_CIPHER_CTX*)>
aes_context(nullptr, &do_nothing);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the util/thread_local class?

Comment on lines 35 to 38
struct ShaDescription {
uint8_t desc[EVP_MAX_MD_SIZE];
bool valid;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this class/struct is still required in the public API?

Comment on lines 81 to 83
struct AesCtrKey {
uint8_t key[EVP_MAX_KEY_LENGTH];
bool valid;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class still required in the public API?

Comment on lines 123 to 124
class OpenSSLEncryptionProvider : public EncryptionProvider {
public:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class still required in the public API?

Comment on lines 79 to 80
class EncryptedWritableFileV2 : public EncryptedWritableFile {
public:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class still required?


#pragma once

#ifdef ROCKSDB_OPENSSL_AES_CTR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still required in this header (along with the openssl includes)?

@@ -0,0 +1,180 @@
// copyright (c) 2011-present, Facebook, Inc. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file still be in the public API?

Comment on lines 37 to 38
size_t LibraryLoader::GetEntryPoints(std::map<std::string, void*>& functions) {
size_t num_found{0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of this class and have a "LoadSymbols API" off the Dynamic Library class?

encrypt_init_ = (EVP_EncryptInit_ex_t)functions_["EVP_EncryptInit_ex"];
aes_256_ctr_ = (EVP_aes_256_ctr_t)functions_["EVP_aes_256_ctr"];
encrypt_update_ = (EVP_EncryptUpdate_t)functions_["EVP_EncryptUpdate"];
encrypt_final_ = (EVP_EncryptFinal_ex_t)functions_["EVP_EncryptFinal_ex"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any of these allowed to be NULL? Is there an advantage to doing it as LoadSymbols versus a direct call (LoadSymbol) with a better check?

Comment on lines 55 to 56
UnixLibCrypto::UnixLibCrypto() : LibraryLoader(crypto_lib_name_) {
if (is_valid_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to keep this in a separate file rather than in the env_openssl file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants