-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standard key manager #6884
Standard key manager #6884
Conversation
public static final String ENCRYPTION_DEK_LENGTH = "encryption.data.key.length"; | ||
public static final int ENCRYPTION_DEK_LENGTH_DEFAULT = 16; | ||
|
||
public static final int ENCRYPTION_AAD_LENGTH_DEFAULT = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move these to EncryptionProperties
? We like to keep constants in separate classes like that.
import org.apache.iceberg.util.PropertyUtil; | ||
|
||
public class DefaultEncryptionManager implements EncryptionManager { | ||
public static final String ENCRYPTION_TABLE_KEY = "encryption.table.key.id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also consider what we want these to be named (ping @aokolnychyi). Something like encryption.key-id
?
import org.apache.iceberg.relocated.com.google.common.base.Preconditions; | ||
import org.apache.iceberg.util.PropertyUtil; | ||
|
||
public class DefaultEncryptionManager implements EncryptionManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a name for this bedsides "Default"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Envelope? Iceberg? Other suggestions are welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "Standard"?
public static final int ENCRYPTION_AAD_LENGTH_DEFAULT = 16; | ||
|
||
/** Implementation of the KMS client for envelope encryption */ | ||
public static final String ENCRYPTION_KMS_CLIENT_IMPL = "encryption.kms.client-impl"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this too.
@Override | ||
public EncryptedOutputFile encrypt(OutputFile rawOutput) { | ||
if (null == workerRNG) { | ||
workerRNG = new SecureRandom(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this to a private method.
} | ||
|
||
@Override | ||
public InputFile decrypt(EncryptedInputFile encrypted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should doc that this is a Noop method, otherwise it's a bit confusing that by default this decrypt method just returns an encrypted file.
} | ||
|
||
private void createSecureRandomGenerator() { | ||
workerRNG = new SecureRandom(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When setting an instance field, use the this.
prefix to make it clear this is affecting persistent state.
|
||
KeyMetadata fileEnvelopeMetadata = new KeyMetadata(fileDek, null, aadPrefix); | ||
|
||
return new BaseEncryptedOutputFile(rawOutput, fileEnvelopeMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this actually needs to encrypt. BaseEncryptedOutputFile
doesn't actually do anything, but the intent is for this to be a file that will encrypt the stream of data as it is written to the stream created by encryptingOutputFile()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the comment below, I've implemented the following: EncryptedOutputFile class would returns the underlying stream as well, so that encryption can be handled by Parquet.
} | ||
|
||
// Native decryption: simply return the input file. Parquet decryption will get the key from key | ||
// metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't don't think this is correct.
These interfaces are intended to give you the option to encrypt/decrypt a stream. That's why they expose key metadata and an InputFile
or OutputFile
. You use the InputFile
as you normally would, but it is decrypted for you as you read.
That fits with the AES GCM encryption spec. This should decrypt using AES GCM decryption. Similarly, encryption should return an encrypting stream.
For native Parquet encryption, I think the EncryptedOutputFile
and EncryptedInputFile
classes would need to be able to return the underlying stream as well, so that encryption can be handled by Parquet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for the EncryptedOutputFile. For decryption, we have everything already in the Parquet reader: the underlying stream and the key metadata, so can decrypt with native Parquet.
(no need in changes in EncryptedInputFile, and no need in calling the encryption manager).
@@ -25,6 +25,8 @@ | |||
import org.slf4j.LoggerFactory; | |||
|
|||
public class PlaintextEncryptionManager implements EncryptionManager { | |||
public static final EncryptionManager INSTANCE = new PlaintextEncryptionManager(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we typically hide these behind static instance
methods, rather than making the instance field public.
String tableKeyId, KeyManagementClient kmsClient, Map<String, String> encryptionProperties) { | ||
Preconditions.checkNotNull( | ||
tableKeyId, | ||
"Cannot create EnvelopeEncryptionManager because table encryption key ID is not specified"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to state what was attempted. A shorter version would be more clear: "Invalid encryption key ID: null".
tableKeyId, | ||
"Cannot create EnvelopeEncryptionManager because table encryption key ID is not specified"); | ||
Preconditions.checkNotNull( | ||
kmsClient, "Cannot create EnvelopeEncryptionManager because kmsClient is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalid KMS client: null
We don't use variable or class names in error messages. That's friendlier for people and anyone that will go to the file and line will quickly see what the variable name was anyway.
kmsClient, "Cannot create EnvelopeEncryptionManager because kmsClient is null"); | ||
Preconditions.checkNotNull( | ||
encryptionProperties, | ||
"Cannot create EnvelopeEncryptionManager because encryptionProperties are not passed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalid encryption properties: null
.
|
||
@Override | ||
public EncryptedOutputFile encrypt(OutputFile rawOutput) { | ||
if (null == workerRNG) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually have a method for lazy initialized variables, like lazyRNG()
rather than doing the null check everywhere.
|
||
private EncryptionProperties() {} | ||
|
||
public static final String ENCRYPTION_TABLE_KEY = "encryption.table.key.id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names don't conform to the guidelines for properties: https://iceberg.apache.org/contribute/#config-naming
KeyMetadata dataEncryptionMetadata = new KeyMetadata(fileDek, null, aadPrefix); | ||
|
||
// For metadata files | ||
// This is an expensive operation, RPC to KMS server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a design gap we need to discuss. This comment, and a couple more below, are a temp text that describes the problem. TBD in our next call.
ByteBuffer wrappedDek = kmsClient.wrapKey(fileDek, tableKeyId); | ||
KeyMetadata manifestEncryptionMetadata = new KeyMetadata(wrappedDek, tableKeyId, aadPrefix); | ||
|
||
// We don't know which key metadata to use, because we don't know what file we encrypt. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Iceberg, we don't use personal pronouns, like "we" or "I" in the code. It makes docs or comments longer without adding value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. As above , this is a temp text, to be removed.
// For metadata files | ||
// This is an expensive operation, RPC to KMS server | ||
ByteBuffer wrappedDek = kmsClient.wrapKey(fileDek, tableKeyId); | ||
KeyMetadata manifestEncryptionMetadata = new KeyMetadata(wrappedDek, tableKeyId, aadPrefix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is wrapping being done here if it is an expensive operation? Doesn't the caller need the key directly? I think it would make more sense to wrap the key when it is stored.
public InputFile decrypt(EncryptedInputFile encrypted) { | ||
if (encrypted.keyMetadata() == null || encrypted.keyMetadata().buffer() == null) { | ||
throw new RuntimeException( | ||
"Unencrypted file " + encrypted.encryptedInputFile().location() + " in encrypted table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't use RuntimeException
in general. Is there a better one?
Also, is this an exception? What happens when converting a table to use encryption? It seems like this could be a valid state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes, due to misconfiguration etc - users think they write encrypted tables / files, but the result is not encrypted. So this is a rather important security check. But we can discuss the options and trade-offs between this and a flexibility to be able to convert an existing table to use encryption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check I've mentioned, is not always possible to perform. Switching to your suggestion.
|
||
this.dataKeyLength = | ||
PropertyUtil.propertyAsInt( | ||
encryptionProperties, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than passing encryption properties, why not just pass the key length?
import java.util.Map; | ||
import org.apache.iceberg.TableMetadata; | ||
|
||
public interface EncryptionManagerFactory extends Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example of how this is used in context? I want to make sure what we're adding to catalogs or table implementations makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, in https://github.com/apache/iceberg/pull/5544/files
( eg https://github.com/apache/iceberg/pull/5544/files#diff-dcb8898b53c9729e27fea26093e90a05e86cc2682a03210cf5f56fd5a260f174R237 and https://github.com/apache/iceberg/pull/5544/files#diff-b840ad8566560883268ebb23aa1de123a05599a1aeeda31bf66dfc0cb54a7e82R194 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, it's an object in TableOperations, that allows up to create an EncryptionManager from both catalog properties and table metadata.
|
||
@Override | ||
public EncryptionManager create(TableMetadata tableMetadata) { | ||
if (tableMetadata == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it should be an exception. If table metadata is null, then this doesn't know what to produce.
|
||
if (tableMetadata.formatVersion() < 2) { | ||
throw new UnsupportedOperationException( | ||
"Iceberg encryption works only with table format 2 or higher"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key_metadata field for manifest files was added in v2, as far as I know.
final Map<String, String> encryptionProperties = Maps.newHashMap(); | ||
encryptionProperties.putAll(tableProperties); | ||
|
||
// Important: put catalog properties after table properties. Former overrides the latter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there properties that are not allowed in the catalog properties? For example, table key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In certain usecases and threat models (where an attacker can remove the table key from the metadata.json file), it'd be good to set the table key in the catalog properties - so the writer won't start producing unencrypted data files.
|
||
if (FileFormat.fromString(fileFormat) != FileFormat.PARQUET) { | ||
throw new UnsupportedOperationException( | ||
"Iceberg encryption currently supports only parquet format for data files"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to also support Avro files? Or is that a future feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, preferably a future PR.
Co-Authored-By: Jian Tang <jian_tang@apple.com>
4823e92
to
aa3e562
Compare
core/src/main/java/org/apache/iceberg/encryption/PlaintextEncryptionManager.java
Show resolved
Hide resolved
@@ -37,4 +37,9 @@ public interface EncryptedOutputFile { | |||
* #encryptingOutputFile()}. | |||
*/ | |||
EncryptionKeyMetadata keyMetadata(); | |||
|
|||
/** Underlying output file for native encryption. */ | |||
default OutputFile rawOutputFile() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this name, but I've had a hard time coming up with a better one. The best I can come up with is plainOutputFile
. What do you think about that? I think it's slightly clearer than "raw".
It also looks like this isn't used in this PR, in which case I think we should include it in a PR where it is used.
return KeyMetadata.parse(metadataBuffer); | ||
} | ||
|
||
public static EncryptionKeyMetadata createKeyMetadata(ByteBuffer key, ByteBuffer aadPrefix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used and it looks like a few places use the KeyMetadata constructor directly.
@Override | ||
public Iterable<InputFile> decrypt(Iterable<EncryptedInputFile> encrypted) { | ||
// Bulk decrypt is only applied to data files. Returning source input files for parquet. | ||
return Iterables.transform(encrypted, this::getSourceFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect. Based on the comment, I think the intent was to skip creating an AesGcmInputFile
because you know that this is only called for data files. But that's a bad assumption and there is no guarantee it will be called that way in the future.
Instead, I think this should call decrypt
. That operation should be cheap, although we may need to defer parsing the key metadata until later.
@@ -57,5 +57,9 @@ public static EncryptedOutputFile encryptedOutput( | |||
encryptedOutputFile, BaseEncryptionKeyMetadata.fromByteArray(keyMetadata)); | |||
} | |||
|
|||
public static EncryptedOutputFile plainAsEncryptedOutput(OutputFile encryptingOutputFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used. Can we add it when the use is added?
|
||
private EncryptionUtil() {} | ||
|
||
public static KeyMetadata parseKeyMetadata(ByteBuffer metadataBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used and duplicates the wrapped method. I think we should remove it.
return new KeyMetadata(key, aadPrefix); | ||
} | ||
|
||
public static long gcmEncryptionLength(long plainFileLength) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this since it isn't used? I'd rather not have dead and untested code.
"Cannot create StandardEncryptionManagerFactory without KMS type"); | ||
} | ||
|
||
if (!kmsType.equals(CatalogProperties.ENCRYPTION_KMS_CUSTOM_TYPE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it is much simpler if rather than requiring a type that is "custom", this instead checks that either type or impl is set.
@ggershinsky, I opened https://github.com/ggershinsky/iceberg/pull/9/files against your branch with the last changes I think are needed to get this in. Please review and let me know so we can get to the next one. Thanks! |
Merged in #9277! Thanks, @ggershinsky! |
No description provided.