Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standard key manager #6884

Closed
wants to merge 13 commits into from
Closed

Conversation

ggershinsky
Copy link
Contributor

@ggershinsky ggershinsky commented Feb 20, 2023

No description provided.

@github-actions github-actions bot added the core label Feb 20, 2023
public static final String ENCRYPTION_DEK_LENGTH = "encryption.data.key.length";
public static final int ENCRYPTION_DEK_LENGTH_DEFAULT = 16;

public static final int ENCRYPTION_AAD_LENGTH_DEFAULT = 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these to EncryptionProperties? We like to keep constants in separate classes like that.

import org.apache.iceberg.util.PropertyUtil;

public class DefaultEncryptionManager implements EncryptionManager {
public static final String ENCRYPTION_TABLE_KEY = "encryption.table.key.id";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also consider what we want these to be named (ping @aokolnychyi). Something like encryption.key-id?

import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.util.PropertyUtil;

public class DefaultEncryptionManager implements EncryptionManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a name for this bedsides "Default"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envelope? Iceberg? Other suggestions are welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "Standard"?

public static final int ENCRYPTION_AAD_LENGTH_DEFAULT = 16;

/** Implementation of the KMS client for envelope encryption */
public static final String ENCRYPTION_KMS_CLIENT_IMPL = "encryption.kms.client-impl";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this too.

@Override
public EncryptedOutputFile encrypt(OutputFile rawOutput) {
if (null == workerRNG) {
workerRNG = new SecureRandom();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this to a private method.

}

@Override
public InputFile decrypt(EncryptedInputFile encrypted) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should doc that this is a Noop method, otherwise it's a bit confusing that by default this decrypt method just returns an encrypted file.

@ggershinsky ggershinsky changed the title Core: Default key manager Core: Default key manager and factory Mar 8, 2023
}

private void createSecureRandomGenerator() {
workerRNG = new SecureRandom();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When setting an instance field, use the this. prefix to make it clear this is affecting persistent state.


KeyMetadata fileEnvelopeMetadata = new KeyMetadata(fileDek, null, aadPrefix);

return new BaseEncryptedOutputFile(rawOutput, fileEnvelopeMetadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this actually needs to encrypt. BaseEncryptedOutputFile doesn't actually do anything, but the intent is for this to be a file that will encrypt the stream of data as it is written to the stream created by encryptingOutputFile()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment below, I've implemented the following: EncryptedOutputFile class would returns the underlying stream as well, so that encryption can be handled by Parquet.

}

// Native decryption: simply return the input file. Parquet decryption will get the key from key
// metadata.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't don't think this is correct.

These interfaces are intended to give you the option to encrypt/decrypt a stream. That's why they expose key metadata and an InputFile or OutputFile. You use the InputFile as you normally would, but it is decrypted for you as you read.

That fits with the AES GCM encryption spec. This should decrypt using AES GCM decryption. Similarly, encryption should return an encrypting stream.

For native Parquet encryption, I think the EncryptedOutputFile and EncryptedInputFile classes would need to be able to return the underlying stream as well, so that encryption can be handled by Parquet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for the EncryptedOutputFile. For decryption, we have everything already in the Parquet reader: the underlying stream and the key metadata, so can decrypt with native Parquet.
(no need in changes in EncryptedInputFile, and no need in calling the encryption manager).

@@ -25,6 +25,8 @@
import org.slf4j.LoggerFactory;

public class PlaintextEncryptionManager implements EncryptionManager {
public static final EncryptionManager INSTANCE = new PlaintextEncryptionManager();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we typically hide these behind static instance methods, rather than making the instance field public.

String tableKeyId, KeyManagementClient kmsClient, Map<String, String> encryptionProperties) {
Preconditions.checkNotNull(
tableKeyId,
"Cannot create EnvelopeEncryptionManager because table encryption key ID is not specified");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to state what was attempted. A shorter version would be more clear: "Invalid encryption key ID: null".

tableKeyId,
"Cannot create EnvelopeEncryptionManager because table encryption key ID is not specified");
Preconditions.checkNotNull(
kmsClient, "Cannot create EnvelopeEncryptionManager because kmsClient is null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid KMS client: null

We don't use variable or class names in error messages. That's friendlier for people and anyone that will go to the file and line will quickly see what the variable name was anyway.

kmsClient, "Cannot create EnvelopeEncryptionManager because kmsClient is null");
Preconditions.checkNotNull(
encryptionProperties,
"Cannot create EnvelopeEncryptionManager because encryptionProperties are not passed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid encryption properties: null.


@Override
public EncryptedOutputFile encrypt(OutputFile rawOutput) {
if (null == workerRNG) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually have a method for lazy initialized variables, like lazyRNG() rather than doing the null check everywhere.


private EncryptionProperties() {}

public static final String ENCRYPTION_TABLE_KEY = "encryption.table.key.id";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These names don't conform to the guidelines for properties: https://iceberg.apache.org/contribute/#config-naming

KeyMetadata dataEncryptionMetadata = new KeyMetadata(fileDek, null, aadPrefix);

// For metadata files
// This is an expensive operation, RPC to KMS server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment helpful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a design gap we need to discuss. This comment, and a couple more below, are a temp text that describes the problem. TBD in our next call.

ByteBuffer wrappedDek = kmsClient.wrapKey(fileDek, tableKeyId);
KeyMetadata manifestEncryptionMetadata = new KeyMetadata(wrappedDek, tableKeyId, aadPrefix);

// We don't know which key metadata to use, because we don't know what file we encrypt.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Iceberg, we don't use personal pronouns, like "we" or "I" in the code. It makes docs or comments longer without adding value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. As above , this is a temp text, to be removed.

// For metadata files
// This is an expensive operation, RPC to KMS server
ByteBuffer wrappedDek = kmsClient.wrapKey(fileDek, tableKeyId);
KeyMetadata manifestEncryptionMetadata = new KeyMetadata(wrappedDek, tableKeyId, aadPrefix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is wrapping being done here if it is an expensive operation? Doesn't the caller need the key directly? I think it would make more sense to wrap the key when it is stored.

public InputFile decrypt(EncryptedInputFile encrypted) {
if (encrypted.keyMetadata() == null || encrypted.keyMetadata().buffer() == null) {
throw new RuntimeException(
"Unencrypted file " + encrypted.encryptedInputFile().location() + " in encrypted table");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't use RuntimeException in general. Is there a better one?

Also, is this an exception? What happens when converting a table to use encryption? It seems like this could be a valid state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes, due to misconfiguration etc - users think they write encrypted tables / files, but the result is not encrypted. So this is a rather important security check. But we can discuss the options and trade-offs between this and a flexibility to be able to convert an existing table to use encryption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check I've mentioned, is not always possible to perform. Switching to your suggestion.


this.dataKeyLength =
PropertyUtil.propertyAsInt(
encryptionProperties,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than passing encryption properties, why not just pass the key length?

@ggershinsky ggershinsky changed the title Core: Default key manager and factory Core: Standard key manager and factory Jul 23, 2023
@ggershinsky ggershinsky changed the title Core: Standard key manager and factory Standard key manager and factory Aug 14, 2023
import java.util.Map;
import org.apache.iceberg.TableMetadata;

public interface EncryptionManagerFactory extends Closeable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example of how this is used in context? I want to make sure what we're adding to catalogs or table implementations makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, it's an object in TableOperations, that allows up to create an EncryptionManager from both catalog properties and table metadata.


@Override
public EncryptionManager create(TableMetadata tableMetadata) {
if (tableMetadata == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be an exception. If table metadata is null, then this doesn't know what to produce.


if (tableMetadata.formatVersion() < 2) {
throw new UnsupportedOperationException(
"Iceberg encryption works only with table format 2 or higher");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key_metadata field for manifest files was added in v2, as far as I know.

final Map<String, String> encryptionProperties = Maps.newHashMap();
encryptionProperties.putAll(tableProperties);

// Important: put catalog properties after table properties. Former overrides the latter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there properties that are not allowed in the catalog properties? For example, table key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In certain usecases and threat models (where an attacker can remove the table key from the metadata.json file), it'd be good to set the table key in the catalog properties - so the writer won't start producing unencrypted data files.


if (FileFormat.fromString(fileFormat) != FileFormat.PARQUET) {
throw new UnsupportedOperationException(
"Iceberg encryption currently supports only parquet format for data files");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to also support Avro files? Or is that a future feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, preferably a future PR.

@ggershinsky ggershinsky changed the title Standard key manager and factory Standard key manager Sep 21, 2023
@github-actions github-actions bot added the API label Oct 16, 2023
@@ -37,4 +37,9 @@ public interface EncryptedOutputFile {
* #encryptingOutputFile()}.
*/
EncryptionKeyMetadata keyMetadata();

/** Underlying output file for native encryption. */
default OutputFile rawOutputFile() {
Copy link
Contributor

@rdblue rdblue Dec 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this name, but I've had a hard time coming up with a better one. The best I can come up with is plainOutputFile. What do you think about that? I think it's slightly clearer than "raw".

It also looks like this isn't used in this PR, in which case I think we should include it in a PR where it is used.

return KeyMetadata.parse(metadataBuffer);
}

public static EncryptionKeyMetadata createKeyMetadata(ByteBuffer key, ByteBuffer aadPrefix) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used and it looks like a few places use the KeyMetadata constructor directly.

@Override
public Iterable<InputFile> decrypt(Iterable<EncryptedInputFile> encrypted) {
// Bulk decrypt is only applied to data files. Returning source input files for parquet.
return Iterables.transform(encrypted, this::getSourceFile);
Copy link
Contributor

@rdblue rdblue Dec 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. Based on the comment, I think the intent was to skip creating an AesGcmInputFile because you know that this is only called for data files. But that's a bad assumption and there is no guarantee it will be called that way in the future.

Instead, I think this should call decrypt. That operation should be cheap, although we may need to defer parsing the key metadata until later.

@@ -57,5 +57,9 @@ public static EncryptedOutputFile encryptedOutput(
encryptedOutputFile, BaseEncryptionKeyMetadata.fromByteArray(keyMetadata));
}

public static EncryptedOutputFile plainAsEncryptedOutput(OutputFile encryptingOutputFile) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used. Can we add it when the use is added?


private EncryptionUtil() {}

public static KeyMetadata parseKeyMetadata(ByteBuffer metadataBuffer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used and duplicates the wrapped method. I think we should remove it.

return new KeyMetadata(key, aadPrefix);
}

public static long gcmEncryptionLength(long plainFileLength) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this since it isn't used? I'd rather not have dead and untested code.

"Cannot create StandardEncryptionManagerFactory without KMS type");
}

if (!kmsType.equals(CatalogProperties.ENCRYPTION_KMS_CUSTOM_TYPE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is much simpler if rather than requiring a type that is "custom", this instead checks that either type or impl is set.

@rdblue
Copy link
Contributor

rdblue commented Dec 10, 2023

@ggershinsky, I opened https://github.com/ggershinsky/iceberg/pull/9/files against your branch with the last changes I think are needed to get this in. Please review and let me know so we can get to the next one. Thanks!

@rdblue
Copy link
Contributor

rdblue commented Dec 12, 2023

Merged in #9277! Thanks, @ggershinsky!

@rdblue rdblue closed this Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants