Encryption integration and test #5544
Open

ggershinsky wants to merge 20 commits into apache:main from ggershinsky:integrate-and-test
Changes from all commits (20 commits, all by ggershinsky):

0a0670e  rebase
2a7c804  address review comments
bef73dc  cleanup
4bb958f  enable null table metadata for old unitests
636c233  remove encryption factory
4547417  update
c2b355c  Metadata file encryption
7a61a2e  spotlessApply
a59e3f5  clean up
7c92d06  fix drop table call
14e1f76  use EncryptingFileIO only for encrypted tables
5ff8656  write length only for encrypted metadata
3cbc0c2  encrypt manifest list keys with metadata key
9d9a522  comments
839b9cd  wrap snapshot key encryption key in KMS
a077b63  old return type in metadata write
e04a390  remove unnneeded change
3afe781  kek cache
453a8b9  address review comments
bd62067  visible for testing
@@ -29,6 +29,7 @@
 import java.util.function.Predicate;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;
+import org.apache.iceberg.encryption.KeyEncryptionKey;
 import org.apache.iceberg.exceptions.ValidationException;
 import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
 import org.apache.iceberg.relocated.com.google.common.base.Objects;
@@ -260,6 +261,7 @@ public String toString() {
   private final List<PartitionStatisticsFile> partitionStatisticsFiles;
   private final List<MetadataUpdate> changes;
   private SerializableSupplier<List<Snapshot>> snapshotsSupplier;
+  private Map<String, KeyEncryptionKey> kekCache;
   private volatile List<Snapshot> snapshots;
   private volatile Map<Long, Snapshot> snapshotsById;
   private volatile Map<String, SnapshotRef> refs;
@@ -512,6 +514,14 @@ public List<Snapshot> snapshots() {
     return snapshots;
   }

+  public void setKekCache(Map<String, KeyEncryptionKey> kekCache) {
+    this.kekCache = kekCache;
+  }
+
+  public Map<String, KeyEncryptionKey> kekCache() {
+    return kekCache;
+  }
+
   private synchronized void ensureSnapshotsLoaded() {
     if (!snapshotsLoaded) {
       List<Snapshot> loadedSnapshots = Lists.newArrayList(snapshotsSupplier.get());

Review comment on setKekCache: "I'll change this to"
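The kekCache field added above maps a string id to a key-encryption-key object, so a writer can reuse a KEK until the configured timeout expires instead of calling the KMS for every write. The following is a minimal sketch of that caching idea only; the KeyEncryptionKey class here is a hypothetical stand-in and does not reflect the actual fields of org.apache.iceberg.encryption.KeyEncryptionKey.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Iceberg's KeyEncryptionKey; field names are illustrative.
class KeyEncryptionKey {
  final String id;
  final byte[] key;
  final long createdMs;

  KeyEncryptionKey(String id, byte[] key, long createdMs) {
    this.id = id;
    this.key = key;
    this.createdMs = createdMs;
  }
}

class KekCacheSketch {
  private final Map<String, KeyEncryptionKey> cache = new ConcurrentHashMap<>();
  private final long timeoutMs;

  KekCacheSketch(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  // Return the cached KEK if it is still fresh; otherwise mint a new one.
  // Real code would generate the key material and wrap it via the KMS client.
  KeyEncryptionKey current(String id, long nowMs) {
    KeyEncryptionKey kek = cache.get(id);
    if (kek == null || nowMs - kek.createdMs > timeoutMs) {
      kek = new KeyEncryptionKey(id, new byte[16], nowMs);
      cache.put(id, kek);
    }
    return kek;
  }
}
```

The timeout bound is what the writerKekTimeout catalog property (below) controls: an expired entry forces a fresh KEK rather than reusing one key indefinitely.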
@@ -43,6 +43,8 @@
 import org.apache.iceberg.catalog.Namespace;
 import org.apache.iceberg.catalog.SupportsNamespaces;
 import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.encryption.EncryptionUtil;
+import org.apache.iceberg.encryption.KeyManagementClient;
 import org.apache.iceberg.exceptions.NamespaceNotEmptyException;
 import org.apache.iceberg.exceptions.NoSuchNamespaceException;
 import org.apache.iceberg.exceptions.NoSuchTableException;
@@ -56,6 +58,7 @@
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.Maps;
 import org.apache.iceberg.util.LocationUtil;
+import org.apache.iceberg.util.PropertyUtil;
 import org.apache.thrift.TException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -76,9 +79,11 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
   private String name;
   private Configuration conf;
   private FileIO fileIO;
+  private KeyManagementClient keyManagementClient;
   private ClientPool<IMetaStoreClient, TException> clients;
   private boolean listAllTables = false;
   private Map<String, String> catalogProperties;
+  private long writerKekTimeout;

   public HiveCatalog() {}
@@ -110,6 +115,15 @@ public void initialize(String inputName, Map<String, String> properties) {
         ? new HadoopFileIO(conf)
         : CatalogUtil.loadFileIO(fileIOImpl, properties, conf);

+    if (catalogProperties.containsKey(CatalogProperties.ENCRYPTION_KMS_IMPL)) {
+      this.keyManagementClient = EncryptionUtil.createKmsClient(properties);
+      this.writerKekTimeout =
+          PropertyUtil.propertyAsLong(
+              properties,
+              CatalogProperties.WRITER_KEK_TIMEOUT_MS,
+              CatalogProperties.WRITER_KEK_TIMEOUT_MS_DEFAULT);
+    }
+
     this.clients = new CachedClientPool(conf, properties);
   }

Review comment on writerKekTimeout, quoting the existing lock-heartbeat pattern:

    this.heartbeatIntervalMs =
        PropertyUtil.propertyAsLong(
            properties,
            CatalogProperties.LOCK_HEARTBEAT_INTERVAL_MS,
            CatalogProperties.LOCK_HEARTBEAT_INTERVAL_MS_DEFAULT);
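The PropertyUtil.propertyAsLong call above reads a long-valued catalog property, falling back to a default when the key is absent. A minimal equivalent of that helper looks roughly like this (a sketch; the real Iceberg PropertyUtil has more overloads and null-handling, and the property key used in the usage note is made up for illustration):

```java
import java.util.Map;

class PropertySketch {
  // Parse a long-valued property, falling back to defaultValue when absent.
  static long propertyAsLong(Map<String, String> properties, String name, long defaultValue) {
    String value = properties.get(name);
    return value != null ? Long.parseLong(value) : defaultValue;
  }
}
```

With this shape, a catalog configured with, say, a hypothetical "encryption.kms.timeout-ms" of "5000" resolves to 5000, while an unconfigured catalog silently gets the compiled-in default.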
@@ -512,7 +526,8 @@ private boolean isValidateNamespace(Namespace namespace) {
   public TableOperations newTableOps(TableIdentifier tableIdentifier) {
     String dbName = tableIdentifier.namespace().level(0);
     String tableName = tableIdentifier.name();
-    return new HiveTableOperations(conf, clients, fileIO, name, dbName, tableName);
+    return new HiveTableOperations(
+        conf, clients, fileIO, keyManagementClient, name, dbName, tableName, writerKekTimeout);
   }

   @Override
Review comment:
This should not call inputFile.getLength, because it will either return the length passed in above, in which case it is useless (newInputFile(path, length) already has it), or it will make a call to the underlying storage (a needless HEAD request). Don't we already catch cases where the file has been truncated? Hm. I don't see a test in TestGcmStreams, so we should probably add one that validates truncated streams specifically.

Reply:
I'm not sure how. If a file is truncated by exactly one block, GCM Streams won't detect that. The upper layer (Avro or JSON readers, etc.) might detect it or might not; it's not guaranteed. That's why we've added an explicit requirement in the GCM Stream spec to take the length from a trusted source. This is "out of band" from the spec point of view, meaning that we must make sure the length comes from parent metadata (and not from the file system) everywhere in Iceberg where we decrypt a stream.
However, FileIO.newInputFile(path, length) implementations are custom; some of them simply ignore the length parameter, and then indeed send a file system request upon an InputFile.getLength() call. But we can prevent a security breach by making this check (if (inputFile.getLength() != length)). In most cases this won't trigger a HEAD request, because most FileIO implementations don't ignore the length parameter in newInputFile(path, length) and store it instead. For the few that do ignore it, we need to verify that the file wasn't truncated in the file system.
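The check being discussed, comparing a length taken from trusted parent metadata against whatever the storage layer reports, can be sketched as follows. This is an illustrative stand-alone class with hypothetical names, not Iceberg's actual InputFile API:

```java
// Sketch: verify that the physical stream matches the length recorded in
// trusted (parent) metadata before decrypting. A GCM stream truncated by an
// exact whole block would otherwise go undetected by the cipher itself, so
// the length must come from metadata, not from the (untrusted) file system.
class TrustedLengthCheck {
  static void verify(long trustedLength, long reportedLength) {
    if (reportedLength != trustedLength) {
      throw new IllegalStateException(
          "File length " + reportedLength
              + " does not match trusted metadata length " + trustedLength
              + "; file may have been truncated or tampered with");
    }
  }
}
```

When the FileIO implementation stores the length passed to newInputFile(path, length), this comparison is a free in-memory check; only implementations that discard the hint pay for a storage round trip.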