From 149d1c05ca6b910d3dccf439e3f11532630cf0de Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Wed, 19 Nov 2025 09:07:40 +0200 Subject: [PATCH 1/9] initial commit --- docs/docs/configuration.md | 8 +++ docs/docs/encryption.md | 115 +++++++++++++++++++++++++++++++++++++ docs/mkdocs.yml | 1 + 3 files changed, 124 insertions(+) create mode 100644 docs/docs/encryption.md diff --git a/docs/docs/configuration.md b/docs/docs/configuration.md index b97608e10985..d696f5279d16 100644 --- a/docs/docs/configuration.md +++ b/docs/docs/configuration.md @@ -90,6 +90,13 @@ Iceberg tables support table properties to configure table behavior, like the de | write.merge.isolation-level | serializable | Isolation level for merge commands: serializable or snapshot | | write.delete.granularity | partition | Controls the granularity of generated delete files: partition or file | +### Encryption properties + +| Property | Default | Description | +| --------------------------------- | ------------------ | ------------------------------------------------------------------------------------- | +| encryption.key-id | (not set) | ID of the master key of the table | +| encryption.data-key-length | 16 (bytes) | Length of keys used for encryption of table files. Valid values are 16, 24, 32 bytes | + ### Table behavior properties | Property | Default | Description | @@ -137,6 +144,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors | cache-enabled | true | Whether to cache catalog entries | | cache.expiration-interval-ms | 30000 | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration | | metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. 
See the [Metrics reporting](metrics-reporting.md) section for additional details | +| encryption.kms-impl | null | a custom `KeyManagementClient` implementation to use in a catalog for interactions with KMS, a key management service) | `HadoopCatalog` and `HiveCatalog` can access the properties in their constructors. Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`. diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md new file mode 100644 index 000000000000..9bf45cfd089e --- /dev/null +++ b/docs/docs/encryption.md @@ -0,0 +1,115 @@ +--- +title: "Encryption" +--- + + +# Encryption + +Iceberg table encryption protects confidentiality and integrity of table data in an untrusted storage. The data, delete, manifest and manifest list files are encrypted and tamper-proofed before being sent to the storage backend. + +The `metadata.json` file does not contain confidential information, and is therefore not encrypted. + +Currently, table encryption is supported with the Hive and REST catalogs. + +Two parameters are required to activate encryption of a table, +1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). +2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. 
+ + +## Example + +```sh +spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}\ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \ + --conf spark.sql.catalog.spark_catalog.type=hive \ + --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.local.type=hive \ + --conf spark.sql.catalog.local.encryption.kms-impl=org.apache.iceberg.aws.AwsKeyManagementClient +``` + +```sql +CREATE TABLE local.db.table (id bigint, data string) USING iceberg +TBLPROPERTIES ('encryption.key-id'='{{ master key id }}'); +``` + +Inserted data will be automatically encrypted, + +```sql +INSERT INTO local.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c'); +``` + +To verify encryption, the contents of data, manifest and manifest list files can be dumped in the command line with + +```sh +hexdump -C {{ /path/to/file }} | more +``` + +The Parquet files must start with the "PARE" magic string (PARquet Encrypted footer mode), and manifest/list files must start with "AGS1" magic string (Aes Gcm Stream version 1). + +Queried data will be automatically decrypted, + +```sql +SELECT * FROM local.db.table; +``` + +## Security requirements + + +To function properly, Iceberg table encryption places the following requirements on the catalogs: + +1. For protection of table data confidentiality, the table encryption properties (`encryption.key-id` and an optional `encryption.data-key-length`) must be kept in a tamper-proof storage or in a trusted independent database. Catalogs must not retrieve these properties directly from the metadata.json, if this file is kept in a storage vulnerable to tampering. +2. For protection of table integrity, the metadata json must be kept in a tamper-proof storage or in a trusted independent object store. 
Catalogs must not retrieve the metadata.json file directly, if it is kept in a storage vulnerable to tampering. + +## Key Management Clients + +Currently, Iceberg has clients for the AWS and GCP KMS systems. A custom client can be built for other key management systems by implementing the `org.apache.iceberg.encryption.KeyManagementClient` interface. + +This interface has the following main methods, + +```java + /** + * Wrap a secret key, using a wrapping/master key which is stored in KMS and referenced by an ID. + * Wrapping means encryption of the secret key with the master key, and adding optional + * KMS-specific metadata that allows the KMS to decrypt the secret key in an unwrapping call. + * + * @param key a secret key being wrapped + * @param wrappingKeyId a key ID that represents a wrapping key stored in KMS + * @return wrapped key material + */ + ByteBuffer wrapKey(ByteBuffer key, String wrappingKeyId); + + /** + * Unwrap a secret key, using a wrapping/master key which is stored in KMS and referenced by an + * ID. + * + * @param wrappedKey wrapped key material (encrypted key and optional KMS metadata, returned by + * the wrapKey method) + * @param wrappingKeyId a key ID that represents a wrapping key stored in KMS + * @return raw key bytes + */ + ByteBuffer unwrapKey(ByteBuffer wrappedKey, String wrappingKeyId); + + /** + * Initialize the KMS client with given properties. 
+ * + * @param properties kms client properties + */ + void initialize(Map<String, String> properties); +``` + diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 13b187909208..c1807a6b8542 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -26,6 +26,7 @@ nav: - Tables: - branching.md - configuration.md + - encryption.md - evolution.md - maintenance.md - metrics-reporting.md From 663bd060e92c63a3ba13420b6587ddbd7408e8df Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Wed, 19 Nov 2025 09:50:43 +0200 Subject: [PATCH 2/9] clean up --- docs/docs/configuration.md | 2 +- docs/docs/encryption.md | 15 ++++++--------- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/docs/docs/configuration.md b/docs/docs/configuration.md index d696f5279d16..f8be1a99a5e9 100644 --- a/docs/docs/configuration.md +++ b/docs/docs/configuration.md @@ -144,7 +144,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors | cache-enabled | true | Whether to cache catalog entries | | cache.expiration-interval-ms | 30000 | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration | | metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](metrics-reporting.md) section for additional details | -| encryption.kms-impl | null | a custom `KeyManagementClient` implementation to use in a catalog for interactions with KMS, a key management service) | +| encryption.kms-impl | null | a custom `KeyManagementClient` implementation to use in a catalog for interactions with KMS (key management service) | `HadoopCatalog` and `HiveCatalog` can access the properties in their constructors. Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index 9bf45cfd089e..f0d77e118fc8 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -20,7 +20,7 @@ title: "Encryption" # Encryption -Iceberg table encryption protects confidentiality and integrity of table data in an untrusted storage. The data, delete, manifest and manifest list files are encrypted and tamper-proofed before being sent to the storage backend. +Iceberg table encryption protects confidentiality and integrity of table data in an untrusted storage. The `data`, `delete`, `manifest` and `manifest list` files are encrypted and tamper-proofed before being sent to the storage backend. The `metadata.json` file does not contain confidential information, and is therefore not encrypted. @@ -30,7 +30,6 @@ Two parameters are required to activate encryption of a table, 1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). 2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. - ## Example ```sh @@ -68,17 +67,16 @@ Queried data will be automatically decrypted, SELECT * FROM local.db.table; ``` -## Security requirements - +## Security requirements To function properly, Iceberg table encryption places the following requirements on the catalogs: -1. For protection of table data confidentiality, the table encryption properties (`encryption.key-id` and an optional `encryption.data-key-length`) must be kept in a tamper-proof storage or in a trusted independent database. Catalogs must not retrieve these properties directly from the metadata.json, if this file is kept in a storage vulnerable to tampering. -2. For protection of table integrity, the metadata json must be kept in a tamper-proof storage or in a trusted independent object store. 
Catalogs must not retrieve the metadata.json file directly, if it is kept in a storage vulnerable to tampering. +1. For protection of table data confidentiality, the table encryption properties (`encryption.key-id` and an optional `encryption.data-key-length`) must be kept in a tamper-proof storage or in a trusted independent database. Catalogs must not retrieve these properties directly from the metadata.json, if this file is kept unprotected in a storage vulnerable to tampering. +2. For protection of table integrity, the metadata json must be kept in a tamper-proof storage or in a trusted independent object store. Catalogs must not retrieve the metadata.json file directly, if it is kept unprotected in a storage vulnerable to tampering. ## Key Management Clients -Currently, Iceberg has clients for the AWS and GCP KMS systems. A custom client can be built for other key management systems by implementing the `org.apache.iceberg.encryption.KeyManagementClient` interface. +Currently, Iceberg has clients for the AWS and GCP KMS systems. A custom client can be built for other key management systems by implementing the `org.apache.iceberg.encryption.KeyManagementClient` interface. This interface has the following main methods, @@ -108,8 +106,7 @@ This interface has the following main methods, /** * Initialize the KMS client with given properties. 
* - * @param properties kms client properties + * @param properties kms client properties (taken from catalog properties) */ void initialize(Map<String, String> properties); ``` - From ae875c63e82cf30a7ef91dc8e4d07bd15f6e6d6c Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Wed, 26 Nov 2025 14:49:17 +0200 Subject: [PATCH 3/9] brief how it works section --- docs/docs/configuration.md | 4 ++- docs/docs/encryption.md | 54 +++++++++++++++++++++++++++++++------- 2 files changed, 47 insertions(+), 11 deletions(-) diff --git a/docs/docs/configuration.md b/docs/docs/configuration.md index f8be1a99a5e9..93348ef5e915 100644 --- a/docs/docs/configuration.md +++ b/docs/docs/configuration.md @@ -97,6 +97,8 @@ Iceberg tables support table properties to configure table behavior, like the de | encryption.key-id | (not set) | ID of the master key of the table | | encryption.data-key-length | 16 (bytes) | Length of keys used for encryption of table files. Valid values are 16, 24, 32 bytes | +See the [Encryption](encryption.md) document for additional details. + ### Table behavior properties | Property | Default | Description | @@ -144,7 +146,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors | cache-enabled | true | Whether to cache catalog entries | | cache.expiration-interval-ms | 30000 | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration | | metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](metrics-reporting.md) section for additional details | -| encryption.kms-impl | null | a custom `KeyManagementClient` implementation to use in a catalog for interactions with KMS (key management service) | +| encryption.kms-impl | null | a custom `KeyManagementClient` implementation to use in a catalog for interactions with KMS (key management service).
See the [Encryption](encryption.md) document for additional details | `HadoopCatalog` and `HiveCatalog` can access the properties in their constructors. Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`. diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index f0d77e118fc8..1dc584dba7b2 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -22,14 +22,16 @@ title: "Encryption" Iceberg table encryption protects confidentiality and integrity of table data in an untrusted storage. The `data`, `delete`, `manifest` and `manifest list` files are encrypted and tamper-proofed before being sent to the storage backend. -The `metadata.json` file does not contain confidential information, and is therefore not encrypted. +The `metadata.json` file does not contain data or stats, and is therefore not encrypted. -Currently, table encryption is supported with the Hive and REST catalogs. +Currently, encryption is supported in the Hive and REST catalogs for tables with Parquet and Avro data formats. Two parameters are required to activate encryption of a table, 1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). 2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. +For more details, see the "Appendix: How It Works" [subsection](#appendix:-how-it-works). + ## Example ```sh @@ -76,11 +78,18 @@ To function properly, Iceberg table encryption places the following requirements ## Key Management Clients -Currently, Iceberg has clients for the AWS and GCP KMS systems. A custom client can be built for other key management systems by implementing the `org.apache.iceberg.encryption.KeyManagementClient` interface. +Currently, Iceberg has clients for the AWS, GCP and Azure KMS systems. 
A custom client can be built for other key management systems by implementing the `org.apache.iceberg.encryption.KeyManagementClient` interface. This interface has the following main methods, ```java + /** + * Initialize the KMS client with given properties. + * + * @param properties kms client properties (taken from catalog properties) + */ + void initialize(Map<String, String> properties); + /** * Wrap a secret key, using a wrapping/master key which is stored in KMS and referenced by an ID. * Wrapping means encryption of the secret key with the master key, and adding optional @@ -102,11 +111,36 @@ This interface has the following main methods, * @return raw key bytes */ ByteBuffer unwrapKey(ByteBuffer wrappedKey, String wrappingKeyId); - - /** - * Initialize the KMS client with given properties. - * - * @param properties kms client properties (taken from catalog properties) - */ - void initialize(Map<String, String> properties); ``` + +## Appendix: How It Works + +The standard Iceberg encryption manager generates an encryption key and a unique file ID ("AAD prefix") +for each data and delete file. The generation is performed in the worker nodes, by using a secure random +number generator. For Parquet data files, these parameters are passed to the native Parquet Modular +Encryption [mechanism](https://parquet.apache.org/docs/file-format/data-pages/encryption). For Avro data files, +these parameters are passed to the AES GCM Stream encryption [mechanism](../../../format/gcm-stream-spec.md). + +The parent manifest file stores the encryption key and AAD prefix for each data and delete file in the +`key_metadata` field. For Avro data tables, the data file length is also added to the `key_metadata`. +The manifest file is encrypted by the AES GCM Stream encryption mechanism, using an encryption key and an +AAD prefix generated by the standard encryption manager. The generation is performed in the driver nodes, +by using a secure random number generator.
+ +The parent manifest list file stores the encryption key, AAD prefix and file length for each manifest file +in the `key_metadata` field. The manifest list file is encrypted by the AES GCM Stream encryption mechanism, +using an encryption key and an AAD prefix generated by the standard encryption manager. + +The manifest list encryption key, AAD prefix and file length are packed in a key metadata object. This object +is serialized and encrypted with a "key encryption key" (KEK), using the KEK creation timestamp as the AES +GCM AAD. A KEK and its unique KEK_ID are generated by using a secure random number generator. For each +snapshot, the KEK_ID of the encryption key that encrypts the manifest list key metadata is kept in the +`key-id` field in the table metadata snapshot [structure](../../../format/spec.md#snapshots). The encrypted +manifest list key metadata is kept in the `encryption-keys` list in the table metadata +[structure](../../format/spec.md#table-metadata-fields). + +The KEK is encrypted by the table master key via the KMS client. The result is kept in the `encryption-keys` +list in the table metadata structure. The KEK is re-used for a period allowed by the NIST SP 800-57 +specification. Then, it is rotated - a new KEK and KEK_ID are generated for encryption of new manifest list +key metadata objects. The new KEK is encrypted by the table master key and stored in the `encryption-keys` +list in the table metadata structure. The previous KEKs are retained for the existing table snapshots. 
\ No newline at end of file From 11899b585c11c97f7471eac2b9fe226c0cc604dc Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Wed, 26 Nov 2025 15:04:38 +0200 Subject: [PATCH 4/9] clean up --- docs/docs/encryption.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index 1dc584dba7b2..51c0e9b348f6 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -30,7 +30,7 @@ Two parameters are required to activate encryption of a table, 1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). 2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. -For more details, see the "Appendix: How It Works" [subsection](#appendix:-how-it-works). +For more details, see the "Appendix: How It Works" [subsection](#appendix-how-it-works). ## Example @@ -119,13 +119,13 @@ The standard Iceberg encryption manager generates an encryption key and a unique for each data and delete file. The generation is performed in the worker nodes, by using a secure random number generator. For Parquet data files, these parameters are passed to the native Parquet Modular Encryption [mechanism](https://parquet.apache.org/docs/file-format/data-pages/encryption). For Avro data files, -these parameters are passed to the AES GCM Stream encryption [mechanism](../../../format/gcm-stream-spec.md). +these parameters are passed to the AES GCM Stream encryption [mechanism](../../format/gcm-stream-spec.md). The parent manifest file stores the encryption key and AAD prefix for each data and delete file in the `key_metadata` field. For Avro data tables, the data file length is also added to the `key_metadata`. 
The manifest file is encrypted by the AES GCM Stream encryption mechanism, using an encryption key and an AAD prefix generated by the standard encryption manager. The generation is performed in the driver nodes, -by using a secure random number generator. +by using a secure random number generator. The parent manifest list file stores the encryption key, AAD prefix and file length for each manifest file in the `key_metadata` field. The manifest list file is encrypted by the AES GCM Stream encryption mechanism, @@ -135,7 +135,7 @@ The manifest list encryption key, AAD prefix and file length are packed in a key is serialized and encrypted with a "key encryption key" (KEK), using the KEK creation timestamp as the AES GCM AAD. A KEK and its unique KEK_ID are generated by using a secure random number generator. For each snapshot, the KEK_ID of the encryption key that encrypts the manifest list key metadata is kept in the -`key-id` field in the table metadata snapshot [structure](../../../format/spec.md#snapshots). The encrypted +`key-id` field in the table metadata snapshot [structure](../../format/spec.md#snapshots). The encrypted manifest list key metadata is kept in the `encryption-keys` list in the table metadata [structure](../../format/spec.md#table-metadata-fields). @@ -143,4 +143,4 @@ The KEK is encrypted by the table master key via the KMS client. The result is k list in the table metadata structure. The KEK is re-used for a period allowed by the NIST SP 800-57 specification. Then, it is rotated - a new KEK and KEK_ID are generated for encryption of new manifest list key metadata objects. The new KEK is encrypted by the table master key and stored in the `encryption-keys` -list in the table metadata structure. The previous KEKs are retained for the existing table snapshots. \ No newline at end of file +list in the table metadata structure. The previous KEKs are retained for the existing table snapshots. 
From 471791a86afb2ba5f626b50e1a3141c2ae9370a8 Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Wed, 26 Nov 2025 15:58:05 +0200 Subject: [PATCH 5/9] add refs --- docs/docs/encryption.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index 51c0e9b348f6..8bbc86c6fe17 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -122,13 +122,13 @@ Encryption [mechanism](https://parquet.apache.org/docs/file-format/data-pages/en these parameters are passed to the AES GCM Stream encryption [mechanism](../../format/gcm-stream-spec.md). The parent manifest file stores the encryption key and AAD prefix for each data and delete file in the -`key_metadata` field. For Avro data tables, the data file length is also added to the `key_metadata`. +`key_metadata` [field](../../format/spec.md#data-file-fields). For Avro data tables, the data file length is also added to the `key_metadata`. The manifest file is encrypted by the AES GCM Stream encryption mechanism, using an encryption key and an AAD prefix generated by the standard encryption manager. The generation is performed in the driver nodes, by using a secure random number generator. The parent manifest list file stores the encryption key, AAD prefix and file length for each manifest file -in the `key_metadata` field. The manifest list file is encrypted by the AES GCM Stream encryption mechanism, +in the `key_metadata` [field](../../format/spec.md#manifest-lists). The manifest list file is encrypted by the AES GCM Stream encryption mechanism, using an encryption key and an AAD prefix generated by the standard encryption manager. The manifest list encryption key, AAD prefix and file length are packed in a key metadata object. 
This object From bb6e846d44769859c3ec85b37888e02cd06a6b57 Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Thu, 27 Nov 2025 10:29:09 +0200 Subject: [PATCH 6/9] discussion updates --- docs/docs/encryption.md | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index 8bbc86c6fe17..9eadf8e28a50 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -30,7 +30,9 @@ Two parameters are required to activate encryption of a table, 1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). 2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. -For more details, see the "Appendix: How It Works" [subsection](#appendix-how-it-works). +The `encryption.key-id` must be set during the table creation, and never modified or removed during the table lifetime. + +For more details on table encryption, see the "Appendix: Internals Overview" [subsection](#appendix-internals-overview). ## Example @@ -69,12 +71,14 @@ Queried data will be automatically decrypted, SELECT * FROM local.db.table; ``` -## Security requirements - -To function properly, Iceberg table encryption places the following requirements on the catalogs: +## Catalog security requirements -1. For protection of table data confidentiality, the table encryption properties (`encryption.key-id` and an optional `encryption.data-key-length`) must be kept in a tamper-proof storage or in a trusted independent database. Catalogs must not retrieve these properties directly from the metadata.json, if this file is kept unprotected in a storage vulnerable to tampering. -2. For protection of table integrity, the metadata json must be kept in a tamper-proof storage or in a trusted independent object store. 
Catalogs must not retrieve the metadata.json file directly, if it is kept unprotected in a storage vulnerable to tampering. +To function properly, Iceberg table encryption requires the catalog implementations not to retrieve the metadata +directly from metadata.json files, if these file are kept unprotected in a storage vulnerable to tampering. +Catalogs may keep the metadata in a trusted independent object store. +Catalogs may work with metadata.json files in a tamper-proof storage. +Catalogs may use checksum techniques to verify integrity of metadata.json files in a storage vulnerable to tampering +(the checksums must be kept in a separate trusted storage). ## Key Management Clients @@ -113,7 +117,7 @@ This interface has the following main methods, ByteBuffer unwrapKey(ByteBuffer wrappedKey, String wrappingKeyId); ``` -## Appendix: How It Works +## Appendix: Internals Overview The standard Iceberg encryption manager generates an encryption key and a unique file ID ("AAD prefix") for each data and delete file. The generation is performed in the worker nodes, by using a secure random @@ -122,13 +126,15 @@ Encryption [mechanism](https://parquet.apache.org/docs/file-format/data-pages/en these parameters are passed to the AES GCM Stream encryption [mechanism](../../format/gcm-stream-spec.md). The parent manifest file stores the encryption key and AAD prefix for each data and delete file in the -`key_metadata` [field](../../format/spec.md#data-file-fields). For Avro data tables, the data file length is also added to the `key_metadata`. +`key_metadata` [field](../../format/spec.md#data-file-fields). For Avro data tables, the data file length +is also added to the `key_metadata`. The manifest file is encrypted by the AES GCM Stream encryption mechanism, using an encryption key and an AAD prefix generated by the standard encryption manager. The generation is performed in the driver nodes, by using a secure random number generator. 
The parent manifest list file stores the encryption key, AAD prefix and file length for each manifest file -in the `key_metadata` [field](../../format/spec.md#manifest-lists). The manifest list file is encrypted by the AES GCM Stream encryption mechanism, +in the `key_metadata` [field](../../format/spec.md#manifest-lists). The manifest list file is encrypted by +the AES GCM Stream encryption mechanism, using an encryption key and an AAD prefix generated by the standard encryption manager. The manifest list encryption key, AAD prefix and file length are packed in a key metadata object. This object From 23379854ed9ca942c06eadccb89b79d6402c8580 Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Mon, 1 Dec 2025 10:43:00 +0200 Subject: [PATCH 7/9] address review comments --- docs/docs/encryption.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index 9eadf8e28a50..83859ff29363 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -26,11 +26,9 @@ The `metadata.json` file does not contain data or stats, and is therefore not en Currently, encryption is supported in the Hive and REST catalogs for tables with Parquet and Avro data formats. -Two parameters are required to activate encryption of a table, +Two parameters are required to activate encryption of a table 1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). -2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. - -The `encryption.key-id` must be set during the table creation, and never modified or removed during the table lifetime. +2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. 
This table property must be set during the table creation, and never modified or removed during the table lifetime. For more details on table encryption, see the "Appendix: Internals Overview" [subsection](#appendix-internals-overview). @@ -74,10 +72,11 @@ SELECT * FROM local.db.table; ## Catalog security requirements To function properly, Iceberg table encryption requires the catalog implementations not to retrieve the metadata -directly from metadata.json files, if these file are kept unprotected in a storage vulnerable to tampering. -Catalogs may keep the metadata in a trusted independent object store. -Catalogs may work with metadata.json files in a tamper-proof storage. -Catalogs may use checksum techniques to verify integrity of metadata.json files in a storage vulnerable to tampering +directly from metadata.json files, if these files are kept unprotected in a storage vulnerable to tampering. + +* Catalogs may keep the metadata in a trusted independent object store. +* Catalogs may work with metadata.json files in a tamper-proof storage. +* Catalogs may use checksum techniques to verify integrity of metadata.json files in a storage vulnerable to tampering (the checksums must be kept in a separate trusted storage). 
## Key Management Clients From a50241cb22f253e895d89304611167552e47887f Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Mon, 1 Dec 2025 14:00:03 +0200 Subject: [PATCH 8/9] add ref to custom catalogs doc --- docs/docs/custom-catalog.md | 2 ++ docs/docs/encryption.md | 5 +++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/docs/custom-catalog.md b/docs/docs/custom-catalog.md index a812046589e8..f0a6b5718a6c 100644 --- a/docs/docs/custom-catalog.md +++ b/docs/docs/custom-catalog.md @@ -28,6 +28,8 @@ It's possible to read an iceberg table either from an hdfs path or from a hive t - [Custom LocationProvider](#custom-location-provider-implementation) - [Custom IcebergSource](#custom-icebergsource) +Note: To work with encrypted tables, custom catalogs must address a number of security [requirements](encryption.md#catalog-security-requirements). + ### Custom table operations implementation Extend `BaseMetastoreTableOperations` to provide implementation on how to read and write metadata diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index 83859ff29363..834b5071b81b 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -28,7 +28,7 @@ Currently, encryption is supported in the Hive and REST catalogs for tables with Two parameters are required to activate encryption of a table 1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). -2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. This table property must be set during the table creation, and never modified or removed during the table lifetime. +2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. 
For more details on table encryption, see the "Appendix: Internals Overview" [subsection](#appendix-internals-overview). @@ -71,7 +71,8 @@ SELECT * FROM local.db.table; ## Catalog security requirements -To function properly, Iceberg table encryption requires the catalog implementations not to retrieve the metadata +1. Catalogs must ensure the `encryption.key-id` property is not modified or removed during table lifetime. +2. To function properly, Iceberg table encryption requires the catalog implementations not to retrieve the metadata directly from metadata.json files, if these files are kept unprotected in a storage vulnerable to tampering. * Catalogs may keep the metadata in a trusted independent object store. From 391ee095f10914e1f61357c85bb12466c070a29d Mon Sep 17 00:00:00 2001 From: Gidon Gershinsky Date: Mon, 1 Dec 2025 14:02:42 +0200 Subject: [PATCH 9/9] add line break --- docs/docs/encryption.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/docs/encryption.md b/docs/docs/encryption.md index 834b5071b81b..0b694c44005b 100644 --- a/docs/docs/encryption.md +++ b/docs/docs/encryption.md @@ -72,6 +72,7 @@ SELECT * FROM local.db.table; ## Catalog security requirements 1. Catalogs must ensure the `encryption.key-id` property is not modified or removed during table lifetime. + 2. To function properly, Iceberg table encryption requires the catalog implementations not to retrieve the metadata directly from metadata.json files, if these files are kept unprotected in a storage vulnerable to tampering.
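
Note on the "Key Management Clients" section added by this series: the three documented methods (`initialize`, `wrapKey`, `unwrapKey`) can be illustrated with a small in-memory sketch. Everything specific here is an assumption made for illustration and is not taken from the patch: the class name `InMemoryKmsSketch`, the `sketch.master-key.<id>` property convention, and the `nonce || ciphertext` layout of the wrapped key. The class mirrors the method signatures but deliberately does not implement the real `org.apache.iceberg.encryption.KeyManagementClient` interface, so it compiles against the JDK alone. It holds master keys in local memory, which a real KMS client should not do.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical in-memory stand-in for a KMS client; for illustration only. */
final class InMemoryKmsSketch {
  private static final String KEY_PREFIX = "sketch.master-key."; // assumed property convention
  private static final int NONCE_LENGTH = 12; // bytes, the usual AES-GCM nonce size
  private static final int TAG_BITS = 128;

  private final Map<String, byte[]> masterKeys = new HashMap<>();
  private final SecureRandom random = new SecureRandom();

  /** Loads Base64-encoded master keys from properties named "sketch.master-key.<id>". */
  void initialize(Map<String, String> properties) {
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      if (entry.getKey().startsWith(KEY_PREFIX)) {
        String keyId = entry.getKey().substring(KEY_PREFIX.length());
        masterKeys.put(keyId, Base64.getDecoder().decode(entry.getValue()));
      }
    }
  }

  /** Encrypts the secret key with the master key; returns nonce followed by ciphertext. */
  ByteBuffer wrapKey(ByteBuffer key, String wrappingKeyId) {
    byte[] nonce = new byte[NONCE_LENGTH];
    random.nextBytes(nonce);
    byte[] plain = new byte[key.remaining()];
    key.duplicate().get(plain); // duplicate() leaves the caller's buffer position untouched
    byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, wrappingKeyId, nonce, plain);
    ByteBuffer wrapped = ByteBuffer.allocate(nonce.length + encrypted.length);
    wrapped.put(nonce).put(encrypted);
    wrapped.flip();
    return wrapped;
  }

  /** Splits nonce and ciphertext, then decrypts and authenticates with the master key. */
  ByteBuffer unwrapKey(ByteBuffer wrappedKey, String wrappingKeyId) {
    ByteBuffer in = wrappedKey.duplicate();
    byte[] nonce = new byte[NONCE_LENGTH];
    in.get(nonce);
    byte[] encrypted = new byte[in.remaining()];
    in.get(encrypted);
    return ByteBuffer.wrap(crypt(Cipher.DECRYPT_MODE, wrappingKeyId, nonce, encrypted));
  }

  private byte[] crypt(int mode, String keyId, byte[] nonce, byte[] input) {
    byte[] masterKey = masterKeys.get(keyId);
    if (masterKey == null) {
      throw new IllegalArgumentException("Unknown master key ID: " + keyId);
    }
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(mode, new SecretKeySpec(masterKey, "AES"), new GCMParameterSpec(TAG_BITS, nonce));
      return cipher.doFinal(input);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Key wrapping or unwrapping failed", e);
    }
  }
}
```

In this sketch, a caller would pass a map such as `sketch.master-key.k1 -> <Base64 AES key>` to `initialize`, wrap a freshly generated data key with `wrapKey(key, "k1")`, and later recover it with `unwrapKey(wrapped, "k1")`. A production client would instead forward both calls to the key management service, so that the master key never leaves it.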