Conversation
|
cc @huaxingao |
docs/docs/encryption.md
Outdated
| To function properly, Iceberg table encryption places the following requirements on the catalogs: | ||
|
|
||
| 1. For protection of table data confidentiality, the table encryption properties (`encryption.key-id` and an optional `encryption.data-key-length`) must be kept in a tamper-proof storage or in a trusted independent database. Catalogs must not retrieve these properties directly from the metadata.json, if this file is kept unprotected in a storage vulnerable to tampering. | ||
| 2. For protection of table integrity, the metadata json must be kept in a tamper-proof storage or in a trusted independent object store. Catalogs must not retrieve the metadata.json file directly, if it is kept unprotected in a storage vulnerable to tampering. |
There was a problem hiding this comment.
Thank you for this section! It is very helpful in understanding motivations
Realise this may still be in-progress, but just flagging that I'm a bit confused from an IRC implementation perspective about what's required here.
I gather from (2) and the devlist thread that for untrusted storage (I'd imagine most IRCs currently write metadata JSONs in the same object storage as data files), an IRC should tamper-check a metadata JSON before returning its TableMetadata spec object - however, it then feels that (1) is not necessary anymore.
There was a problem hiding this comment.
Yep, this can be simplified.
docs/docs/encryption.md
Outdated
| 1. Catalog property `encryption.kms-impl`, that specifies the class path for a client of a KMS ("key management service"). | ||
| 2. Table property `encryption.key-id`, that specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS. | ||
|
|
||
| The `encryption.key-id` must be set during the table creation, and never modified or removed during the table lifetime. |
There was a problem hiding this comment.
Should rest_spec.md also be updated to ensure this requirement in order to prevent accidents?
There was a problem hiding this comment.
This requirement is applicable to all catalogs. Per our recent community discussion, the rest encryption updates should be rest-specific.
There was a problem hiding this comment.
Is there a document/spec which tracks how catalogs should behave .
I would prefer to avoid accidents like these and any custom catalogue to know these behaviours exist .
There was a problem hiding this comment.
we can add a pointer to the catalog security requirements section to the custom-catalogs.md.
There was a problem hiding this comment.
Thanks for the clarification and updating the custom catalog doc.
|
Thanks @ggershinsky for the PR! Thanks @palladium-coder @smaheshwar-pltr for the review! |
|
@ggershinsky This doesn't render correctly. Can you check and fix? It would be best to share a snapshot of the page in the PR from local doc build.
|
|
I fixed the rendering problem in #14756 |
* initial commit * clean up * brief how it works section * clean up * add refs * discussion updates * address review comments * add ref to custom catalogs doc * add line break
|
|
||
| Iceberg table encryption protects confidentiality and integrity of table data in an untrusted storage. The `data`, `delete`, `manifest` and `manifest list` files are encrypted and tamper-proofed before being sent to the storage backend. | ||
|
|
||
| The `metadata.json` file does not contain data or stats, and is therefore not encrypted. |
There was a problem hiding this comment.
I believe this is not true, please check this PR on stats contained in the metadata.json #14502 (comment), thoughts on exploit
cc @ggershinsky
There was a problem hiding this comment.
Thanks again @singhpk234, can you please clarify a few points
- Is the following true? - partition summary is not a part of the snapshot summary in the Iceberg spec https://iceberg.apache.org/spec/#optional-snapshot-summary-fields ; but in the implementation, it is added sometimes to the snapshots and can contain data stats.
- If yes, when the partition summaries are enabled, and when do they have stats? Is any of this under the writing user control?
- Can you give an example of such a usecase (that writes stats to the metadata.json)?
There was a problem hiding this comment.
Thank you for taking look @ggershinsky !
- Partition summary or even any summary props are supposed to be optional and not defined in spec, when you use iceberg java impl these are collected when
write.summary.partition-limitis enable (default off) please check this for details : https://iceberg.apache.org/docs/nightly/configuration/#write-properties - Yes they are enabled by the writer (user setting this table prop
write.summary.partition-limitengines such as spark etc collect these when enabled. - please check this https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L202
on a high level it can reveal which column values and stats on file counts etc
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L163
please let me know what do you think about it
There was a problem hiding this comment.
Are partition summary fields encapsulated in the SnapshotSummary.UpdateMetrics class?
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L223
Looks like a set of counters. Are there stats (col min/max) too?

No description provided.