
Support for Parquet Modular Encryption for PyArrow FileIO.#3

Open
yashad-margaj wants to merge 2 commits into protegrity:main from yashad-margaj:main

yashad-margaj commented Jul 18, 2025

Rationale for this change

iceberg-python currently has no support for Parquet Modular Encryption (PME), so Parquet files are written in cleartext. This change addresses that limitation by adding PME support, starting with the PyArrow FileIO.

Are these changes tested?

Yes. These changes were tested locally, and PME works as expected.

Are there any user-facing changes?

Yes.

When creating a PyIceberg catalog, the user must set the following property:
    client.kms-vendor
Valid values are "aws", "azure", or "gcp".
Example:
    catalog = load_catalog(name="default", **{"client.kms-vendor": "aws", "type": "sql", "uri": "uri", "warehouse": "warehouse"})
If client.kms-vendor is "aws", the user must also set the following properties:
    client.access-key-id
    client.region
    client.secret-access-key
If client.kms-vendor is "azure", the user must also set the following properties:
    client.client-id
    client.client-secret
    client.tenant-id
If client.kms-vendor is "gcp", the user must also set the following property:
    client.oauth2-token
Complete AWS example:
    catalog = load_catalog(name="default", **{"client.access-key-id": "client.access-key-id", "client.kms-vendor": "aws", "client.region": "client.region", "client.secret-access-key": "client.secret-access-key", "type": "sql", "uri": "uri", "warehouse": "warehouse"})
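The per-vendor requirements listed above could be checked before calling load_catalog. The following is only a minimal sketch: `REQUIRED_BY_VENDOR` and `missing_kms_properties` are hypothetical names introduced here for illustration and are not part of this PR or of PyIceberg. Only the property names are taken from the description above.

```python
# Hypothetical helper (not part of this PR): reports which KMS-related
# catalog properties are still missing for the chosen client.kms-vendor.
REQUIRED_BY_VENDOR = {
    "aws": ("client.access-key-id", "client.region", "client.secret-access-key"),
    "azure": ("client.client-id", "client.client-secret", "client.tenant-id"),
    "gcp": ("client.oauth2-token",),
}


def missing_kms_properties(properties):
    """Return the required property names absent from `properties`."""
    vendor = properties.get("client.kms-vendor")
    if vendor not in REQUIRED_BY_VENDOR:
        raise ValueError(
            'client.kms-vendor must be "aws", "azure", or "gcp", got %r' % (vendor,)
        )
    # Preserve the order in which the properties are documented.
    return [key for key in REQUIRED_BY_VENDOR[vendor] if key not in properties]


if __name__ == "__main__":
    props = {"client.kms-vendor": "aws", "client.region": "client.region"}
    print(missing_kms_properties(props))
```

An empty list means every property required for the selected vendor is present, so the dictionary can be passed on to load_catalog.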

Similarly, when creating a PyIceberg table, the user must set the following properties:
    table.column-key
    table.footer-key
    table.keep-footer-in-plaintext
Valid values for table.keep-footer-in-plaintext are "yes" or "no".
Example:
    table = catalog.create_table(identifier="default.table", properties={"table.column-key": {"table.column-key": ["column_name"]}, "table.footer-key": "table.footer-key", "table.keep-footer-in-plaintext": "no"})
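The shape of these table properties resembles the arguments of PyArrow's `pyarrow.parquet.encryption.EncryptionConfiguration` (`footer_key`, `column_keys`, `plaintext_footer`). The sketch below shows one way the translation might look. It is an assumption, not the mapping this PR implements, and `encryption_kwargs` is a hypothetical helper name.

```python
# Hypothetical helper (not part of this PR): turns the table properties
# described above into keyword arguments shaped like those accepted by
# pyarrow.parquet.encryption.EncryptionConfiguration. The mapping itself
# is an assumption made for illustration.
def encryption_kwargs(table_properties):
    flag = table_properties["table.keep-footer-in-plaintext"]
    if flag not in ("yes", "no"):
        raise ValueError(
            'table.keep-footer-in-plaintext must be "yes" or "no", got %r' % (flag,)
        )
    return {
        # Identifier of the master key that protects the file footer.
        "footer_key": table_properties["table.footer-key"],
        # Maps each column master key identifier to the columns it encrypts.
        "column_keys": dict(table_properties["table.column-key"]),
        # True keeps the footer readable by clients without keys.
        "plaintext_footer": flag == "yes",
    }


if __name__ == "__main__":
    props = {
        "table.column-key": {"table.column-key": ["column_name"]},
        "table.footer-key": "table.footer-key",
        "table.keep-footer-in-plaintext": "no",
    }
    print(encryption_kwargs(props))
```

Under this assumption, the resulting dictionary could be unpacked into `EncryptionConfiguration(**kwargs)` when PyArrow is installed. The helper itself avoids importing PyArrow so that it stays runnable on its own.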
