Pyiceberg allows dropping the sort order column and causes table corruption on AWS Glue Catalog  #2166

@mwa28

Apache Iceberg version

0.9.0

Please describe the bug 🐞

Hello,

I am currently using v0.9.0rc3 and trying to evolve the schema of an existing, non-empty table that lives in the AWS Glue catalog.

It seems that

table.update_schema().add_column("col_name", StringType()).commit()

will successfully add the column to the Glue catalog table. However, this causes the table to become corrupt and no longer usable.

Trying to query it from Athena gives the following:

[ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.

I raised the issue with AWS support. I also tried adding a column through Athena, as described in
https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-alter-table-add-columns.html
on another table, and that works fine.

It seems that something in PyIceberg may be causing the table to become corrupt.
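To help narrow this down, here is a minimal diagnostic sketch. Since the issue title points at the sort order column, it checks whether the default sort order in a table metadata JSON document references a column id that no longer exists in the current schema. This is an assumption about where the problem might be, not a confirmed root cause. The key names (`current-schema-id`, `schemas`, `sort-orders`, `default-sort-order-id`, `source-id`) follow the Iceberg table metadata format; the helper names and the `example` document are hypothetical and made up for illustration.

```python
def collect_field_ids(fields):
    """Recursively collect every field id from a list of schema fields."""
    ids = set()
    for field in fields:
        ids.add(field["id"])
        ids |= nested_ids(field["type"])
    return ids


def nested_ids(type_):
    """Collect ids declared inside a nested type (struct, list, or map)."""
    if not isinstance(type_, dict):
        return set()  # primitive types are plain strings such as "long"
    kind = type_.get("type")
    if kind == "struct":
        return collect_field_ids(type_["fields"])
    if kind == "list":
        return {type_["element-id"]} | nested_ids(type_["element"])
    if kind == "map":
        return ({type_["key-id"], type_["value-id"]}
                | nested_ids(type_["key"])
                | nested_ids(type_["value"]))
    return set()


def dangling_sort_fields(metadata):
    """Return source ids used by the default sort order but absent from the current schema."""
    current_id = metadata["current-schema-id"]
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == current_id)
    known = collect_field_ids(schema["fields"])
    default_order = metadata.get("default-sort-order-id", 0)
    dangling = []
    for order in metadata.get("sort-orders", []):
        if order["order-id"] != default_order:
            continue
        for sort_field in order["fields"]:
            if sort_field["source-id"] not in known:
                dangling.append(sort_field["source-id"])
    return dangling


# Hypothetical metadata: schema 1 no longer has column id 2, but sort order 1 still uses it.
example = {
    "current-schema-id": 1,
    "schemas": [
        {"schema-id": 0, "fields": [
            {"id": 1, "name": "id", "required": True, "type": "long"},
            {"id": 2, "name": "ts", "required": False, "type": "timestamp"},
        ]},
        {"schema-id": 1, "fields": [
            {"id": 1, "name": "id", "required": True, "type": "long"},
        ]},
    ],
    "default-sort-order-id": 1,
    "sort-orders": [
        {"order-id": 1, "fields": [
            {"transform": "identity", "source-id": 2,
             "direction": "asc", "null-order": "nulls-first"},
        ]},
    ],
}

print(dangling_sort_fields(example))  # [2]
```

To run it against a real table, the metadata JSON file that the catalog entry points to could be loaded with `json.load` and passed to `dangling_sort_fields`. An empty list would suggest the sort order is not the culprit.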

For reference

table.update_schema().delete_column("col_name").commit()

causes no harm: the table is updated automatically in the catalog, and querying it works fine as well.

I will try to provide a follow-up once AWS support shares their feedback on the matter.

Please feel free to close this issue if it can be confirmed that the problem is solely on AWS's side.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
