
Glue scan with filter throws list index out of range #1804

Closed
1 of 3 tasks
Cabeda opened this issue Mar 18, 2025 · 1 comment · Fixed by #1901

Comments


Cabeda commented Mar 18, 2025

Apache Iceberg version

0.9.0 (latest release)

Please describe the bug 🐞

Hi,

Not sure if this is a bug, but in the worst case this might be something for others to look into in the future.

I've created a table as follows using pyiceberg:

from pyiceberg.schema import Schema
from pyiceberg.types import BooleanType, NestedField, StringType, TimestampType

schema = Schema(
    NestedField(field_id=1, name="bk_id", field_type=StringType(), required=False),
    NestedField(field_id=2, name="inference_date", field_type=TimestampType(), required=False),
    NestedField(field_id=3, name="verified", field_type=BooleanType(), required=False),
    NestedField(field_id=4, name="id", field_type=StringType(), required=True),
)
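
For reference, a minimal sketch of how a table with this schema might have been created through the Glue catalog. The creation step isn't shown in the report, so the catalog settings, warehouse_path, and table identifier below are assumptions mirroring the repro code further down:

from pyiceberg.catalog import load_catalog

# Assumed creation step: register the table in Glue using the schema defined above.
catalog = load_catalog(
    "glue",
    **{"type": "glue", "warehouse": warehouse_path},
)
catalog.create_table("database_name.table_name", schema=schema)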

I've been able to do multiple appends to the table using pyiceberg with no issues.

Now, to run some tests and prepare to use the new upsert operation, I decided to append a row with id = 'dummy_id' and then run a scan filtering on it. When I query through AWS Athena I can see the row; however, when scanning with dummy = table.scan(row_filter=EqualTo("id", 'dummy_id')) I get list index out of range. This seems to be because pyiceberg isn't able to retrieve the row.

Here is the code I have set up to replicate the issue:

import pandas as pd
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

# Single row to append.
df = pa.Table.from_pydict(
    {
        "bk_id": ["BK123456"],
        "inference_date": [pd.Timestamp.now()],
        "verified": [False],
        "id": ["dummy_id"],
    }
)

# Glue catalog; nanosecond timestamps are downcast to microseconds on write.
catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "warehouse": warehouse_path,
        "downcast-ns-timestamp-to-us-on-write": True,
    },
)

table_identifier = "database_name.table_name"
table = catalog.load_table(table_identifier)

table.append(df)

# Scan filtered on the newly appended id; this raises "list index out of range".
dummy = table.scan(row_filter=EqualTo("id", 'dummy_id'))
dummy.to_arrow()
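
While the filtered scan is failing, one possible client-side workaround (just a sketch, not a recommended fix, and potentially expensive for large tables) is to read without a row filter and filter the resulting Arrow table in memory:

import pyarrow.compute as pc

# Read the table unfiltered, then filter on the id column in memory.
full = table.scan().to_arrow()
dummy_rows = full.filter(pc.equal(full["id"], "dummy_id"))
print(dummy_rows.num_rows)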

Is there something I'm doing wrong?

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
Cabeda commented Mar 25, 2025

Seems like the issue is due to apache/arrow#44366
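
Since the root cause appears to be in Arrow rather than pyiceberg, it may help to record which pyarrow version is installed when reproducing (a trivial check, not a fix; whether a given pyarrow release contains the upstream fix is something to verify against apache/arrow#44366):

import pyarrow as pa

# Note the pyarrow version in use; behaviour may differ between releases.
print(pa.__version__)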
