@kevinjqliu
Contributor

Which issue does this PR close?

What changes are included in this PR?

This PR adds a new IcebergDataFusionTable Python class and exposes it through the new pyiceberg_core.datafusion module.

from pyiceberg_core.datafusion import IcebergDataFusionTable

Exposing IcebergDataFusionTable makes it possible to register the Iceberg table provider with datafusion-python through the register_table_provider API.
See the usage example in bindings/python/tests/test_datafusion_table_provider.py

The integration relies on the FFI_TableProvider API as described in https://datafusion.apache.org/python/user-guide/io/table_provider.html
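
As a rough illustration of the handshake behind register_table_provider: the datafusion-python guide linked above describes providers that expose a `__datafusion_table_provider__` method returning a PyCapsule. The sketch below only checks for that method before registering. `register_provider` is a hypothetical helper that is not part of this PR, and `ctx` is assumed to be a `datafusion.SessionContext` (or anything with a compatible `register_table_provider` method).

```python
def register_provider(ctx, name, provider):
    """Register `provider` under `name` after a duck-typing check.

    Hypothetical helper for illustration only; not part of the PR.
    """
    # The FFI table provider guide describes a
    # `__datafusion_table_provider__` method on the provider object.
    hook = getattr(provider, "__datafusion_table_provider__", None)
    if not callable(hook):
        raise TypeError(
            f"{type(provider).__name__} does not expose "
            "__datafusion_table_provider__"
        )
    # Delegate to the registration API named in the PR description.
    ctx.register_table_provider(name, provider)
```

With the class from this PR, the call would presumably look like `register_provider(ctx, "test", iceberg_table_provider)`, where `iceberg_table_provider` is an IcebergDataFusionTable instance built as in the test file mentioned above.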

Note that this integration only works with datafusion >= 45 because of apache/datafusion#13851.
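
The `>= 45` requirement is safest to check numerically, since comparing version strings lexicographically can misorder them (the string "100.0.0" sorts before "45"). A minimal sketch, assuming the version string begins with an integer major component; `major_version`, `supports_ffi_table_provider`, and `MIN_DATAFUSION_MAJOR` are hypothetical names for illustration and are not part of this PR:

```python
import re

# Minimum major version named in the PR description (datafusion >= 45).
MIN_DATAFUSION_MAJOR = 45


def major_version(version: str) -> int:
    """Return the leading integer component of a version string."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse major version from {version!r}")
    return int(match.group(1))


def supports_ffi_table_provider(version: str) -> bool:
    """True if `version` meets the minimum major version."""
    return major_version(version) >= MIN_DATAFUSION_MAJOR
```

In a test this could be called as `supports_ffi_table_provider(datafusion.__version__)`.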

Are these changes tested?

Yes, unit tests.

To build and test locally:

cd bindings/python
hatch run dev:develop
hatch run dev:test

@kevinjqliu
Contributor Author

cc @timsaucer for helping me with the integration, finally got around to this

import datafusion

assert (
datafusion.__version__ >= "45"
Member

Do we need to make sure that rust and python are using the same version of datafusion?

cc @alamb for ideas.

Member

They do not strictly need to be the same version, but there have been a couple of small breaking changes to the FFI API. I need to track down exactly which versions were compatible. I will try to get to this soon.

Once the FFI API is stable you can use different versions of datafusion in Rust and Python. I have demonstrated this in other projects.

Member

That's really great! Thank you @timsaucer for the help.

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/datafusion-iceberg-table-provider branch from eb0fe40 to cf5bd54 Compare May 14, 2025 16:24
Member

@Xuanwo Xuanwo left a comment

Mostly LGTM, cc @liurenjie1024 for another look.

@kevinjqliu kevinjqliu added this to the 0.5.0 Release milestone May 14, 2025
Contributor

@sdd sdd left a comment

LGTM now, thanks @kevinjqliu 🙌🏼

@Xuanwo Xuanwo merged commit d9f2fe5 into apache:main May 15, 2025
18 checks passed
@kevinjqliu kevinjqliu deleted the kevinjqliu/datafusion-iceberg-table-provider branch May 15, 2025 04:40
@kevinjqliu
Contributor Author

Thanks everyone!

Development

Successfully merging this pull request may close these issues.

[pyiceberg_core] Expose IcebergTableProvider to python

4 participants