Rust <> Python integration point #538

Open
kevinjqliu opened this issue Aug 11, 2024 · 6 comments
@kevinjqliu

After establishing #518, I want to start a conversation about creating the first integration between PyIceberg and iceberg-rust.
As discussed in the dev list, we want to create an integration based on pluggable FileIO.

I'm wondering if there's also a way to create an integration for a pluggable catalog, based on the in-memory catalog implementation in #475.

I'm not familiar with the Rust ecosystem, so I would appreciate any pointers.

@Xuanwo
Member

Xuanwo commented Aug 14, 2024

> I'm wondering if there's also a way to create an integration for a pluggable catalog, based on the in-memory catalog implementation in #475.

I believe this should also be possible. So, the pyiceberg community wants to have an in-memory catalog based on iceberg-rust. Does pyiceberg provide an interface that we can integrate with?

The in-memory catalog depends on FileIO, so we might need to build FileIO first. However, it also makes sense to expose a purely in-memory catalog (memory FileIO and memory catalog) to pyiceberg initially.
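To make the dependency concrete, here is a minimal sketch of a purely in-memory catalog layered on an in-memory FileIO. All names (`FileIO`, `MemoryFileIO`, `MemoryCatalog`, the `memory://` location scheme) are hypothetical and chosen for illustration; they are not the iceberg-rust or PyIceberg APIs.

```python
from typing import Dict, Protocol


class FileIO(Protocol):
    """Hypothetical minimal FileIO interface: read and write whole objects."""

    def write(self, location: str, data: bytes) -> None: ...

    def read(self, location: str) -> bytes: ...


class MemoryFileIO:
    """Keeps every 'file' in a dict, so nothing touches disk."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, location: str, data: bytes) -> None:
        self._files[location] = data

    def read(self, location: str) -> bytes:
        return self._files[location]


class MemoryCatalog:
    """Maps table names to metadata locations; metadata bytes live in FileIO."""

    def __init__(self, io: FileIO) -> None:
        self._io = io
        self._tables: Dict[str, str] = {}

    def create_table(self, name: str, metadata: bytes) -> str:
        if name in self._tables:
            raise ValueError(f"table already exists: {name}")
        # Assumed layout for this sketch only.
        location = f"memory://{name}/metadata/v1.metadata.json"
        self._io.write(location, metadata)
        self._tables[name] = location
        return location

    def load_table(self, name: str) -> bytes:
        return self._io.read(self._tables[name])
```

The catalog only tracks where the metadata lives, while the bytes themselves go through FileIO, which is why the FileIO piece has to exist before the catalog can be exposed.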

@liurenjie1024
Contributor

I think it's definitely possible, since PyIceberg's Catalog interface is extensible. I think you need to start with pyo3 first to understand how it works.

@kevinjqliu
Author

> Does pyiceberg provide an interface that we can integrate with?

Yes, there is a py-catalog-impl configuration that will try to load a given classpath. (documentation, implementation, test)
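A minimal sketch of how loading a class from a dotted path can work, using only the standard library. The helper names `load_class` and `load_custom_catalog` are hypothetical, and passing the catalog name as a keyword argument is an assumption of this sketch; PyIceberg's actual loader may differ in details.

```python
import importlib
from typing import Any, Dict


def load_class(class_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got: {class_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_custom_catalog(name: str, properties: Dict[str, str]) -> Any:
    """Instantiate the class named by the 'py-catalog-impl' property."""
    catalog_cls = load_class(properties["py-catalog-impl"])
    # Assumption: the catalog constructor accepts a name plus arbitrary properties.
    return catalog_cls(name=name, **properties)
```

With a loader of this shape, a Rust-backed catalog exposed through a Python extension module would only need to be importable and accept the same constructor arguments.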

> The in-memory catalog depends on FileIO, so we might need to build FileIO first. However, it also makes sense to expose a purely in-memory catalog (memory FileIO and memory catalog) to pyiceberg initially.

I'm bringing up this issue because I want the simplest way to integrate iceberg-python and iceberg-rust. If FileIO integration is a prerequisite, we can start there instead.

@Xuanwo
Member

Xuanwo commented Aug 16, 2024

Hi, @kevinjqliu, I'm sorry for blocking your innovation this way.

I've been a bit busy recently, but I plan to create something that really works next week. For instance, reading data from PyIceberg using pyiceberg-core. This will enable our community to build more cool things based on that.

@kevinjqliu
Author

@Xuanwo very cool! looking forward to it.

@kevinjqliu
Author

Looks like @sungwy already started by exposing Transforms in #556.

I'll take a stab at exposing the Catalogs; see #534 (comment).
