-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Add support for Iceberg #10375
Conversation
f2d5242
to
a1b2e78
Compare
a53eef2
to
92f10e2
Compare
92f10e2
to
5107b6e
Compare
Hi @Fokko, looks like you've done quite some work here! Could you please make sure the CI passes? Then we'll take a closer look. Let us know if you need any help. I'll put this on draft for now. |
@stinodego Thanks!
Yes, I'm on top of it. PyIceberg is still at Pydantic <2. I have a PR ready on the PyIceberg side to bump it to 2, then we need to do a release, and then it should pass the CI. |
2f06a24
to
6a82d0d
Compare
return pl.LazyFrame._scan_python_function(arrow_schema, func, pyarrow=True) | ||
|
||
|
||
def _scan_pyarrow_dataset_impl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a second _scan_pyarrow_dataset_impl
? Can we not reuse the existing implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko Could you come back to me on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. Here we do something different than in the original one. In the original, the Python eval
function is used to convert the string containing Python to an actual Python class, and that's being passed into the delta library:
if predicate:
# imports are used by inline python evaluated by `eval`
from polars.datatypes import Date, Datetime, Duration # noqa: F401
from polars.utils.convert import (
_to_python_datetime, # noqa: F401
_to_python_time, # noqa: F401
_to_python_timedelta, # noqa: F401
)
_filter = eval(predicate)
What we do here is that we take the string, we convert it into an abstract syntax tree, and that's being traversed to convert it into a PyIceberg expression. The reason why I did this is that the PyArrow expression doesn't have any Python methods to traverse the expression (the same goes for the Polars expression, otherwise I could just traverse that one as well). I've added this to the docstring as well 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff. I'm on holiday next week. Feel free to merge once pyiceberg 0.5.0 has been released and the PR has been updated accordingly.
@stinodego @alexander-beedie PyIceberg 0.5.0 has been released, and the PR is good to go :) |
Alright... here we go ;) |
Awesome, thanks all! 🚀 |
It's a great addition; looking forward to seeing it in action 👍 |
I did a small demo at PyData Amsterdam last week. Once it is released, I'll work on a blogpost! 👍🏻 |
@Fokko Do you have the demo at PyData Amsterdam recorded ? |
@luancaarvalho The demo is recorded, but not yet online. I think they still have to do some final editing. |
Resolves #6227
Using PyIceberg: