Support for a custom table provider #941
Labels: enhancement (New feature or request)
Comments
We will have this supported in
A stable FFI for table providers! I had no idea. Very cool.
timsaucer added a commit to timsaucer/datafusion-python that referenced this issue (Jan 28, 2025): …ython. Also removed the rust unit tests copied over from upstream repo that were failing due to apache#941 in pyo3
timsaucer added a commit to timsaucer/datafusion-python that referenced this issue (Feb 1, 2025): …ython. Also removed the rust unit tests copied over from upstream repo that were failing due to apache#941 in pyo3
timsaucer added a commit that referenced this issue (Feb 1, 2025):
* Add developer instructions to speed up build processes
* Remove pyarrow dep from datafusion. Add in PyScalarValue wrapper and rename DataFusionError to PyDataFusionError to be less confusing
* Removed unnecessary cloning of scalar value when going from rust to python. Also removed the rust unit tests copied over from upstream repo that were failing due to #941 in pyo3
* Change return types to PyDataFusionError to simplify code
* Update exception handling to fix build errors in recent rust toolchains
This is similar to #920 but maybe more specific. Lance (https://github.com/lancedb/lance) has a custom table provider and I was interested in using datafusion-python with this table provider. However, I'm not sure there is an easy solution.
I was hoping, in Lance's python bindings, I could just do something like...
Then use this in python as:
Unfortunately, this leads to:
I suspect the problem is that the `SessionContext` linked into lance's python module is different from the `SessionContext` linked into datafusion_python's python module.
Here are a few thoughts off the top of my head. Maybe there is something easier I am missing, however.
A simple, but not ideal, solution is to just add lance as a dependency to datafusion-python. I'm assuming that the datafusion-python project doesn't want 3rd party dependencies however.
The "dataset protocol" never got quite finished, but we can kind of use pyarrow datasets as the dataset protocol. This is actually what I've ended up using for the time being. I use `register_dataset`, and `LanceDataset` already duck types as a pyarrow dataset, so this works, but it's not as flexible.
I'm not entirely sure this is possible, but it seems the datafusion-federation project may have a way of handling abstract table providers over Substrait. datafusion-python could add datafusion-federation as a dependency to allow a `register_federated` method.
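For anyone wanting to try the pyarrow dataset workaround described above, here is a minimal sketch. It assumes `datafusion` and `pyarrow` are installed and relies on `SessionContext.register_dataset` from datafusion-python. The helper name `register_and_query`, the table name `items`, and the sample data are made up for illustration, and an in-memory pyarrow dataset stands in for a `LanceDataset`.

```python
def register_and_query(ctx, name, dataset, sql):
    """Register a pyarrow-style dataset under `name`, then run `sql` on it.

    `ctx` is expected to be a datafusion.SessionContext. `dataset` is any
    object that behaves like a pyarrow.dataset.Dataset (hypothetical helper,
    not part of datafusion-python).
    """
    ctx.register_dataset(name, dataset)
    return ctx.sql(sql).to_arrow_table()


def main():
    # Third-party imports are local so the helper above can be defined
    # and reused without pulling them in.
    import pyarrow as pa
    import pyarrow.dataset as ds
    from datafusion import SessionContext

    # Stand-in data (assumption): with Lance, the LanceDataset object
    # would be passed here instead of this in-memory dataset.
    table = pa.table({"id": [1, 2, 3], "label": ["a", "b", "c"]})
    dataset = ds.dataset(table)

    ctx = SessionContext()
    result = register_and_query(
        ctx, "items", dataset, "SELECT id, label FROM items WHERE id > 1"
    )
    print(result.to_pydict())
```

Calling `main()` runs the demo end to end. The limitation noted above still applies: DataFusion only sees the object through the pyarrow dataset interface, so any capability of the native table provider that this interface does not expose is unavailable.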