Add foreign table providers #921
fcf9475 to f5f983c
pub fn register_table_provider(
    &mut self,
    name: &str,
    provider: Bound<'_, PyAny>,
Just to understand, taking delta-rs as an example.
The provider would be an instance of a deltatable object?
Here is what you would need to implement in delta-rs, as a Python method on RawDeltaTable:
fn __datafusion_table_provider__<'py>(&self, py: Python<'py>) -> PyResult<Bound<'py, PyCapsule>> {
    // The capsule name identifies the payload as a DataFusion table provider.
    let name = CString::new("datafusion_table_provider").unwrap();
    // Wrap the table in the FFI-safe provider struct.
    let provider = FFI_TableProvider::new(Arc::new(self._table.clone()));
    PyCapsule::new_bound(py, provider, Some(name.clone()))
}
Does that answer the question?
Definitely!
    provider: Bound<'_, PyAny>,
    py: Python,
) -> PyResult<()> {
    if provider.hasattr("__datafusion_table_provider__")? {
Should we raise when it doesn't have this attribute? Otherwise users might get the wrong impression that the provider was registered when nothing actually happened.
Good point! Added.
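For illustration, the behavior requested above can be sketched in Python. This is not the PR's Rust implementation: the helper name `export_table_provider` and the choice of `TypeError` are assumptions made for the example.

```python
def export_table_provider(provider):
    """Return the capsule exported by `provider`, or raise if it cannot export one.

    Hypothetical helper: the name and the exception type are assumptions,
    not taken from the PR.
    """
    if not hasattr(provider, "__datafusion_table_provider__"):
        # Fail loudly so the caller does not assume the table was registered.
        raise TypeError(
            f"{type(provider).__name__} does not implement "
            "__datafusion_table_provider__"
        )
    return provider.__datafusion_table_provider__()


class FakeProvider:
    """Stand-in object that returns a placeholder instead of a real PyCapsule."""

    def __datafusion_table_provider__(self):
        return "capsule-placeholder"


class NotAProvider:
    pass
```

Calling `export_table_provider(NotAProvider())` raises instead of silently doing nothing, which is the point of the review comment.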
f5f983c to 7240d44
7240d44 to 01a15f1
Now that 43.0.0 has merged into main, I've rebased, and I'll work next on adding an example and unit tests.
…un during the first pass of pytest when the module hasn't been built
The only question is whether the TableProvider PyCapsule needs validation similar to what we do for dataframe capsules.
Otherwise, looks great!
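As a rough sketch of what such validation could check, the capsule's name can be compared against the expected one. The example below does this from Python through the CPython C API via `ctypes`. It assumes CPython, assumes the expected name is `datafusion_table_provider` (as in the delta-rs snippet earlier in the thread), and uses hypothetical helpers `make_capsule` and `validate_capsule`. It is not the project's `validate_pycapsule`.

```python
import ctypes

# Names are module-level bytes objects so the pointers stored in capsules stay valid.
EXPECTED_NAME = b"datafusion_table_provider"
OTHER_NAME = b"arrow_array_stream"

_capsule_new = ctypes.pythonapi.PyCapsule_New
_capsule_new.restype = ctypes.py_object
_capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

_capsule_is_valid = ctypes.pythonapi.PyCapsule_IsValid
_capsule_is_valid.restype = ctypes.c_int
_capsule_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Dummy payload; a real provider would store a pointer to its FFI struct here.
_payload = ctypes.c_int(0)


def make_capsule(name: bytes):
    """Create a capsule for demonstration (hypothetical helper, no destructor)."""
    return _capsule_new(ctypes.addressof(_payload), name, None)


def validate_capsule(obj, expected_name: bytes = EXPECTED_NAME) -> None:
    """Raise ValueError unless `obj` is a capsule carrying `expected_name`."""
    # PyCapsule_IsValid returns 0 for non-capsules and for mismatched names.
    if not _capsule_is_valid(obj, expected_name):
        raise ValueError(
            f"expected a PyCapsule named {expected_name.decode()!r}"
        )
```

A capsule created with `OTHER_NAME` is rejected by `validate_capsule`, as is any object that is not a capsule at all.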
if provider.hasattr("__datafusion_table_provider__")? {
    let capsule = provider.getattr("__datafusion_table_provider__")?.call0()?;
    let capsule = capsule.downcast::<PyCapsule>()?;
    // validate_pycapsule(capsule, "arrow_array_stream")?;
Should this still be commented out?
@@ -685,6 +685,14 @@ def deregister_table(self, name: str) -> None:
        """Remove a table from the session."""
        self.ctx.deregister_table(name)

    def register_table_provider(self, name: str, provider: Any) -> None:
You could perhaps use a Protocol type hint, so that type checkers warn when that attribute is missing.
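A minimal sketch of that suggestion, using `typing.Protocol`. The protocol name `TableProviderExportable`, the `object` return annotation, and the standalone `register_table_provider` function are assumptions for illustration. They are not part of this PR's diff.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TableProviderExportable(Protocol):
    """Structural type for objects that can export a table provider capsule.

    Hypothetical name; the return type is left as `object` because a
    PyCapsule has no public Python type to annotate with.
    """

    def __datafusion_table_provider__(self) -> object: ...


def register_table_provider(name: str, provider: TableProviderExportable) -> str:
    """Stand-in for the session context method; only demonstrates the type hint."""
    # runtime_checkable allows isinstance() to verify the method is present.
    if not isinstance(provider, TableProviderExportable):
        raise TypeError("provider must define __datafusion_table_provider__")
    return name


class MyTable:
    def __datafusion_table_provider__(self) -> object:
        return object()
```

With this hint, a static checker such as mypy would flag `register_table_provider("t", 42)`, and the `isinstance` check catches the same mistake at runtime.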
Thank you for the reviews! I've added an issue to track those two small requests, but I'd like to get this in now.
Which issue does this PR close?
Closes #823
Closes #941
Rationale for this change
This feature enables DataFusion Python to interoperate with foreign table providers that implement TableProvider and expose it via a PyCapsule.
What changes are included in this PR?
Adds a feature to register a table provider in the session context.
Are there any user-facing changes?
Addition only, no existing APIs modified.