Add foreign table providers #921

Merged
merged 28 commits into from
Nov 15, 2024

@timsaucer timsaucer commented Oct 14, 2024

Which issue does this PR close?

Closes #823
Closes #941

Rationale for this change

This feature enables DataFusion Python to interoperate with foreign table providers that implement TableProvider and expose themselves via a PyCapsule.

What changes are included in this PR?

Adds a feature to register a table provider in the session context.

Are there any user-facing changes?

Addition only, no existing APIs modified.

    pub fn register_table_provider(
        &mut self,
        name: &str,
        provider: Bound<'_, PyAny>,

Contributor

Just to understand, taking delta-rs as an example: would the provider be an instance of a DeltaTable object?

Contributor Author

Here is what you would need to implement in delta-rs as a Python method on RawDeltaTable:

    fn __datafusion_table_provider__<'py>(&self, py: Python<'py>) -> PyResult<Bound<'py, PyCapsule>> {
        let name = CString::new("datafusion_table_provider").unwrap();
        let provider = FFI_TableProvider::new(Arc::new(self._table.clone()));
        PyCapsule::new_bound(py, provider, Some(name.clone()))
    }

Does that answer the question?

Contributor

Definitely!

        provider: Bound<'_, PyAny>,
        py: Python,
    ) -> PyResult<()> {
        if provider.hasattr("__datafusion_table_provider__")? {

Contributor

Should we raise when it doesn't have this attribute? Otherwise users might get the wrong impression that they registered the provider when nothing actually happened.

Contributor Author

Good point! Added.
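
The behavior agreed on in this thread (fail loudly when the object does not expose `__datafusion_table_provider__`) can be sketched in pure Python. This is only an illustration under assumptions: `ToySessionContext` and `ToyProvider` are hypothetical stand-ins, the choice of `TypeError` is an assumption rather than the exception the PR raises, and a plain `object()` stands in for the PyCapsule.

```python
from typing import Any

ATTR = "__datafusion_table_provider__"


class ToySessionContext:
    """Hypothetical stand-in for the session context, not the real class."""

    def __init__(self) -> None:
        self.tables: dict = {}

    def register_table_provider(self, name: str, provider: Any) -> None:
        # Look up the export hook; None means the object is not a provider.
        export = getattr(provider, ATTR, None)
        if export is None:
            # Raise instead of silently doing nothing, so callers never
            # believe a table was registered when it was not.
            raise TypeError(f"{type(provider).__name__} does not define {ATTR}")
        # In the real binding this call would return a PyCapsule.
        self.tables[name] = export()


class ToyProvider:
    """Hypothetical provider; returns a placeholder instead of a capsule."""

    def __datafusion_table_provider__(self) -> object:
        return object()
```

With this sketch, registering `ToyProvider()` stores an entry, while registering a plain `object()` raises and leaves the table map unchanged.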

@timsaucer
Contributor Author

Now that 43.0.0 has merged into main, I've rebased, and next I'll work on adding an example and unit tests.

@timsaucer timsaucer marked this pull request as ready for review November 11, 2024 23:35
Contributor

@Michael-J-Ward Michael-J-Ward left a comment

My only question is whether the TableProvider PyCapsule needs validation similar to what we do for dataframe capsules.

Otherwise, looks great!

        if provider.hasattr("__datafusion_table_provider__")? {
            let capsule = provider.getattr("__datafusion_table_provider__")?.call0()?;
            let capsule = capsule.downcast::<PyCapsule>()?;
            // validate_pycapsule(capsule, "arrow_array_stream")?;

Contributor

Should this still be commented out?
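
For readers unfamiliar with capsule validation, the idea is to compare the capsule's name against the expected one before trusting its pointer. The sketch below does this from Python through `ctypes` and the CPython capsule C API; it is not the project's `validate_pycapsule`, and `make_capsule` and `validate_capsule` are hypothetical helper names. Note that the delta-rs snippet earlier names its capsule `datafusion_table_provider`, so a check against `arrow_array_stream` would presumably reject it.

```python
import ctypes

# Capsule name used in the delta-rs snippet earlier in this conversation.
EXPECTED_NAME = b"datafusion_table_provider"

# Bind the two CPython C-API functions this sketch needs.
_capsule_get_name = ctypes.pythonapi.PyCapsule_GetName
_capsule_get_name.restype = ctypes.c_char_p
_capsule_get_name.argtypes = [ctypes.py_object]

_capsule_new = ctypes.pythonapi.PyCapsule_New
_capsule_new.restype = ctypes.py_object
_capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

# Dummy payload kept at module level so its address stays valid.
_PAYLOAD = ctypes.c_int(0)


def make_capsule(name: bytes) -> object:
    """Build a demo capsule. CPython stores the name pointer without
    copying, so `name` must outlive the capsule."""
    return _capsule_new(ctypes.addressof(_PAYLOAD), name, None)


def validate_capsule(capsule: object, expected: bytes = EXPECTED_NAME) -> None:
    """Raise unless `capsule` is a PyCapsule carrying the expected name."""
    if type(capsule).__name__ != "PyCapsule":
        raise TypeError("expected a PyCapsule")
    name = _capsule_get_name(capsule)
    if name != expected:
        raise ValueError(f"expected capsule name {expected!r}, got {name!r}")
```

A capsule created with `EXPECTED_NAME` passes the check, while one named `b"arrow_array_stream"` raises `ValueError`.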

@@ -685,6 +685,14 @@ def deregister_table(self, name: str) -> None:
        """Remove a table from the session."""
        self.ctx.deregister_table(name)

    def register_table_provider(self, name: str, provider: Any) -> None:

Contributor

You could perhaps use a Protocol type hint, so that type checkers warn when that attribute is missing.
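
A minimal sketch of what such a Protocol type hint could look like follows. The class name `TableProviderExportable` and the free function are hypothetical and chosen only for illustration. `runtime_checkable` is optional; it is included so that `isinstance` can double as a cheap presence check at runtime.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TableProviderExportable(Protocol):
    """Structural type: any object exposing the capsule export method."""

    def __datafusion_table_provider__(self) -> Any: ...


def register_table_provider(name: str, provider: TableProviderExportable) -> None:
    # A static type checker flags callers that pass an object without the
    # method; the isinstance check covers untyped callers at runtime.
    if not isinstance(provider, TableProviderExportable):
        raise TypeError("provider does not define __datafusion_table_provider__")


class ToyProvider:
    """Hypothetical provider used only to exercise the protocol."""

    def __datafusion_table_provider__(self) -> Any:
        return object()  # placeholder for a PyCapsule
```

Because the protocol is structural, `ToyProvider` satisfies it without inheriting from it, which suits foreign libraries that cannot import a DataFusion base class.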

@timsaucer
Contributor Author

Thank you for the reviews! I've added an issue to track those two small requests, but I'd like to get this in now.

@timsaucer timsaucer merged commit 5e32ada into apache:main Nov 15, 2024
23 checks passed
@timsaucer timsaucer deleted the feature/ffi branch November 15, 2024 16:23

Successfully merging this pull request may close these issues.

Support for a custom table provider
Expose API to register a foreign TableProvider
3 participants