Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example: FFI Table Provider as dynamic module loading #13183

Merged
merged 10 commits into from
Nov 5, 2024

Conversation

timsaucer
Copy link
Contributor

Which issue does this PR close?

Closes #13175

Rationale for this change

PR #12920 introduces FFI bindings for TableProvider but did not show an end to end example of using these. This PR improves the documentation for how to use the datafusion-ffi crate and a complete example of building one library that exposes a table provider and loading it from a different binary.

What changes are included in this PR?

Adds three crates to show interface, implementation, and using a library table provider.

Are these changes tested?

These are an example, not any change to core functionality.

Are there any user-facing changes?

None.

@timsaucer
Copy link
Contributor Author

I'll rebase as soon as #12920 merges into main so the diff is small.

Copy link
Contributor

@ahirner ahirner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon for the too early review. I got curious and thought some comments would be more helpful than distracting. It's great!

datafusion/ffi/README.md Outdated Show resolved Hide resolved
datafusion/ffi/README.md Outdated Show resolved Hide resolved
# `datafusion-ffi`: Apache DataFusion Foreign Function Interface

This crate contains code to allow interoperability of Apache [DataFusion]
with functions from other languages using a stable interface.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we already define stable more precisely? Given frequent major releases of datafusion, a user may assume they can't load an FFI implementation compiled with 42 into a binary of version 43.

OTOH, are there maybe subtle limitations regarding tokio runtime or something alike? (didn't research this topic TBH)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can run different versions! I've tested this in datafusion-python by backporting these changes into DF41 and I ran datafusion-python with DF42 and delta-rs with DF41. I'll work on the text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small explanation on where and how FFI table scans are scheduled in terms of rt threads would be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what you're looking for here or who the audience would be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As someone loading plugins, I want to know if plugins would generally block tokio threads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some text, but it's very brief. Basically, async calls through this interface operate exactly like their pure rust counterparts provided the library on the other side implements their code properly.

@timsaucer timsaucer marked this pull request as ready for review October 31, 2024 12:38
@alamb
Copy link
Contributor

alamb commented Oct 31, 2024

❤️

@timsaucer
Copy link
Contributor Author

Thank you @ahirner for the review. I think this is ready for a full review now. If you have any outstanding questions or I didn't write it up to your satisfaction, please let me know!

@timsaucer timsaucer mentioned this pull request Nov 4, 2024
4 tasks
@timsaucer
Copy link
Contributor Author

@ahirner would you mind re-reviewing?

Copy link
Contributor

@ahirner ahirner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All points adressed. I also built and run the example on darwin x86.
Excellent quality IMO, which will help adoptiong this solution.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @timsaucer

I also tried out the instructions in the README and it seemed to work great

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion/datafusion-examples/examples/ffi/ffi_module_loader$ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.27s
     Running `/Users/andrewlamb/Software/datafusion/target/debug/ffi_module_loader`
+----+------+
| a  | b    |
+----+------+
| 1  | 1.0  |
| 2  | 2.0  |
| 3  | 3.0  |
| 4  | 4.0  |
| 5  | 5.0  |
| 6  | 6.0  |
| 7  | 7.0  |
| 8  | 8.0  |
| 9  | 9.0  |
| 10 | 10.0 |
| 11 | 11.0 |
+----+------+

functions from other libraries and/or [DataFusion] versions using a stable
interface.

One of the limitations of the Rust programming language is that there is no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


#[repr(C)]
#[derive(StableAbi)]
#[sabi(kind(Prefix(prefix_ref = TableProviderModuleRef)))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the need for this interface module.

Couldn't the caller program simple use construct_simple_table_provider directly? (It would have to know the name of the entry point of course which is less general than what you have here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't actually need it, but it does give a separation of concerns where you have one crate for the interface so that any module implementers don't need to executable as a dependency. I'm mostly following the same approach as the examples in the abi_stable crate but it's also a model I've used in other places.

…rgo.toml

Co-authored-by: Alexander Hirner <6055037+ahirner@users.noreply.github.com>
@timsaucer timsaucer merged commit a6586cc into apache:main Nov 5, 2024
26 checks passed
@alamb alamb mentioned this pull request Nov 5, 2024
3 tasks
@timsaucer timsaucer deleted the feat/ffi-example branch November 6, 2024 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expand FFI documentation
3 participants