Improve compute function FFI #113

Draft: wants to merge 6 commits into base: main

paleolimbot (Contributor) commented Jan 7, 2025

This should be able to export:

  • DataFusion scalar function
  • DataFusion table function
  • Arrow C++ Scalar function
  • Arrow C++ Aggregate function
  • Any file reader that can export an ArrowArray
  • A data source based on a SQL query + parameters from an AdbcConnection (with a lot of mutexes)

This should be able to be imported and used as a:

  • DataFusion scalar function
  • DataFusion table function
  • Arrow C++ Scalar function
  • DuckDB scalar function
  • DuckDB table function

...obviously that needs some testing, but the concepts are intended to line up:

  • "bind" / "return_type" / "KernelState::Resolver" == bind (computes output type from input type)
  • Various combinations of clone/push/pull can accommodate both per-batch computation (i.e., DataFusion's invoke_batch, DuckDB's "function") and aggregation of multiple batches into a single value (Arrow's ScalarAggregate function, DataFusion's accumulator, one stage of DuckDB's aggregator)
  • Arrow C++-based and nanoarrow-based implementations should be able to respect the custom allocator; DuckDB and Arrow C++ should be able to provide the custom allocator (DataFusion doesn't have a non-default allocator that I could find documentation on)
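
To make the bind/clone/push/pull mapping concrete, here is a minimal Python sketch of how one interface could cover both the per-batch and the aggregate case. It is an illustration only, not the C API in this PR: the class names (DoubleKernel, SumKernel), the helper functions, the type strings, and the use of plain lists in place of Arrow arrays are all assumptions.

```python
from typing import List, Optional


class DoubleKernel:
    """Hypothetical per-batch kernel: each push is followed by one pull."""

    def __init__(self) -> None:
        self.output_type: Optional[str] = None
        self._pending: List[float] = []

    def bind(self, input_types: List[str]) -> str:
        # "bind" computes the output type from the input types.
        if input_types != ["float64"]:
            raise TypeError(f"unsupported input types: {input_types}")
        self.output_type = "float64"
        return self.output_type

    def clone(self) -> "DoubleKernel":
        # A clone keeps the bound type but starts with fresh state.
        other = DoubleKernel()
        other.output_type = self.output_type
        return other

    def push(self, batch: List[float]) -> None:
        self._pending = [2.0 * x for x in batch]

    def pull(self) -> List[float]:
        out, self._pending = self._pending, []
        return out


class SumKernel:
    """Hypothetical aggregate kernel: many pushes, then a single pull."""

    def __init__(self) -> None:
        self.output_type: Optional[str] = None
        self._total = 0.0

    def bind(self, input_types: List[str]) -> str:
        if input_types != ["float64"]:
            raise TypeError(f"unsupported input types: {input_types}")
        self.output_type = "float64"
        return self.output_type

    def clone(self) -> "SumKernel":
        other = SumKernel()
        other.output_type = self.output_type
        return other

    def push(self, batch: List[float]) -> None:
        self._total += sum(batch)

    def pull(self) -> List[float]:
        return [self._total]


def run_per_batch(kernel, batches: List[List[float]]) -> List[List[float]]:
    # Per-batch pattern: push one batch, pull its result immediately.
    results = []
    for batch in batches:
        kernel.push(batch)
        results.append(kernel.pull())
    return results


def run_aggregate(kernel, batches: List[List[float]]) -> List[float]:
    # Aggregate pattern: push every batch, pull one value at the end.
    for batch in batches:
        kernel.push(batch)
    return kernel.pull()
```

In this sketch the caller, not the kernel, decides when to pull, which is what lets the same four operations serve a scalar function and an accumulator. A caller that needs independent state (for example one per thread or partition) would bind once and then clone.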

This doesn't handle pre-allocated output (which DuckDB and maybe Arrow C++ could make use of). Most geo functions wouldn't benefit from it anyway: very few are fixed-width -> fixed-width transformations cheap enough that reusing the input allocation would matter.

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.00%. Comparing base (db02936) to head (e8113af).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
- Coverage   94.53%   93.00%   -1.53%     
==========================================
  Files          25       29       +4     
  Lines        4132     4418     +286     
==========================================
+ Hits         3906     4109     +203     
- Misses        226      309      +83     
