Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore providing a ColumnarBatch like trait #21

Closed
rtyler opened this issue Jul 26, 2023 · 4 comments
Closed

Explore providing a ColumnarBatch like trait #21

rtyler opened this issue Jul 26, 2023 · 4 comments
Assignees

Comments

@rtyler
Copy link
Member

rtyler commented Jul 26, 2023

This issue is to capture the work of determining how feasible it would be to remove our arrow::RecordBatch dependency and replace it with something akin to the ColumnarBatch interface implemented in the JVM kernel implementation

The goal for this ticket is to provide an interface which would allow default-client to rely on an arrow::RecordBatch implementation of ColumnarBatch that makes sense in some form.

@rtyler rtyler self-assigned this Jul 26, 2023
@rtyler
Copy link
Member Author

rtyler commented Jul 26, 2023

Adding some of my thoughts here for consideration while reading the Java code:

  • That code implements a number of data types like StructType which are all largely wrappers around native JVM-based types. This means that there's not going to be any intrinsic performance improvements IMHO for these types when compared to something like Apache Arrow.
  • In our case these sort of type definitions would be largely redundant with data types that exist in arrow-rs.
  • The syntax of ColumnVector or Row are somewhat similar to serde_json::Value which has a number of as_<datatype> style methods that allow one to take the value out of Value and get the "primitive" data type back out. From Value we have precedent in the Rust ecosystem for this style of type masking.
  • I believe that in order to avoid needing to type cast into Rust types from arrow-rs, each of these "data types" would need to have an arrow implementation which would implement our StructType trait (as an example) and then wrap the arrow StructType. The Deref trait can be useful here for making the ergonomics of the wrapper simpler. It would be some more tedium in our maintenance here.

When I think about what this would look like in the context of an FFI bridge or something along those lines, it gets a little fuzzier for me to think of how these traits might be implemented.

I'm exploring some minimal amount of code to explore this thinking without needing to commit to too much time tinkering here.

@wjones127
Copy link
Collaborator

On the FFI bridge, I think you resort to exposing a C struct, that contains a void pointer to their data plus a bunch of function pointers implementing the trait methods. Then you can implement the trait for the C struct by calling into the function pointers.

Here's a vague sketch:

pub trait ColumnarBatch {
    fn method1(arg: ArgType) -> ReturnType;
}

#[repr(C)]
pub struct FFIColumnarBatch {
    private_data: *mut c_void,
    method1: ::std::option::Option<
        unsafe extern "C" fn(
            arg1: *mut FFIArgType,
        ) -> FFIReturnType,
    >,
}

impl ColumnarBatch for FFIColumnarBatch {
    fn method1(&self, arg: ArgType) -> ReturnType {
        let ffi_arg: FFIArgType = arg.into();
        let result: FFIReturnType = self.method1.unwrap()(&mut ffi_arg);
        result.into()
    }
}

This is something I've prototyped with Arrow ADBC: https://github.com/apache/arrow-adbc/blob/75ba2cef9080b19f4d242d0a03c17db304609d85/rust/src/ffi.rs#L462-L486

@wjones127
Copy link
Collaborator

Something to explore: There really only needs to be one ColumarBatch and associated types per connector. Could we define a header file and tell the user to provide a library that implements that so we can link to it? Then we could do static dispatch.

@nicklan
Copy link
Collaborator

nicklan commented Mar 26, 2024

closing as we've settled on #109

@nicklan nicklan closed this as completed Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants