-
Notifications
You must be signed in to change notification settings - Fork 221
arrow<->arrow2 interopability conversion method? #629
Comments
Nop, we do not have such a methods. That is because doing so requires arrow depending on arrow2 or vice-versa. The arrow format was designed exactly to support this case, though, and the FFI is really the conversion. What happens is that a |
There could be a crate one level higher that implements the conversion.
It could maybe support conversion of the core types, |
off topic, but in case you missed it, there is also a fairly uptodate arrow2 branch for datafusion that's being worked on. |
Hi @houqp , may I ask what branch is that? and is that going to be under apache or also as part of a separate repo? |
I think it means apache/datafusion#68 |
This could easily be feature-gated though, so there is no hard dependency. The FFI works but is very awkward to do manually since it involves working with raw pointers, versus the lovely abstraction of just calling |
True, although I think there is some benefits to having it be a separate crate, mainly the releases wouldn't be tied to any either project's release cycle (i.e a new release of the conv crate could be made whenever arrow or arrow2 is released) It also could allow the crate to work on multiple versions of either version via feature flags (similar to how multiple winit versions are handled here) The crate could also support the pyarrow interop via Biggest drawback of it being a separate crate would be you can't as neatly implement some conversion traits (since they would be forgien types) |
I think a conversion crate would be valuable indeed, though I don't have time to work on such a thing now. We could potentially host it in the https://github.com/datafusion-contrib organization and there might be others in the community who are interested in helping -- see the |
designing a bit, I think that we would need 6 functions: pub fn arrow_to_arrow2_error(error: arrow::error::ArrowError) -> arrow2::error::ArrowError;
pub fn arrow_to_arrow2_field(field: arrow::datatypes::Field) -> Result<arrow2::datatypes::Field, arrow2::error::ArrowError>;
pub fn arrow_to_arrow2_array(array: Arc<dyn arrow::array::Array>) -> Result<Arc<dyn arrow2::array::Array>, arrow2::error::ArrowError>;
pub fn arrow2_to_arrow_error(error: arrow2::error::ArrowError) -> arrow::error::ArrowError;
pub fn arrow2_to_arrow_field(field: arrow2::datatypes::Field) -> Result<arrow::datatypes::Field, arrow::error::ArrowError>;
pub fn arrow2_to_arrow_array(array: Arc<dyn arrow2::array::Array>) -> Result<Arc<dyn arrow::array::Array>, arrow::error::ArrowError>; |
You miss out on a valid naming option: |
🤯 lol |
I'm using
arrow2
in a project along with polars. I now also want to send the same data to datafusion, which usesarrow
(ideally without having to send the data through via the IPC serialization format)Before I fumble around with the FFI API, I figured I would check first: is there a method somewhere which handles the conversion of a
RecordBatch
from arrow to arrow2 and vice versa? Seems like something that might exist already in some kind of integration test or similar, but "arrow<->arrow2" is quite a tricky thing to search forThe text was updated successfully, but these errors were encountered: