-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust is not interoperability with C++ for IPC schemas with dictionaries #1694
Comments
Hi @pierrebelzile , thanks for reporting the issue. I recently fixed some issues (e.g. #1636) on integrating with C++ for dictionaries. These fixes will be released soon in 14.0.0 (#1693). Can you re-check with 14.0.0 once it is released? Thank you. |
Great. I will try as soon as it comes out. Any chance that you can test with your code if you are able to read the cpp_out.bin file in this repo? |
Yea, I just downloaded cpp_out.bin file and ran a test with it. I can read the file by using IPC let reader = StreamReader::try_new(file, None).unwrap();
reader.for_each(|batch| {
let batch = batch.unwrap();
println!("batch: {:?}", batch);
});
|
I think we can close this issue as it proves the file can be read now. |
Closing given that almost certainly fixed. Will re-open if necessary. |
Thank you @pierrebelzile. Yea, you can open this again if you find any issue on integrating this crate with C++. |
I'm reopening because the problem is still there in the direction rust -> c++: An arrow IPC data with dictionaries written in Rust is not readable from Arrow C++ (version 8). I was able to confirm that the other direction (c++ -> rust) is working as expected with dictionaries. |
@pierrebelzile Can you let me know how I can test the case (Rust -> C++)? Do you have reproducible example? |
I updated the github repo to include instructions to build but I can't connect to it from my work to correctly finish setting up. In the meanwhile, I've single stepped the code in C++ and the issue is that the dictionary id seen in C++ is always 0 in cpp/src/arrow/ipc/reader.cc:786 (arrow 7.0). This causes the dictionaries to "overwrite" each other and that's why we only see one of them. I have no idea why it works in Rust... |
Oh, that is because in your fields.push(Field::new("as", array.data_type().clone(), true));
...
fields.push(Field::new("bs", array.data_type().clone(), true));
So you duplicate dictionary ids. I changed the code to the follows, and ran the C++ program without any issue: fields.push(Field::new_dict("as", array.data_type().clone(), true, 0, false));
...
fields.push(Field::new_dict("bs", array.data_type().clone(), true, 1, false)); |
Thanks for finding that. I'm not used to this brecause in C++ the id is automatically assigned. What will happen when (if not yet) the data type is a map<string dict key, string value>? How does one set the dict id then? |
The |
Closing because issue was mine. |
In my application, I created in Rust RecordBatch's (arrow 12) that contain multiple dictionary fields. I serialized using streaming IPC and attempted to read with C++ (4.0 and 8.0). The server failed to recover the batches. It seems to get confused with associating the proper dictionary to the correct field. Seems to only see the last dictionary?
I should also point out that I attempted the opposite: Reading in Rust what was created in C++ and that also failed.
I reduced this to a batch with two dictionary fields and uploaded that to: https://github.com/pierrebelzile/arrow_ipc_test. This includes files with the binaries.
I'm hoping there is an expert IPC-person here that quickly knows what is wrong...
The text was updated successfully, but these errors were encountered: