Panic reading avro file at datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37 #1785
Comments
Avro module author here. The code expects a struct and instead got a null value (None): https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/avro_to_arrow/arrow_array_reader.rs#L771 I guess the code should be setting a null value instead of panicking. I haven't tested it on the arrow2 branch, which has much better performance for avro: https://github.com/apache/arrow-datafusion/tree/arrow2
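To illustrate the kind of fix suggested above (emit a null instead of panicking when a nested record is absent), here is a minimal sketch. It is not the DataFusion reader code: the `Value` enum, the helpers `lookup` and `nested_long_column`, and the field names in the usage note are all hypothetical stand-ins chosen for illustration.

```rust
// Hypothetical stand-in for a decoded Avro value; NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Long(i64),
    Record(Vec<(String, Value)>),
}

// Look up a field by name; returns None if `record` is not a record
// or the field is absent.
fn lookup<'a>(record: &'a Value, name: &str) -> Option<&'a Value> {
    match record {
        Value::Record(fields) => fields
            .iter()
            .find(|(k, _)| k.as_str() == name)
            .map(|(_, v)| v),
        _ => None,
    }
}

// Build a nullable column from `parent.child` across all rows.
// A null or missing parent record yields None for that row
// instead of panicking with "expected struct got None".
fn nested_long_column(rows: &[Value], parent: &str, child: &str) -> Vec<Option<i64>> {
    rows.iter()
        .map(|row| match lookup(row, parent).and_then(|p| lookup(p, child)) {
            Some(Value::Long(n)) => Some(*n),
            _ => None,
        })
        .collect()
}
```

With rows whose (hypothetical) `stats` field is a record, a null, or missing entirely, `nested_long_column(&rows, "stats", "count")` yields one entry per row, with `None` wherever the nested record is unavailable.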
FWIW, I investigated this yesterday (thanks a lot for sharing a stub of the file, @joshuarobinson!). jorgecarleitao/arrow2#826 enables reading this file (arrow2 did not support nested Record; now it does).
Well, I guess I should patch the module on master until arrow2 becomes the default for avro.
Thanks for the quick responses. I've managed to work around the issue with the avro-rs module for the time being, and I'll look out for new releases :)
@joshuarobinson if you want, you can use the arrow2 branch; it's the one I use to read avro.
Describe the bug
When using "read_avro()" or "register_avro()" with datafusion 6.0 (with feature=avro) on a particular avro file, I consistently get a panic. Other avro files with different schemas work fine.
thread 'main' panicked at 'expected struct got None', /home/ir/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37
The avro file is correctly decoded when using avro-tools-1.11.0.jar.
To Reproduce
I have simplified the logic to the following test program: https://gist.github.com/joshuarobinson/413536d5affd751eb9d8958a970e8b04
and I'm attaching a link to the 6KB avro file that causes me the problem.
Expected behavior
Print out the contents of the avro file with basic datafusion "df.collect.show" type logic.
Additional context
The problematic avro file is part of Apache Iceberg metadata, fwiw.
Stacktrace:
The contents of the avro look like: