Closed
Description
The problematic Avro and Thrift schemas are:
record AvroArrayOfArray {
  array<array<int>> int_arrays_column;
}
and
struct ThriftListOfList {
  1: list<list<i32>> intArraysColumn;
}
They are converted to the following structurally equivalent Parquet schemas by parquet-avro 1.7.0 and parquet-thrift 1.7.0 respectively:
message AvroArrayOfArray {
  required group int_arrays_column (LIST) {
    repeated group array (LIST) {
      repeated int32 array;
    }
  }
}
and
message ParquetSchema {
  required group intListsColumn (LIST) {
    repeated group intListsColumn_tuple (LIST) {
      repeated int32 intListsColumn_tuple_tuple;
    }
  }
}
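AvroIndexedRecordConverter cannot decode such records correctly. The reason is that the second-level repeated group (named array in the Avro-derived schema and intListsColumn_tuple in the Thrift-derived one) doesn't pass the AvroIndexedRecordConverter.isElementType() check, so it is not recognized as the list element itself. To fix this, isElementType() should also check for the field name "array" and the field name suffix "_tuple".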
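The proposed name checks can be sketched as follows. This is a simplified, standalone illustration and not the parquet-avro implementation: the class ElementTypeSketch and the flattened parameters (name, isPrimitive, fieldCount) are hypothetical stand-ins for the Parquet Type argument, and the structural rule in the first branch is an assumption about what the existing check already covers.

```java
// Hypothetical sketch; does not use or reproduce the parquet-mr API.
final class ElementTypeSketch {

    /**
     * Decides whether a repeated type inside a LIST group should be read as
     * the list element itself rather than as a wrapper around the element.
     *
     * @param name        name of the repeated type
     * @param isPrimitive whether the repeated type is a primitive
     * @param fieldCount  number of child fields if the repeated type is a group
     */
    static boolean isElementType(String name, boolean isPrimitive, int fieldCount) {
        // Assumed existing structural rule: a primitive, or a group with more
        // than one field, cannot be a single-field wrapper.
        if (isPrimitive || fieldCount > 1) {
            return true;
        }
        // Name-based rules proposed in this issue: parquet-avro 1.7.0 names the
        // repeated level "array", parquet-thrift 1.7.0 appends "_tuple".
        return name.equals("array") || name.endsWith("_tuple");
    }

    public static void main(String[] args) {
        // Second-level repeated groups from the two schemas above.
        System.out.println(isElementType("array", false, 1));                // true
        System.out.println(isElementType("intListsColumn_tuple", false, 1)); // true
        // A single-field group with any other name is still not matched.
        System.out.println(isElementType("list", false, 1));                 // false
    }
}
```

With these two name checks, the second-level repeated groups in both schemas above would be treated as element types, while a single-field repeated group with any other name would not be.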
Reporter: Cheng Lian / @liancheng
Assignee: Ryan Blue / @rdblue
Related issues:
- Implement nested type read rules in parquet-thrift (blocks)
- Parquet support fail to decode Avro/Thrift arrays of primitive array (e.g. array<array<int>>) (relates to)
Note: This issue was originally created as PARQUET-364. Please see the migration documentation for further details.