Describe the bug
ComplexObjectArrayReader does not use RecordReader and consequently does not correctly delimit semantic records when reading. In particular, it may yield a batch of values that ends part way through a row. This in turn causes the parent ListArrayReader to error out, because the repetition levels it receives are not consistent.
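To illustrate what "delimiting semantic records" means here, the following is a minimal sketch, not the RecordReader implementation. It assumes the usual Parquet convention that a repetition level of 0 marks the start of a new top-level row; the function name `levels_for_records` is hypothetical.

```rust
/// Hypothetical helper: returns how many level entries, starting at `start`,
/// cover up to `num_records` complete records.
///
/// Assumption: a repetition level of 0 marks the first entry of a new record,
/// so a batch may only end immediately before a 0 (or at the end of the data).
fn levels_for_records(rep_levels: &[i16], start: usize, num_records: usize) -> usize {
    let mut records_seen = 0;
    let mut end = start;
    while end < rep_levels.len() {
        if rep_levels[end] == 0 {
            // A new record starts here; stop if we already have enough.
            if records_seen == num_records {
                break;
            }
            records_seen += 1;
        }
        end += 1;
    }
    end - start
}
```

For the rows `[[1], [2, 3], [4]]` the repetition levels would be `[0, 0, 1, 0]`. Asking for two records from the start yields three level entries, so the batch ends on a row boundary instead of splitting `[2, 3]`.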
To Reproduce
fn test_decimal_list() {
    let decimals = Decimal128Array::from_iter_values([1, 2, 3, 4, 5, 6, 7, 8]);

    // [[], [1], [2, 3], null, [4], null, [6, 7, 8]]
    let data = ArrayDataBuilder::new(ArrowDataType::List(Box::new(Field::new(
        "item",
        decimals.data_type().clone(),
        false,
    ))))
    .len(7)
    .add_buffer(Buffer::from_iter([0_i32, 0, 1, 3, 3, 4, 5, 8]))
    .null_bit_buffer(Some(Buffer::from(&[0b01010111])))
    .child_data(vec![decimals.into_data()])
    .build()
    .unwrap();

    let written = RecordBatch::try_from_iter([(
        "list",
        Arc::new(ListArray::from(data)) as ArrayRef,
    )])
    .unwrap();

    let mut buffer = Vec::with_capacity(1024);
    let mut writer =
        ArrowWriter::try_new(&mut buffer, written.schema(), None).unwrap();
    writer.write(&written).unwrap();
    writer.close().unwrap();

    // Read back in batches of 3 rows
    let read = ParquetFileArrowReader::try_new(Bytes::from(buffer))
        .unwrap()
        .get_record_reader(3)
        .unwrap()
        .collect::<ArrowResult<Vec<_>>>()
        .unwrap();

    assert_eq!(&written.slice(0, 3), &read[0]);
    assert_eq!(&written.slice(3, 3), &read[1]);
    assert_eq!(&written.slice(6, 1), &read[2]);
}
Results in
ParquetError("Parquet error: first repetition level of batch must be 0")
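The error can be illustrated with a small sketch. The names `check_batch` and `first_bad_chunk` are hypothetical, and the check only mirrors the wording of the error message; it is not the ListArrayReader code. The levels used in the note below are derived by hand from the test data, and the real reader's batching may differ in detail.

```rust
/// Hypothetical check mirroring the error message: a batch handed to the
/// parent list reader must begin at the start of a row (repetition level 0).
fn check_batch(rep_levels: &[i16]) -> Result<(), String> {
    match rep_levels.first() {
        Some(&level) if level != 0 => {
            Err("first repetition level of batch must be 0".to_string())
        }
        _ => Ok(()),
    }
}

/// Splits the levels into fixed-size chunks, ignoring row boundaries, and
/// returns the index of the first chunk that fails the check, if any.
/// `chunk_len` must be non-zero.
fn first_bad_chunk(rep_levels: &[i16], chunk_len: usize) -> Option<usize> {
    rep_levels
        .chunks(chunk_len)
        .position(|chunk| check_batch(chunk).is_err())
}
```

For the list column above, `[[], [1], [2, 3], null, [4], null, [6, 7, 8]]`, the hand-derived repetition levels are `[0, 0, 0, 1, 0, 0, 0, 0, 1, 1]`. Cutting them into chunks of 3 entries gives `[0, 0, 0]` followed by `[1, 0, 0]`, so the second chunk starts in the middle of the row `[2, 3]` and fails the check.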
Expected behavior
We should support reading these nested types.
Additional context
#1661 tracks removing this ArrayReader as it is buggy, complex, and not really needed anymore
alamb changed the title from "ComplexObjectArrayReader Handles Repetition Levels Incorrectly" to "Reading Structs / Lists may be truncated sometimes: ComplexObjectArrayReader Handles Repetition Levels Incorrectly" on Aug 1, 2022