Closed
Description
I get the following exception when reading a Parquet file that was written using Avro WriteSupport with Parquet writer version v2.0:
Caused by: parquet.io.ParquetDecodingException: Can't read value in column [colName, rows, array, name] BINARY at value 313601 out of 428260, 1 out of 39200 in currentPage. repetition level: 0, definition level: 2
at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:364)
at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:405)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
... 27 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at parquet.column.values.deltastrings.DeltaByteArrayReader.readBytes(DeltaByteArrayReader.java:70)
at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:307)
at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:458)
... 30 more
The file is quite large (500 MB), so I cannot upload it here, but the exception message may contain enough information to identify the cause.
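For readers unfamiliar with the encoding named in the stack trace, here is a minimal sketch of how a prefix-plus-suffix ("delta byte array") decoder can run out of bounds. It is not the parquet-mr `DeltaByteArrayReader`: the class `DeltaByteArraySketch` and its methods `startNewPage`, `utf8`, and `failsAcrossPages` are hypothetical names invented for illustration. It also assumes, without confirmation from this report, that the reader discards its previous value at a page boundary while the written data still refers to it, which is one way the "moving across pages" failure described in the duplicate issue could arise.

```java
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical model of a prefix+suffix ("delta byte array") decoder.
// It is NOT the parquet-mr DeltaByteArrayReader; all names are illustrative.
final class DeltaByteArraySketch {
    // Last decoded value; each new value reuses a prefix of it.
    private byte[] previous = new byte[0];

    // Rebuilds a value from the shared prefix length and the stored suffix.
    byte[] readBytes(int prefixLength, byte[] suffix) {
        byte[] value = new byte[prefixLength + suffix.length];
        // Throws an IndexOutOfBoundsException if prefixLength > previous.length.
        System.arraycopy(previous, 0, value, 0, prefixLength);
        System.arraycopy(suffix, 0, value, prefixLength, suffix.length);
        previous = value;
        return value;
    }

    // Assumption: the reader forgets the previous value at a page boundary.
    void startNewPage() {
        previous = new byte[0];
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if decoding fails once reader and written data disagree
    // about whether the previous value survives a page boundary.
    static boolean failsAcrossPages() {
        DeltaByteArraySketch reader = new DeltaByteArraySketch();
        reader.readBytes(0, utf8("parquet")); // last value of page 1
        reader.startNewPage();                // reader resets its state
        try {
            // First value of page 2 still claims a 3-byte shared prefix.
            reader.readBytes(3, utf8("ser"));
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        DeltaByteArraySketch reader = new DeltaByteArraySketch();
        reader.readBytes(0, utf8("parquet"));
        byte[] second = reader.readBytes(3, utf8("ser")); // "par" + "ser"
        System.out.println(new String(second, StandardCharsets.UTF_8));
        System.out.println(failsAcrossPages());
    }
}
```

Within a single page the second value decodes to `parser`. After the simulated page boundary the same prefix length no longer fits the (now empty) previous value, so the copy fails. Whether this matches the actual cause of the exception above would need to be checked against the file and the writer code.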
Reporter: Konstantin Shaposhnikov / @kostya-sh
Assignee: Konstantin Shaposhnikov / @kostya-sh
Related issues:
- Release Parquet 1.8.0 (blocks)
- DeltaByteArrayReader fails with ArrayIndexOutOfBoundsException when moving across pages (duplicates)
PRs and other links:
Note: This issue was originally created as PARQUET-246. Please see the migration documentation for further details.