ArrayIndexOutOfBoundsException with Parquet write version v2 #1782

@asfimport


I get the following exception when reading a Parquet file that was created using Avro WriteSupport and Parquet writer version v2.0:

Caused by: parquet.io.ParquetDecodingException: Can't read value in column [colName, rows, array, name] BINARY at value 313601 out of 428260, 1 out of 39200 in currentPage. repetition level: 0, definition level: 2
	at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
	at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:364)
	at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:405)
	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
	... 27 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at parquet.column.values.deltastrings.DeltaByteArrayReader.readBytes(DeltaByteArrayReader.java:70)
	at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:307)
	at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:458)
	... 30 more

The file is quite large (500 MB), so I cannot upload it here, but the exception message may contain enough information to identify the cause of the error.
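
For context, the stack trace ends in DeltaByteArrayReader, which decodes values stored as a prefix length (bytes shared with the previous value) plus a suffix. The sketch below is not the parquet-mr code; the class name DeltaByteArraySketch and the method decode are hypothetical and only illustrate this style of decoding. It shows one way an out-of-bounds error could arise: a prefix length larger than the previous value the reader holds, for example if the reader starts a page with an empty previous value. The message above reports the failure at value 1 of the current page, which would be consistent with that, but whether it is the actual cause here is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of prefix + suffix decoding; not the parquet-mr implementation.
class DeltaByteArraySketch {

    /** Decodes values given as (prefix length, suffix) pairs relative to the previous value. */
    static List<String> decode(int[] prefixLengths, String[] suffixes) {
        List<String> out = new ArrayList<>();
        // Assumption: the reader begins with an empty previous value.
        byte[] previous = new byte[0];
        for (int i = 0; i < prefixLengths.length; i++) {
            byte[] suffix = suffixes[i].getBytes(StandardCharsets.UTF_8);
            byte[] value = new byte[prefixLengths[i] + suffix.length];
            // Copy the shared prefix from the previous value. This throws an
            // IndexOutOfBoundsException (typically ArrayIndexOutOfBoundsException)
            // when prefixLengths[i] is greater than previous.length.
            System.arraycopy(previous, 0, value, 0, prefixLengths[i]);
            // Append the suffix stored for this value.
            System.arraycopy(suffix, 0, value, prefixLengths[i], suffix.length);
            out.add(new String(value, StandardCharsets.UTF_8));
            previous = value;
        }
        return out;
    }
}
```

With consistent input, decode(new int[]{0, 3}, new String[]{"apple", "ly"}) rebuilds "apple" and "apply". If the first entry instead carries a prefix length of 3, with no previous value to copy from, the copy fails with an out-of-bounds error.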

Reporter: Konstantin Shaposhnikov / @kostya-sh
Assignee: Konstantin Shaposhnikov / @kostya-sh

Note: This issue was originally created as PARQUET-246. Please see the migration documentation for further details.
