DeltaByteArrayReader.readBytes() fails with an ArrayIndexOutOfBoundsException shortly after it starts processing a new page via initFromPage(). To reproduce, read a Binary column that is encoded with delta byte array encoding and spans multiple pages.
This happens because ColumnReaderImpl.initDataReader() creates a new ValuesReader every time a new page is processed (see this.dataColumn = dataEncoding.getValuesReader(path, VALUES)). DeltaByteArrayReader is stateful: each value is stored as a prefix length relative to the previous value plus a suffix, so the reader must remember the previous Binary value across pages. When a new DeltaByteArrayReader is created, that value is lost, and the first prefix copy on the new page reads from an empty previous value.
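The failure mode can be illustrated with a minimal, self-contained sketch. It does not use the Parquet classes: `PrefixDecoder`, `decodeWithSharedState`, and `freshDecoderFails` are hypothetical names, and the "pages" are just hand-written (prefix length, suffix) pairs, assuming the second page starts with a value whose prefix refers back to the last value of the first page.

```java
import java.nio.charset.StandardCharsets;

class DeltaPrefixSketch {
    /** Hypothetical, simplified stand-in for a stateful delta byte array reader. */
    static final class PrefixDecoder {
        // Last decoded value; each new value reuses a prefix of it.
        private byte[] previous = new byte[0];

        byte[] read(int prefixLength, String suffix) {
            byte[] tail = suffix.getBytes(StandardCharsets.UTF_8);
            byte[] out = new byte[prefixLength + tail.length];
            // Throws if prefixLength exceeds previous.length,
            // which is what happens when the state was reset.
            System.arraycopy(previous, 0, out, 0, prefixLength);
            System.arraycopy(tail, 0, out, prefixLength, tail.length);
            previous = out;
            return out;
        }
    }

    /** One decoder is kept across both pages, so the prefix lookup succeeds. */
    static String decodeWithSharedState() {
        PrefixDecoder decoder = new PrefixDecoder();
        decoder.read(0, "apple");                   // page 1
        decoder.read(4, "y");                       // page 1: "appl" + "y"
        byte[] value = decoder.read(2, "ricot");    // page 2: "ap" + "ricot"
        return new String(value, StandardCharsets.UTF_8);
    }

    /** A new decoder is created for page 2, mirroring the reported behaviour. */
    static boolean freshDecoderFails() {
        PrefixDecoder pageOne = new PrefixDecoder();
        pageOne.read(0, "apple");
        pageOne.read(4, "y");
        PrefixDecoder pageTwo = new PrefixDecoder(); // previous value is lost
        try {
            pageTwo.read(2, "ricot");
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeWithSharedState());
        System.out.println(freshDecoderFails());
    }
}
```

In this sketch the only difference between the two methods is whether the decoder instance survives the page boundary, which suggests that a fix needs to either reuse the reader or hand the previous value to the new one.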
Reporter: Alosh Bennett
Note: This issue was originally created as PARQUET-244. Please see the migration documentation for further details.