Error reading some files after PARQUET-77 bytebuffer read path #1910

@asfimport

This issue is based on a discussion on the parquet-dev mailing list started by @danielcweeks.

Full discussion:
https://mail-archives.apache.org/mod_mbox/parquet-dev/201512.mbox/%3CCAMpYv7C_szTheua9N95bXvbd2ROmV63BFiJTK-K-aDNK6ZNBKA%40mail.gmail.com%3E

From the thread (he later provided a small repro file that is attached here):

Just wanted to see if you or anyone else has run into problems reading
files after the ByteBuffer patch. I've been running into issues and have
narrowed it down to the ByteBuffer commit using a small repro file (written
with 1.6.0, unfortunately can't share the data).

It doesn't happen for every file, but those that fail give this error:

can not read class org.apache.parquet.format.PageHeader: Required field
'uncompressed_page_size' was not found in serialized data! Struct:
PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)

Reporter: Jason Altekruse / @jaltekruse
Assignee: Jason Altekruse / @jaltekruse

Note: This issue was originally created as PARQUET-400. Please see the migration documentation for further details.
