ParquetInputSplit end calculation bug #1750

@asfimport

The calculation of a split's end from the file metadata was broken by PARQUET-108. That change updated the calculation to use the requested schema, so that the end of a block would be the end of the last projected column. However, the end logic actually computes the total number of bytes selected, which is a length rather than an end position.
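
To make the difference concrete, here is a minimal sketch in Python rather than the actual parquet-mr code. `ColumnChunk`, `selected_bytes`, and `end_of_last_projected` are hypothetical names, and the offsets are invented for illustration. It assumes the intended end is the largest `start + size` among the projected column chunks:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnChunk:
    """Hypothetical stand-in for a column chunk's metadata."""
    path: str
    start: int  # byte offset of the chunk within the file
    size: int   # total size of the chunk in bytes


def selected_bytes(chunks, projected):
    # What the issue says the logic computes: a byte count, not a position.
    return sum(c.size for c in chunks if c.path in projected)


def end_of_last_projected(chunks, projected):
    # What the description says was intended: the file offset where the
    # last projected column chunk ends.
    return max(c.start + c.size for c in chunks if c.path in projected)


# Three contiguous chunks; the projection skips column "b".
chunks = [
    ColumnChunk("a", start=4, size=100),
    ColumnChunk("b", start=104, size=50),
    ColumnChunk("c", start=154, size=200),
]
projected = {"a", "c"}

print(selected_bytes(chunks, projected))         # 300
print(end_of_last_projected(chunks, projected))  # 354
```

In this made-up layout the two values differ by the start offset of the first chunk plus the size of every skipped column, so the byte count cannot be used as an end position.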

The end of a split is only used to select row groups when a block has no row group offsets, which doesn't happen when the constructor that uses the broken method is called. The bug therefore has no effect in practice, but the broken method should still be removed.
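
The sketch below illustrates why the broken end value is harmless on that code path. It is not the parquet-mr implementation: `select_row_groups` is a hypothetical helper, and the range rule (a row group belongs to a split when its starting offset falls in `[start, end)`) is an assumption made only for this example.

```python
def select_row_groups(row_group_starts, split_start, split_end,
                      row_group_offsets=None):
    """Return the starting offsets of the row groups a split should read."""
    if row_group_offsets is not None:
        # Explicit offsets win; split_end is never consulted on this path.
        wanted = set(row_group_offsets)
        return [s for s in row_group_starts if s in wanted]
    # Fallback: pick row groups by byte range (assumed rule, see lead-in).
    return [s for s in row_group_starts if split_start <= s < split_end]


starts = [4, 354, 704]

# With explicit offsets, even a nonsensical end of 0 changes nothing.
print(select_row_groups(starts, 0, 0, row_group_offsets=[704]))  # [704]

# Without offsets, the end value decides which row groups are read.
print(select_row_groups(starts, 0, 400))  # [4, 354]
```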

After 1.6.0, I want to move Hive to pass FileSplits directly rather than wrapping them in ParquetInputSplit. The internal reader code can handle mapping row groups to splits because it needs to for PARQUET-84.

Reporter: Ryan Blue / @rdblue

Note: This issue was originally created as PARQUET-207. Please see the migration documentation for further details.
