[C++][Parquet] MinMax statistics for strings may be inaccurate after a merge #47995

@rip-nsk

Description

Describe the bug, including details regarding any error messages, version, and platform.

The TypedStatistics::Merge function disregards a column chunk's statistics when the minimum value is an empty string (represented as {null, 0}), because of the following code:

optional<std::pair<ByteArray, ByteArray>> CleanStatistic(
    std::pair<ByteArray, ByteArray> min_max, LogicalType::Type::type) {
  if (min_max.first.ptr == nullptr || min_max.second.ptr == nullptr) {
    return ::std::nullopt;
  }
  return min_max;
}

As a result, both the minimum and the maximum values for that chunk are lost.
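
To illustrate the behavior, here is a minimal, self-contained sketch. It does not use the real Parquet headers: `ByteArray` below is a hypothetical stand-in with the same pointer-plus-length shape, the logical type argument is omitted, and `CleanStatisticExplicit` and `CheckExample` are names invented for this example. The explicit `has_values` flag is only one possible way to separate "empty string" from "no value seen", not necessarily how the library should fix it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical stand-in for parquet::ByteArray: a length plus a non-owning pointer.
struct ByteArray {
  uint32_t len = 0;
  const uint8_t* ptr = nullptr;
};

using MinMax = std::pair<ByteArray, ByteArray>;

// Same null-pointer check as the snippet quoted above (logical type omitted).
std::optional<MinMax> CleanStatistic(MinMax min_max) {
  if (min_max.first.ptr == nullptr || min_max.second.ptr == nullptr) {
    return std::nullopt;
  }
  return min_max;
}

// Hypothetical variant (an assumption, not the upstream API): the caller says
// explicitly whether any value was seen, so {nullptr, 0} can mean "empty string".
std::optional<MinMax> CleanStatisticExplicit(MinMax min_max, bool has_values) {
  if (!has_values) {
    return std::nullopt;
  }
  return min_max;
}

// Small self-check: min is an empty string, max is "zebra".
void CheckExample() {
  static const uint8_t kZebra[] = {'z', 'e', 'b', 'r', 'a'};
  const ByteArray empty_min{0, nullptr};
  const ByteArray max{5, kZebra};

  // The null-pointer check discards the whole pair, including the valid max.
  assert(!CleanStatistic({empty_min, max}).has_value());

  // With an explicit flag, the pair survives and the max is preserved.
  auto kept = CleanStatisticExplicit({empty_min, max}, /*has_values=*/true);
  assert(kept.has_value() && kept->second.len == 5);
}
```

Calling `CheckExample()` from a test or from `main` exercises both paths: the first assertion reproduces the loss described in this report, and the second shows the pair being retained once an empty string is no longer confused with a missing value.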

Component(s)

C++
