Closed
Description
BinaryStatistics currently tracks only a min and a max, and these are computed by comparing values as signed byte[]. For a lexicographic comparison that is consistent with UTF-8 ordering, e.g. for string columns, the min/max should instead be calculated with a comparator that treats the bytes as unsigned.
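
To illustrate the difference, here is a minimal sketch of the two comparisons. The class and method names (`UnsignedBinaryCompare`, `compareSigned`, `compareUnsigned`) are hypothetical and are not part of the parquet-mr API; the sketch only shows why a signed comparison can order UTF-8 strings incorrectly once a byte is 0x80 or above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the parquet-mr implementation.
public class UnsignedBinaryCompare {

    // Lexicographic comparison treating each byte as signed (-128..127).
    // Bytes >= 0x80 become negative and sort before ASCII bytes.
    public static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Byte.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        // Equal common prefix: the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }

    // Lexicographic comparison treating each byte as unsigned (0..255),
    // which matches Unicode code point order for valid UTF-8.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] z = "z".getBytes(StandardCharsets.UTF_8);      // 0x7A
        byte[] e = "\u00e9".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9

        // Signed: 0xC3 is read as -61, so "z" wrongly sorts after U+00E9.
        System.out.println(compareSigned(z, e) > 0);   // prints true
        // Unsigned: 0x7A < 0xC3, so "z" sorts before U+00E9 as expected.
        System.out.println(compareUnsigned(z, e) < 0); // prints true
    }
}
```

With a signed comparison, a min/max computed over a column containing both values would record U+00E9 as the minimum and "z" as the maximum, so a filter such as `col > "z"` could wrongly skip data that contains non-ASCII strings.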
Reporter: Andrew Duffy
Assignee: Ryan Blue / @rdblue
Related issues:
- Release Parquet format 2.4.0 (blocks)
- Release Parquet-mr 1.9.0 (blocks)
- Min-max should be computed based on logical type (is duplicated by)
- Statistics is not available for DECIMAL types (causes)
- Reference column_order field from column indexes (relates to)
- Parquet String Pushdown for Non-Eq Comparisons Broken (is related to)
Note: This issue was originally created as PARQUET-686. Please see the migration documentation for further details.