@konstantinb

What changes were proposed in this pull request?

HIVE-29332: Use null values for min/max Range values of numeric columns if the corresponding stats values are not set

Why are the changes needed?

When the min/max stats values were not set for a numeric column, stats could be severely underestimated for some queries. In addition, invalid ranges such as [0, -10] were theoretically possible. The following screenshot shows the changes in the EXPLAIN output:
HIVE-2932-explain-before-and-after
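
To make the range problem concrete, here is a minimal Python sketch (not Hive's actual code). It assumes that, before this change, an unset min or max stat fell back to 0; the names `Range`, `range_with_zero_default`, `range_with_null`, and `is_valid` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    """Hypothetical stand-in for a numeric column's value range."""
    min_value: Optional[int]
    max_value: Optional[int]


def range_with_zero_default(min_stat: Optional[int], max_stat: Optional[int]) -> Range:
    # Assumed old behavior: a missing stat silently becomes 0.
    return Range(
        min_stat if min_stat is not None else 0,
        max_stat if max_stat is not None else 0,
    )


def range_with_null(min_stat: Optional[int], max_stat: Optional[int]) -> Range:
    # Behavior proposed in the PR title: a missing stat stays null (None here).
    return Range(min_stat, max_stat)


def is_valid(r: Range) -> bool:
    # A bound that is unknown cannot contradict the other bound.
    if r.min_value is None or r.max_value is None:
        return True
    return r.min_value <= r.max_value
```

With a max of -10 and no min, the zero default yields [0, -10], which fails `is_valid`, while the null-based range simply leaves the lower bound unknown.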

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested with a query test file, with unit tests, and with a proprietary Hive implementation.

keys: d_datekey (type: bigint), d_sellingseason (type: string)
null sort order: zz
- Statistics: Num rows: 1 Data size: 96 Basic stats: COMPLETE Column stats: COMPLETE
+ Statistics: Num rows: 2 Data size: 104 Basic stats: COMPLETE Column stats: COMPLETE

This now accurately reflects the 2 values in the "IN" clause. Before these changes, "1" was used because the interval was [0, 0], which has a length of 1.
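
The effect described in this comment can be sketched as follows, assuming (for illustration only) that the estimated number of matching values for an IN predicate is capped by the number of integers in the column's range. This is a simplified Python model, not Hive's estimator; `range_length` and `estimate_in_values` are hypothetical names.

```python
from typing import Optional, Sequence


def range_length(min_value: Optional[int], max_value: Optional[int]) -> Optional[int]:
    """Number of integers in [min_value, max_value], or None if a bound is unknown."""
    if min_value is None or max_value is None:
        return None
    return max_value - min_value + 1


def estimate_in_values(
    in_list: Sequence[int],
    min_value: Optional[int],
    max_value: Optional[int],
) -> int:
    """Estimate how many distinct IN-list values can match the column."""
    distinct = len(set(in_list))
    length = range_length(min_value, max_value)
    if length is None:
        # Unknown range: nothing to cap against, so keep the IN-list size.
        return distinct
    # Known range: the column cannot hold more distinct values than the range spans.
    return min(distinct, max(length, 0))


# Zero-defaulted range [0, 0] has length 1, so two IN values collapse to 1.
before = estimate_in_values([1985, 2004], 0, 0)
# Null min/max: no cap is applied, so both IN values are counted.
after = estimate_in_values([1985, 2004], None, None)
```

Under these assumptions, `before` is 1 and `after` is 2, mirroring the change from "Num rows: 1" to "Num rows: 2" in the EXPLAIN output.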

Filter Operator
predicate: (d_year) IN (1985, 2004) (type: boolean)
- Statistics: Num rows: 1 Data size: 96 Basic stats: COMPLETE Column stats: COMPLETE
+ Statistics: Num rows: 2 Data size: 104 Basic stats: COMPLETE Column stats: COMPLETE

This now accurately reflects the 2 values in the "IN" clause. Before these changes, "1" was used because the interval was [0, 0], which has a length of 1.

@konstantinb konstantinb marked this pull request as ready for review November 24, 2025 22:36