Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,16 @@ object AggregateEstimation {
// Multiply distinct counts of group-by columns. This is an upper bound, which assumes
// the data contains all combinations of distinct values of group-by columns.
var outputRows: BigInt = agg.groupingExpressions.foldLeft(BigInt(1))(
(res, expr) => res *
childStats.attributeStats(expr.asInstanceOf[Attribute]).distinctCount.get)
(res, expr) => {
val columnStat = childStats.attributeStats(expr.asInstanceOf[Attribute])
val distinctCount = columnStat.distinctCount.get
val distinctValue: BigInt = if (distinctCount == 0 && columnStat.nullCount.get > 0) {
1
} else {
distinctCount
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the nullCount is not empty, the value should be distinctCount + 1, right?

@pengbo @dongjoon-hyun

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
I will try and test it out. Another PR will be submitted if the problem exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. @gatorsmile . Originally, this PR aims to fix the case of column with only null value, but that's another case which we should fix. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I inferred that the distinct count already correctly counted a null as a distinct value. If not, yeah, then distinctCount doesn't matter; it should add 1 iff nullCount is > 0. Agree, if this is broader, can we get an additional test of that case too?

}
res * distinctValue
})

outputRows = if (agg.groupingExpressions.isEmpty) {
// If there's no group-by columns, the output is a single row containing values of aggregate
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,9 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
attr("key22") -> ColumnStat(distinctCount = Some(2), min = Some(10), max = Some(20),
nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
attr("key31") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
nullCount = Some(0), avgLen = Some(4), maxLen = Some(4))
nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
attr("key32") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
nullCount = Some(4), avgLen = Some(4), maxLen = Some(4))
))

private val nameToAttr: Map[String, Attribute] = columnInfo.map(kv => kv._1.name -> kv._1)
Expand Down Expand Up @@ -116,6 +118,14 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
expectedOutputRowCount = 0)
}

test("group-by column with only null value") {
checkAggStats(
tableColumns = Seq("key22", "key32"),
tableRowCount = 6,
groupByColumns = Seq("key22", "key32"),
expectedOutputRowCount = nameToColInfo("key22")._2.distinctCount.get)
}

test("non-cbo estimation") {
val attributes = Seq("key12").map(nameToAttr)
val child = StatsTestPlan(
Expand Down