Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Jul 16, 2020

What changes were proposed in this pull request?

This PR is related to #26656.
#26656 only support use FILTER clause on aggregate expression without DISTINCT.
This PR will enhance this feature when one or more DISTINCT aggregate expressions which allows the use of the FILTER clause.
Such as:

select sum(distinct id) filter (where sex = 'man') from student;
select class_id, sum(distinct id) filter (where sex = 'man') from student group by class_id;
select count(id) filter (where class_id = 1), sum(distinct id) filter (where sex = 'man') from student;
select class_id, count(id) filter (where class_id = 1), sum(distinct id) filter (where sex = 'man') from student group by class_id;
select sum(distinct id), sum(distinct id) filter (where sex = 'man') from student;
select class_id, sum(distinct id), sum(distinct id) filter (where sex = 'man') from student group by class_id;
select class_id, count(id), count(id) filter (where class_id = 1), sum(distinct id), sum(distinct id) filter (where sex = 'man') from student group by class_id;

Note:
In #26656, we use AggregationIterator to treat the filter conditions of aggregate expr. This is good because we can evaluate filter in first aggregate locally.
But AggregationIterator only support single DISTINCT aggregate with filter clause.
So, this PR uses Project to project the filter clause as new generated attribute(e.g. _gen_attr_0) and ensure the evaluation at local.

Why are the changes needed?

Spark SQL only support use FILTER clause on aggregate expression without DISTINCT.
This PR support Filter expression allows simultaneous use of DISTINCT

Does this PR introduce any user-facing change?

No

How was this patch tested?

Exists and new UT

@SparkQA
Copy link

SparkQA commented Jul 16, 2020

Test build #125985 has finished for PR 29135 at commit 4f9c7e6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 17, 2020

Test build #126016 has finished for PR 29135 at commit fefbce0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* if ('id > 1) 'cat2 else null,
* cast('value as bigint),
* if ('key = "a") cast('value as bigint) else null]
* output = ['key, '_gen_attr_1, '_gen_attr_2, '_gen_attr_3, '_gen_attr_4])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cat1 is not related to the filter, why do we change its name to _gen_attr_1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For convenience and unification, we always alias the column, even if there is no filter.

// aggregate function. For example,
// 1.for AVG(DISTINCT value) GROUP BY key, the grouping expressions will be [key, value].
// 2.for AVG (DISTINCT value) Filter (WHERE age > 20) GROUP BY key, the grouping expression
// will be [key, value, age].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AVG (DISTINCT value) Filter (WHERE age > 20) this will be rewritten as AVG (DISTINCT _gen_attr) Filter (WHERE _gen_attr is not null). So here we should group by key and _gen_attr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your remind.

@SparkQA
Copy link

SparkQA commented Jul 17, 2020

Test build #126036 has finished for PR 29135 at commit 202a454.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 17, 2020

Test build #126050 has finished for PR 29135 at commit a2c842e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 17, 2020

Test build #126052 has finished for PR 29135 at commit 7127744.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 17, 2020

Test build #126059 has finished for PR 29135 at commit 7127744.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126082 has finished for PR 29135 at commit 7127744.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126084 has finished for PR 29135 at commit 2253499.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126088 has finished for PR 29135 at commit 7159582.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126089 has finished for PR 29135 at commit ba7c3a4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126100 has finished for PR 29135 at commit ba7c3a4.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126109 has finished for PR 29135 at commit ba7c3a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer beliefer closed this Jul 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants