Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: eliminate aggregation with distinct #16581

Closed
SeaRise opened this issue Apr 19, 2020 · 18 comments · Fixed by #39973 or pingcap/tipb#283
Closed

planner: eliminate aggregation with distinct #16581

SeaRise opened this issue Apr 19, 2020 · 18 comments · Fixed by #39973 or pingcap/tipb#283
Assignees
Labels
challenge-program sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@SeaRise
Copy link
Contributor

SeaRise commented Apr 19, 2020

Description

for table t (a int, b int, key int)

(10, 20, 10)
(10, 10, 10)
(20, 10, 13)
select count(distinct a), sum(b) from t group by key

==>

select count(a), sum(sum_b)
from (
	select a, sum(b) as sum_b, gid from (
		Expand(
			 projections = 
					[(a, null, key, 0),
                                         (null, b, key, 1)
					], 
			 t)
	)
	group by a, gid, key
)
group by key

Expand is extand t to t_extend(a int, b int, key int, gid int)

(10, null, 10, 0)
(null, 20, 10, 1)

(10, null, 10, 0)
(null, 10, 10, 1)

(20, null, 13, 0)
(null, 10, 13, 1)

spark sql use a operator Extend to extend t to t_extend.
we can use sql as follows to extend

	mem_table(gid int)
        (0)
	(1)
	
select if((gid = 0) a else null), if ((gid = 1) b else null), gid from t join mem_table

or implement a more efficient operator like Extend to extend t to t_extend.

Score

1200

SIG slack channel(must):

Contact us in channel #sig-planner of TiDB Community

Mentor(must)

@winoros

Recommended Skills:

  • Relational algebra
@SeaRise SeaRise added the type/enhancement The issue or PR belongs to an enhancement. label Apr 19, 2020
@zz-jason zz-jason added the sig/planner SIG: Planner label Apr 26, 2020
@iontang
Copy link
Contributor

iontang commented Oct 3, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@william0423 You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Oct 10, 2020
@iontang
Copy link
Contributor

iontang commented Oct 11, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@william0423 You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Oct 18, 2020
@pingyu
Copy link
Contributor

pingyu commented Nov 10, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@pingyu You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Nov 17, 2020
@pingyu
Copy link
Contributor

pingyu commented Nov 17, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@pingyu You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Nov 24, 2020
@hidehalo
Copy link
Contributor

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@hidehalo You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Dec 6, 2020
@Enochack
Copy link

Enochack commented Mar 23, 2021

Sorry I didn't get the point. Why expand the table? I found the following sql

select key, count(tmp.a), sum(tmp.sum_b)
from (
  select key, a, sum(b) as sum_b from t
  group by key, a
) as tmp
group by key

has an equivalent semantic and is more likely to have better performance.
However, if we add an expand operator like Spark SQL does, we could use it to implement with rollup and with cube clauses.

@Tangruilin
Copy link
Contributor

I'm interested in this problem, was it solved by others? /cc @winoros

@Tangruilin
Copy link
Contributor

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
challenge-program sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
9 participants