Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Allow EliminateJoinToEmptyRelation in GpuBroadcastExchangeExec #4134

Closed
abellina opened this issue Nov 17, 2021 · 3 comments · Fixed by #4256
Closed

[FEA] Allow EliminateJoinToEmptyRelation in GpuBroadcastExchangeExec #4134

abellina opened this issue Nov 17, 2021 · 3 comments · Fixed by #4256
Assignees
Labels
performance A performance related task/issue

Comments

@abellina
Copy link
Collaborator

abellina commented Nov 17, 2021

AQE has an optimization rule: EliminateJoinToEmptyRelation that we are seeing in NDS q16. The query on the GPU takes roughly 40s to run in one of our test environments, but it takes the CPU 14s to run. This is because most of the query goes away and is replaced by a LocalScan <empty> given EliminateJoinToEmptyRelation on a broadcast exchange that produces 0 rows.

AQE performs this optimization for inner join, left semi, or a specific anti join (ExtraSingleColumnNullAwareAntiJoin). It looks at the children of the join and actively resolves the broadcast.relationFuture member. If it returns EmptyHashedRelation it goes ahead and eliminates this join.

The relationFuture is defined in the plugin, but we do not emit an EmptyHashedRelation when we have 0 rows in the broadcast. If I make this change, AQE eliminates most of q16 and now it executes in ~7.5 seconds (or roughly 2x faster than the CPU).

What I do not know yet is what else can break with this change. I did run all of NDS and none of the queries failed, but that's just one example. I think we need to understand better if EmptyHashedRelation is OK to have around in a GPU plan.

@abellina abellina added feature request New feature or request ? - Needs Triage Need team to review and classify labels Nov 17, 2021
@abellina abellina self-assigned this Nov 17, 2021
@abellina abellina added the performance A performance related task/issue label Nov 17, 2021
@revans2
Copy link
Collaborator

revans2 commented Nov 17, 2021

@abellina can you post a patch so I can better understand what change you are making, and then we can figure out exactly what it would take to support this.

@abellina
Copy link
Collaborator Author

@revans2 here's the patch that I have so far: abellina@0c23d92. There could be a leak with the result, but the overall idea is here. There is the case of the identity broadcast, and I haven't looked into that one yet.

@revans2
Copy link
Collaborator

revans2 commented Nov 17, 2021

The only place I would change anything else is where we read the broadcast.

It is expecting a SerializeConcatHostBuffersDeserializeBatch and has it hard coded. I would fetch the broadcast and then check the type. If it is empty, then we can create an empty batch instead and still get the same answer. It also might be good to check for empty batches being returned from the child execution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
3 participants