AQE has an optimization rule, EliminateJoinToEmptyRelation, that we are seeing take effect in NDS q16. The query takes roughly 40s on the GPU in one of our test environments, but only 14s on the CPU. This is because on the CPU most of the query goes away and is replaced by a LocalScan <empty>, since EliminateJoinToEmptyRelation fires on a broadcast exchange that produces 0 rows.
AQE performs this optimization for inner joins, left semi joins, and a specific anti join (ExtractSingleColumnNullAwareAntiJoin). It looks at the children of the join and actively resolves the broadcast's relationFuture member. If that returns EmptyHashedRelation, it goes ahead and eliminates the join.
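To make the shape of the rule concrete, here is a toy model in plain Scala. It is only a sketch: every type below is a hypothetical stand-in rather than one of Spark's classes, `EmptyRelation` plays the role of EmptyHashedRelation, `LocalScanEmpty` plays the role of LocalScan <empty>, and the anti join case is left out.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
object EliminateJoinSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftSemi extends JoinType
  case object LeftOuter extends JoinType

  sealed trait Relation
  case object EmptyRelation extends Relation           // stands in for EmptyHashedRelation
  final case class Rows(count: Long) extends Relation

  sealed trait Plan
  case object LocalScanEmpty extends Plan               // stands in for LocalScan <empty>
  final case class Scan(name: String) extends Plan
  // `relation` stands in for resolving the broadcast's relationFuture.
  final case class BroadcastStage(child: Plan, relation: () => Relation) extends Plan
  final case class Join(left: Plan, right: Plan, joinType: JoinType) extends Plan

  // Resolve the broadcast result and check whether it is the empty sentinel.
  private def isEmptyBroadcast(p: Plan): Boolean = p match {
    case BroadcastStage(_, relation) => relation() == EmptyRelation
    case _                           => false
  }

  // Replace a join with an empty scan when an empty broadcast side guarantees no output.
  def eliminate(plan: Plan): Plan = plan match {
    case Join(l, r, Inner) if isEmptyBroadcast(l) || isEmptyBroadcast(r) => LocalScanEmpty
    case Join(_, r, LeftSemi) if isEmptyBroadcast(r)                     => LocalScanEmpty
    case other                                                           => other
  }
}
```

The point of the model is that the check is on the sentinel value itself, so a broadcast that has 0 rows but returns some other object would not trigger the rewrite.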
The relationFuture is defined in the plugin, but we do not emit an EmptyHashedRelation when we have 0 rows in the broadcast. If I make this change, AQE eliminates most of q16 and now it executes in ~7.5 seconds (or roughly 2x faster than the CPU).
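As a rough illustration of the producer side of that change (not the actual patch), the idea is to return a sentinel when the broadcast has no rows. The names `BroadcastResult`, `EmptySentinel`, `SerializedBatch`, and `buildResult` below are hypothetical and only model the decision.

```scala
// Hypothetical model of the producer side; none of these names come from the plugin.
object BroadcastResultSketch {
  sealed trait BroadcastResult
  case object EmptySentinel extends BroadcastResult     // stands in for EmptyHashedRelation
  final case class SerializedBatch(numRows: Long, payload: Array[Byte]) extends BroadcastResult

  // `payload` is by-name, so nothing is materialized when the broadcast is empty.
  def buildResult(numRows: Long, payload: => Array[Byte]): BroadcastResult =
    if (numRows == 0L) EmptySentinel else SerializedBatch(numRows, payload)
}
```

In the real plugin the empty case would also have to release whatever buffers were already collected, which this sketch does not model.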
What I do not know yet is what else can break with this change. I did run all of NDS and none of the queries failed, but that's just one example. I think we need to understand better if EmptyHashedRelation is OK to have around in a GPU plan.
@abellina can you post a patch so I can better understand what change you are making, and then we can figure out exactly what it would take to support this.
@revans2 here's the patch that I have so far: abellina@0c23d92. There could be a leak with the result, but the overall idea is here. There is the case of the identity broadcast, and I haven't looked into that one yet.
The only place I would change anything else is where we read the broadcast.
It expects a SerializeConcatHostBuffersDeserializeBatch and has that type hard-coded. I would fetch the broadcast and then check its type. If it is the empty relation, we can create an empty batch instead and still get the same answer. It might also be good to check for empty batches being returned from the child execution.
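A minimal sketch of that read-side check, using hypothetical stand-in types (`EmptySentinel`, `SerializedBatch`, `Batch`, `toBatch`) rather than the plugin's real classes: fetch the broadcast value, match on its type, and build an empty batch with the expected schema when the sentinel shows up.

```scala
// Hypothetical model of the read side; none of these names come from the plugin.
object ReadBroadcastSketch {
  case object EmptySentinel                              // stands in for EmptyHashedRelation
  final case class SerializedBatch(numRows: Int)         // stands in for the serialized batch type
  final case class Batch(columnNames: Seq[String], numRows: Int)

  // Check the type of the fetched value instead of assuming the serialized batch.
  def toBatch(value: Any, schema: Seq[String]): Batch = value match {
    case EmptySentinel         => Batch(schema, 0)       // empty batch, same schema, same answer
    case SerializedBatch(rows) => Batch(schema, rows)
    case other =>
      throw new IllegalStateException(
        s"Unexpected broadcast type: ${other.getClass.getName}")
  }
}
```

Failing loudly on an unknown type keeps a new broadcast representation from being silently treated as data.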