Tracking: better plan for projection expression by using output indices #1922
Labels: Tracking, Column prune, Output Indices
Hmm, my guess is that it would be better to keep both the original schema and the new schema when output_indices is not None, so that prune_col stays idempotent.
IMO the schema will change as well. This is based on the idea of "hiding" projections inside these nodes and adding them back when converting logical nodes to stream/batch nodes. I'm currently working on …

Anyway, I think it is worth discussing whether to keep the schema or not. cc @st1page, what do you think?
And for some operators, such as hash_join, we can even implement this behaviour in the executor.

When the …
Background
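In our project operator, there are two kinds of expressions: expressions that compute new values, such as Add(input_ref(0), input_ref(1)), and bare input_ref expressions that only select or reorder input columns. This issue discusses how to reduce the latter, the projection expressions in project operators, for two reasons.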
Solution
To eliminate these logical projection nodes, we introduce output_index.

col_prune v2
To preserve column-order information, we introduce column prune v2, which changes the definition

fn prune_col(&self, required_cols: &FixedBitSet) -> PlanRef;

to

fn prune_col(&self, required_cols: &[usize]) -> PlanRef;
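To make the difference between the two signatures concrete, here is a minimal sketch over a schema modelled as a list of column names. Both functions are hypothetical helpers, not the planner's real prune_col, and a Vec<bool> mask stands in for FixedBitSet: a mask only says which columns survive, so the result always keeps the input order, while an index slice can also express a new order.

```rust
/// Hypothetical v1-style pruning: `required` is a membership mask (a stand-in
/// for FixedBitSet), so surviving columns always keep their input order.
fn prune_by_mask(schema: &[&'static str], required: &[bool]) -> Vec<&'static str> {
    (0..schema.len())
        .filter(|&i| required[i]) // keep column i if its bit is set
        .map(|i| schema[i])
        .collect()
}

/// Hypothetical v2-style pruning: `required` is an ordered list of column
/// indices, so one call both drops unused columns and reorders the rest.
fn prune_by_indices(schema: &[&'static str], required: &[usize]) -> Vec<&'static str> {
    required.iter().map(|&i| schema[i]).collect()
}
```

With schema [t1.a, t1.b, t2.a, t2.b], the mask {0, 2} can only yield [t1.a, t2.a], whereas the indices [2, 0] yield [t2.a, t1.a], which is exactly what a reordering projection needs.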
Because a slice of indices, unlike a bit set, also carries an order, we can prune columns and reorder them in the same step.

Output index
A field output_index: Vec<usize> will be added to LogicalAgg, LogicalJoin, LogicalHopWindow, and any other logical plan node that previously needed an extra projection on top of it. For each output column of the operator, it gives the index of that column in the node's current PlanNode schema. For example:
In our current implementation, we get a join plan node with schema [t1.a, t1.b, t2.a, t2.b] and a project on top of the join with expressions [input_ref(2), input_ref(0)]. With output_index, there is only a join plan node with output_index [2, 0].
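The join example can be sketched as a fold: a projection whose expressions are all bare input_refs is absorbed into the node's output_index, while any computing expression keeps the project node. The Expr enum and JoinSketch struct below are hypothetical, stripped-down stand-ins, not RisingWave's real expression or plan types; resolving each input_ref through the current output_index is an assumption that makes repeated absorption compose correctly.

```rust
/// Hypothetical stand-in for the two kinds of project expressions.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    InputRef(usize),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical, stripped-down join node: the full left ++ right schema plus
/// the proposed output_index field.
struct JoinSketch {
    full_schema: Vec<&'static str>,
    output_index: Vec<usize>,
}

impl JoinSketch {
    /// Schema visible to the parent: output_index picks and orders columns
    /// from the full join schema.
    fn schema(&self) -> Vec<&'static str> {
        self.output_index.iter().map(|&i| self.full_schema[i]).collect()
    }

    /// Try to absorb a parent projection. Succeeds only if every expression is
    /// a bare input_ref; each ref points into the current visible output, so it
    /// is resolved through the existing output_index.
    fn absorb_project(&self, exprs: &[Expr]) -> Option<JoinSketch> {
        let mut new_index = Vec::with_capacity(exprs.len());
        for e in exprs {
            match e {
                Expr::InputRef(i) => new_index.push(self.output_index[*i]),
                // A computing expression cannot be hidden in the join.
                _ => return None,
            }
        }
        Some(JoinSketch {
            full_schema: self.full_schema.clone(),
            output_index: new_index,
        })
    }
}
```

Starting from the identity output_index [0, 1, 2, 3], absorbing [input_ref(2), input_ref(0)] yields output_index [2, 0] and visible schema [t2.a, t1.a], while a project containing Add(input_ref(0), input_ref(1)) is left in place.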
Implementation considerations
When adding this field to the plan nodes, we should carefully rethink the existing functions, such as:

- new()
- o2i_mapping and other similar functions
- rewrite_with_input
- rewrite_for_stream
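As one example of what needs rethinking, a column mapping computed over a node's full schema must be routed through output_index before it describes the node's visible output. The sketch below is hypothetical: it uses plain Vec<Option<usize>> mappings instead of the planner's real mapping type, and treats full_o2i[j] as the input column that full-schema column j comes from (None for a computed column such as an aggregate result).

```rust
/// Hypothetical o2i mapping for the visible output: each visible column is
/// looked up in the full-schema mapping via output_index.
fn o2i_with_output_index(full_o2i: &[Option<usize>], output_index: &[usize]) -> Vec<Option<usize>> {
    output_index.iter().map(|&j| full_o2i[j]).collect()
}

/// Hypothetical inverse: for each input column, its first position in the
/// visible output, or None if it is pruned away or never forwarded.
fn i2o_with_output_index(
    full_o2i: &[Option<usize>],
    output_index: &[usize],
    input_len: usize,
) -> Vec<Option<usize>> {
    let mut i2o = vec![None; input_len];
    for (out_pos, &j) in output_index.iter().enumerate() {
        if let Some(input_col) = full_o2i[j] {
            // Keep the first occurrence if a column is output more than once.
            if i2o[input_col].is_none() {
                i2o[input_col] = Some(out_pos);
            }
        }
    }
    i2o
}
```

For an aggregation over a 3-column input, grouped by input column 1 and with one aggregate call, full_o2i is [Some(1), None]; with output_index [1, 0] the visible o2i becomes [None, Some(1)].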
output_index fits naturally into our chunk-based vectorized query execution, so in the future we can add the field to the proto and implement it in the executors. For now, we can simply add a stream/batch project node when converting the logical plan to a stream/batch plan.
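Both options can be sketched briefly. In a columnar chunk, applying output_index only selects and reorders whole columns, with no per-row expression evaluation; until executors support it, the planner can instead emit a project of input_refs, which is unnecessary when output_index is the identity. The Chunk type and both functions are hypothetical; a real executor would likely share column buffers rather than clone them as done here.

```rust
/// Hypothetical columnar chunk: one Vec per column, all of equal length.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    columns: Vec<Vec<i64>>,
}

/// Executor-side application of output_index: pick and reorder whole columns.
/// Cloning keeps the sketch simple; it is not how a real engine would do it.
fn apply_output_index(chunk: &Chunk, output_index: &[usize]) -> Chunk {
    Chunk {
        columns: output_index.iter().map(|&i| chunk.columns[i].clone()).collect(),
    }
}

/// Planner-side check for the interim approach: a stream/batch project node is
/// only needed if output_index differs from the identity over the full schema.
fn project_needed(output_index: &[usize], full_len: usize) -> bool {
    output_index.len() != full_len
        || output_index.iter().enumerate().any(|(pos, &i)| pos != i)
}
```

For the join example, project_needed(&[2, 0], 4) is true, so the conversion would add a project with [input_ref(2), input_ref(0)], while an identity output_index converts without one.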