Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discussion(optimizer): do not include the primary key of max-one-row side in pk derivation of Inner(?) Join #5764

Open
BugenZhao opened this issue Oct 10, 2022 · 7 comments
Assignees
Labels
component/optimizer Query optimization. component/streaming Stream processing related issue.

Comments

@BugenZhao
Copy link
Member

If we write an inner join whose one side is SimpleAgg, the primary key of output only contains the primary key of the other side since SimpleAgg has no primary key.

Things can actually be more general, as every max-one-row Plan Node can adapt this. This will benefit DynamicFilter which requires no output of the inner side.

@BugenZhao BugenZhao added component/streaming Stream processing related issue. component/optimizer Query optimization. labels Oct 10, 2022
@github-actions github-actions bot added this to the release-0.1.14 milestone Oct 10, 2022
@BugenZhao BugenZhao changed the title discussion: do not include the primary key of max-one-row side in pk derivation of Inner(?) Join discussion(optimizer): do not include the primary key of max-one-row side in pk derivation of Inner(?) Join Oct 10, 2022
@chenzl25
Copy link
Contributor

For streaming query, max-one-row would always appear together with correlated sub-query which finally causes the aggregation becoming not Simple or more specifically with group by.

@chenzl25
Copy link
Contributor

For the case of uncorrelated sub-query, it will produce a nested-loop join which is unsupported right now.

@BugenZhao
Copy link
Member Author

For the case of uncorrelated sub-query, it will produce a nested-loop join which is unsupported right now.

I think some of them can be supported with DynamicFilter?

@chenzl25
Copy link
Contributor

For the case of uncorrelated sub-query, it will produce a nested-loop join which is unsupported right now.

I think some of them can be supported with DynamicFilter?

Yes, you really remind me some of special nested-loop join can be transformed into dynamic filter. For dynamic filter, I think it's pk is the same with its left input, so we have already done?

@BugenZhao
Copy link
Member Author

Yes. The StreamDynamicFilter is created when calling LogicalJoin::to_stream after we rewrite the plan to add primary keys, so even if we don't want the columns from the right side, its primary key is still added so the precondition of dynamic filter will be broken. That's one of the reasons for our only supporting simple agg as the inner side currently.

@chenzl25
Copy link
Contributor

Yes. The StreamDynamicFilter is created when calling LogicalJoin::to_stream after we rewrite the plan to add primary keys, so even if we don't want the columns from the right side, its primary key is still added so the precondition of dynamic filter will be broken. That's one of the reasons for our only supporting simple agg as the inner side currently.

Got it.

@BugenZhao
Copy link
Member Author

BugenZhao commented Oct 11, 2022

I've had a try by hacking the pk derivation of Join side with MaxOneRowVisitor, it seems not to work since we'll have an intermediate state of LogicalJoin with two Stream plan nodes as the input. But the MaxOneRowVisitor is only implemented on logical plan nodes. So the hack of output indices becomes an absence of that column when converting to the stream plan, which further lead to failure of deriving the primary key of the mview.

A trivial solution could be to repeat (or forward) the implementation to each stream plan node, but it sounds too verbose. Per some offline discussion with @st1page, the best solution might be #2279, after we finish the refactor of the optimizer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/optimizer Query optimization. component/streaming Stream processing related issue.
Projects
None yet
Development

No branches or pull requests

2 participants