feat(optimizer): use group key as stream key for max-one-row GroupTopN #9082

Merged

6 commits merged into main from cyx/optimize-topn-pk on Apr 12, 2023


@xx01cyx (Contributor) commented Apr 10, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

We can use the group key as the stream key for GroupTopN when LIMIT is 1 without WITH TIES: there will be at most one record for each value of the group key, so the group key alone uniquely identifies every output row.
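A minimal sketch of the rule, under stated assumptions: `TopNLimit`, `max_one_row`, and `derive_stream_key` are hypothetical names for illustration, not necessarily RisingWave's actual API, and the fallback to the input's stream key is an assumption about the non-optimized case. Column indices mirror the NEXMark `bid` plan discussed in this PR (0 = auction, 1 = bidder, 7 = `_row_id`).

```rust
/// Hypothetical limit descriptor for a (Group)TopN node.
#[derive(Debug, Clone, Copy)]
struct TopNLimit {
    limit: u64,
    with_ties: bool,
}

impl TopNLimit {
    /// At most one row per group survives when LIMIT is 1 and WITH TIES is off.
    /// OFFSET does not matter: skipping rows before taking one still yields at most one row.
    fn max_one_row(&self) -> bool {
        self.limit == 1 && !self.with_ties
    }
}

/// Returns the output stream key as column indices.
/// Assumption: without the optimization, the node keeps its input's stream key.
fn derive_stream_key(limit: TopNLimit, group_key: &[usize], input_stream_key: &[usize]) -> Vec<usize> {
    if limit.max_one_row() {
        // Each group key value maps to at most one output row, so it is unique.
        group_key.to_vec()
    } else {
        input_stream_key.to_vec()
    }
}

fn main() {
    let group_key = [1, 0]; // (bidder, auction)
    let input_stream_key = [7]; // _row_id
    let one = TopNLimit { limit: 1, with_ties: false };
    let ties = TopNLimit { limit: 1, with_ties: true };
    println!("{:?}", derive_stream_key(one, &group_key, &input_stream_key)); // [1, 0]
    println!("{:?}", derive_stream_key(ties, &group_key, &input_stream_key)); // [7]
}
```

With WITH TIES, several rows can share the top rank within one group, which is why the group key is no longer unique in that case.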

This is an optimizer improvement. In addition, #9016 depends on this PR to produce a correct plan.

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Comment on lines -119 to -121
```rust
/// Infers the state table catalog for [`StreamTopN`] and [`StreamGroupTopN`].
pub fn infer_internal_table_catalog(&self, vnode_col_idx: Option<usize>) -> TableCatalog {
    self.core
        .infer_internal_table_catalog(&self.base, vnode_col_idx)
}
```

Contributor Author

Removed because LogicalTopN shouldn't infer the state table catalog for streaming, and this method is not used anywhere.

Comment on lines -1119 to +1123
```diff
-StreamMaterialize { columns: [auction, bidder, price, channel, url, date_time, extra, bid._row_id(hidden)], stream_key: [bid._row_id], pk_columns: [bid._row_id], pk_conflict: "NoCheck" }
-└─StreamExchange { dist: HashShard(bid._row_id) }
-  └─StreamProject { exprs: [bid.auction, bid.bidder, bid.price, bid.channel, bid.url, bid.date_time, bid.extra, bid._row_id] }
-    └─StreamAppendOnlyGroupTopN { order: "[bid.date_time DESC]", limit: 1, offset: 0, group_key: [1, 0] }
-      └─StreamExchange { dist: HashShard(bid.bidder, bid.auction) }
-        └─StreamTableScan { table: bid, columns: [bid.auction, bid.bidder, bid.price, bid.channel, bid.url, bid.date_time, bid.extra, bid._row_id], pk: [bid._row_id], dist: UpstreamHashShard(bid._row_id) }
+StreamMaterialize { columns: [auction, bidder, price, channel, url, date_time, extra, bid._row_id(hidden)], stream_key: [bidder, auction], pk_columns: [bidder, auction], pk_conflict: "NoCheck" }
+└─StreamProject { exprs: [bid.auction, bid.bidder, bid.price, bid.channel, bid.url, bid.date_time, bid.extra, bid._row_id] }
+  └─StreamAppendOnlyGroupTopN { order: "[bid.date_time DESC]", limit: 1, offset: 0, group_key: [1, 0] }
+    └─StreamExchange { dist: HashShard(bid.bidder, bid.auction) }
+      └─StreamTableScan { table: bid, columns: [bid.auction, bid.bidder, bid.price, bid.channel, bid.url, bid.date_time, bid.extra, bid._row_id], pk: [bid._row_id], dist: UpstreamHashShard(bid._row_id) }
```
Contributor

This plan became worse... Maybe we need to return multiple unique keys in the PlanRef so that the parent PlanNode can choose a smaller one as the state table's pk.
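One way to read this suggestion, as a hypothetical sketch: if a plan node reported several candidate unique keys instead of one, the parent could pick the cheapest as its state table pk. `choose_state_table_pk` is an illustrative name, not an existing RisingWave function, and column count is used only as a rough proxy for key size.

```rust
/// Hypothetical: pick a state table pk from several candidate unique keys.
/// Each candidate is a list of output column indices that uniquely identifies a row.
/// Prefers the key with the fewest columns; `min_by_key` returns the first
/// minimum, so ties go to the earliest candidate.
fn choose_state_table_pk(candidate_keys: &[Vec<usize>]) -> Option<&[usize]> {
    candidate_keys
        .iter()
        .min_by_key(|key| key.len())
        .map(|key| key.as_slice())
}

fn main() {
    // Candidates for the plan above: [bidder, auction] and [_row_id].
    let candidates = vec![vec![1, 0], vec![7]];
    if let Some(pk) = choose_state_table_pk(&candidates) {
        println!("{:?}", pk); // [7]
    }
}
```

For this plan, picking `[bid._row_id]` would shrink the pk but would also require the input to be re-sharded by `bid._row_id`, which is the exchange the new plan avoids.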

Contributor

Oh, in fact, it removes one unnecessary exchange.
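Why the exchange disappears, as a minimal sketch: `hash_shard_satisfies` and `needs_exchange` are hypothetical helpers, not RisingWave's enforcer API, and the rule used here (hashing on a non-empty subset of the required key already co-locates all rows with equal keys) is a simplifying assumption. In the old plan, `StreamMaterialize` was keyed on `bid._row_id` while its input was sharded by `(bid.bidder, bid.auction)`, so a reshuffle was needed; with the new stream key `[bidder, auction]`, the input distribution already matches.

```rust
use std::collections::HashSet;

/// Returns true when data hash-sharded on `shard_cols` already co-locates all rows
/// that agree on `required_key` (assumed rule: `shard_cols` is a non-empty subset).
fn hash_shard_satisfies(shard_cols: &[usize], required_key: &[usize]) -> bool {
    if shard_cols.is_empty() {
        return false;
    }
    let required: HashSet<usize> = required_key.iter().copied().collect();
    shard_cols.iter().all(|c| required.contains(c))
}

/// Hypothetical enforcer check: insert an exchange only when the requirement is unmet.
fn needs_exchange(input_shard_cols: &[usize], required_key: &[usize]) -> bool {
    !hash_shard_satisfies(input_shard_cols, required_key)
}

fn main() {
    // GroupTopN output is HashShard(bidder, auction), i.e. columns [1, 0].
    let input = [1, 0];
    println!("{}", needs_exchange(&input, &[7])); // true: old pk [_row_id]
    println!("{}", needs_exchange(&input, &[1, 0])); // false: new pk [bidder, auction]
}
```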

@xx01cyx xx01cyx changed the title feat(optimizer): use group key as stream key for GroupTopN when limit is 1 feat(optimizer): use group key as stream key for GroupTopN when limit is 1 without WITH TIES Apr 11, 2023
@xx01cyx xx01cyx marked this pull request as ready for review April 11, 2023 13:17
@xx01cyx xx01cyx requested a review from st1page April 12, 2023 02:33
@codecov bot commented Apr 12, 2023

Codecov Report

Merging #9082 (055385f) into main (07cd8b7) will decrease coverage by 0.02%.
The diff coverage is 47.16%.

```diff
@@            Coverage Diff             @@
##             main    #9082      +/-   ##
==========================================
- Coverage   70.88%   70.87%   -0.02%
==========================================
  Files        1197     1197
  Lines      199054   199089      +35
==========================================
+ Hits       141109   141113       +4
- Misses      57945    57976      +31
```
| Flag | Coverage Δ |
|---|---|
| rust | `70.87% <47.16%> (-0.02%)` ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...c/frontend/src/optimizer/plan_node/logical_topn.rs | `97.31% <ø> (+1.28%)` ⬆️ |
| src/frontend/src/optimizer/plan_node/stream.rs | `14.22% <0.00%> (-0.40%)` ⬇️ |
| src/stream/src/from_proto/group_top_n.rs | `0.00% <0.00%> (ø)` |
| ...rc/stream/src/from_proto/group_top_n_appendonly.rs | `0.00% <0.00%> (ø)` |
| .../frontend/src/optimizer/plan_node/generic/top_n.rs | `88.63% <100.00%> (ø)` |
| ...ntend/src/optimizer/plan_node/stream_group_topn.rs | `91.22% <100.00%> (+0.84%)` ⬆️ |
| ...rc/frontend/src/optimizer/plan_node/stream_topn.rs | `95.77% <100.00%> (+0.46%)` ⬆️ |

... and 5 files with indirect coverage changes


@xx01cyx xx01cyx changed the title feat(optimizer): use group key as stream key for GroupTopN when limit is 1 without WITH TIES feat(optimizer): use group key as stream key for max-one-row GroupTopN Apr 12, 2023
Contributor

@st1page left a comment

LGTM. This PR causes a small regression in some plans, but in most cases it is an optimization.

@xx01cyx xx01cyx added this pull request to the merge queue Apr 12, 2023
Merged via the queue into main with commit 83a66ac Apr 12, 2023
@xx01cyx xx01cyx deleted the cyx/optimize-topn-pk branch April 12, 2023 03:45
3 participants