-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
number table apply top-n (order-by and limit) #2665
Conversation
Thanks for the contribution! Please review the labels and make any necessary changes. |
Codecov Report
@@ Coverage Diff @@
## main #2665 +/- ##
=====================================
Coverage 69% 69%
=====================================
Files 608 608
Lines 32513 32544 +31
=====================================
+ Hits 22509 22571 +62
+ Misses 10004 9973 -31
Continue to review full report at Codecov.
|
@@ -108,6 +118,11 @@ impl NumbersStream { | |||
|
|||
let series = DFUInt64Array::new_from_aligned_vec(av).into_series(); | |||
let block = DataBlock::create_by_array(self.schema.clone(), vec![series]); | |||
if !self.sort_columns_descriptions.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This did not help to block generate.
select number from numbers(1000) order by number desc limit 3;
SortPartialTransform
already did the limit improvement for us.
We should push down the sort/limit into try_get_one_block
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This did not help to block generate.
select number from numbers(1000) order by number desc limit 3;
SortPartialTransform
already did the limit improvement for us.We should push down the sort/limit into
try_get_one_block
.
Sure will address accordingly.
It is actually inside of function try_get_one_block. I think at least one of the benefit is, it decreases the data block size to transmit, right ? Every thread just send limit number of rows, I think that is the point of top-n push-down. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least one of the benefit is, it decreases the data block size to transmit, right
Yes, currently it benefits the data transfer to other nodes. It saves io mostly and it's better than previous.
But if you cut to limit the block, it can save io & CPU both, it's even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number generation in try_get_block is in sequence, so it's easy to apply sort & limit pushdown optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other functions like: order by number + 1
, order by (number * 3)
, this is related to #2343.
In this feature, we just make it simple using column name match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yes, the monotonic check is something I thought about. And I thought the logical would be implemented in the DataBlock sort, that is why I chose use DataBlock sort, instead of jut changing DataRange in the PR.
Thanks for clarifying, will address accordingly.
BTW, @sundy-li @BohuTANG , someone with investing background sent me a message, asking about the creator of this project. I forwarded the message to you in the Slack, could you take a look ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for later, sure :)
Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/LGTM
Wait for another reviewer approval |
I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/
Summary
When order-by and limit are set, we just return top-n rows for every scan part.
Changelog
Related Issues
Fixes #2617
Test Plan
Unit Tests
Stateless Tests