number table apply top-n (order-by and limit) #2665

Merged
2 commits merged into databendlabs:main on Nov 6, 2021

Conversation

junli1026
Contributor

I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/

Summary

When order-by and limit are set, we just return top-n rows for every scan part.
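
As a rough illustration of the idea in the summary (not the code in this PR), the sketch below computes a top-n for each scan part and then merges the partial results. `Part`, `top_n_of_part`, and `merge_top_n` are hypothetical names, and a plain `Vec<u64>` stands in for a DataBlock.

```rust
/// Hypothetical stand-in for one scan part: the half-open range [start, end).
#[derive(Debug, Clone, Copy)]
pub struct Part {
    pub start: u64,
    pub end: u64,
}

/// Sort `rows` by value in the requested direction and keep at most `limit` rows.
fn sort_and_truncate(mut rows: Vec<u64>, limit: usize, descending: bool) -> Vec<u64> {
    if descending {
        rows.sort_unstable_by(|a, b| b.cmp(a));
    } else {
        rows.sort_unstable();
    }
    rows.truncate(limit);
    rows
}

/// Return at most `limit` rows of one part, ordered by `number`.
/// This is what each scan part would send downstream.
pub fn top_n_of_part(part: Part, limit: usize, descending: bool) -> Vec<u64> {
    let rows: Vec<u64> = (part.start..part.end).collect();
    sort_and_truncate(rows, limit, descending)
}

/// Merge the per-part results and apply the final top-n.
/// The merge sees at most `parts.len() * limit` rows instead of every row.
pub fn merge_top_n(parts: &[Part], limit: usize, descending: bool) -> Vec<u64> {
    let partial: Vec<u64> = parts
        .iter()
        .flat_map(|p| top_n_of_part(*p, limit, descending))
        .collect();
    sort_and_truncate(partial, limit, descending)
}
```

With two parts covering 0..500 and 500..1000 and a limit of 3 in descending order, each part sends three rows and the merge keeps the three largest overall.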

Changelog

  • Improvement

Related Issues

Fixes #2617

Test Plan

Unit Tests

Stateless Tests

@databend-bot
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@codecov-commenter

codecov-commenter commented Nov 5, 2021

Codecov Report

Merging #2665 (d95fefe) into main (48e4d79) will increase coverage by 0%.
The diff coverage is 47%.

@@          Coverage Diff          @@
##            main   #2665   +/-   ##
=====================================
  Coverage     69%     69%           
=====================================
  Files        608     608           
  Lines      32513   32544   +31     
=====================================
+ Hits       22509   22571   +62     
+ Misses     10004    9973   -31     
Impacted Files                                        Coverage Δ
query/src/datasources/table_func/numbers_stream.rs    78% <33%> (-17%) ⬇️
query/src/datasources/table_func/numbers_table.rs     76% <69%> (-1%) ⬇️
common/management/src/namespace/namespace_mgr.rs      77% <0%> (-3%) ⬇️
metasrv/src/meta_service/raftmeta.rs                  89% <0%> (-1%) ⬇️
metasrv/src/meta_service/meta_service_impl.rs         71% <0%> (+1%) ⬆️
cli/src/error.rs                                      27% <0%> (+3%) ⬆️
metasrv/src/api/http_service_test.rs                  68% <0%> (+4%) ⬆️
query/src/common/mod.rs                               85% <0%> (+14%) ⬆️
metasrv/src/api/http_service.rs                       76% <0%> (+67%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 48e4d79...d95fefe.

@BohuTANG BohuTANG requested a review from sundy-li November 5, 2021 05:23
@@ -108,6 +118,11 @@ impl NumbersStream {

let series = DFUInt64Array::new_from_aligned_vec(av).into_series();
let block = DataBlock::create_by_array(self.schema.clone(), vec![series]);
if !self.sort_columns_descriptions.is_empty() {
Member

This does not help with block generation.

select number from numbers(1000) order by number desc limit 3;

SortPartialTransform already applies the limit optimization for us.

We should push the sort/limit down into try_get_one_block.

Contributor Author

@junli1026 junli1026 Nov 5, 2021

> This did not help to block generate.
>
> select number from numbers(1000) order by number desc limit 3;
>
> SortPartialTransform already did the limit improvement for us.
>
> We should push down the sort/limit into try_get_one_block.

Sure, will address accordingly.

It is actually inside the function try_get_one_block. I think at least one benefit is that it decreases the size of the data block to transmit, right? Every thread sends only limit rows, which I think is the point of top-n push-down. What do you think?

Member

@sundy-li sundy-li Nov 5, 2021

> I think at least one of the benefit is, it decreases the data block size to transmit, right

Yes, currently it benefits data transfer to other nodes. It mostly saves I/O, and it is better than before.

But if you cut the block down to the limit when generating it, you save both I/O and CPU, which is even better.

Member

The numbers in try_get_one_block are generated in sequence, so it's easy to apply the sort & limit pushdown optimization there.
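
A minimal sketch of why sequential generation makes this easy, assuming the stream covers a half-open range `[start, end)` and the sort key is the `number` column itself: the top-n rows are simply the first or last `limit` values of the range, so only those need to be generated. `trim_range` and `generate_top_n` are hypothetical helpers, not functions from this PR.

```rust
/// Shrink the half-open range [start, end) to the `limit` values that can
/// appear in the top-n result. Hypothetical helper for illustration only.
pub fn trim_range(start: u64, end: u64, limit: usize, descending: bool) -> (u64, u64) {
    // An inverted range is treated as empty.
    let len = end.saturating_sub(start);
    let keep = len.min(limit as u64);
    if descending {
        // The largest `keep` values sit at the tail of the range.
        (end - keep, end)
    } else {
        // The smallest `keep` values sit at the head of the range.
        (start, start + keep)
    }
}

/// Generate only the rows that survive `order by number [desc] limit n`,
/// already in output order, without materializing the full range.
pub fn generate_top_n(start: u64, end: u64, limit: usize, descending: bool) -> Vec<u64> {
    let (s, e) = trim_range(start, end, limit, descending);
    if descending {
        (s..e).rev().collect()
    } else {
        (s..e).collect()
    }
}
```

For `numbers(1000)` with `order by number desc limit 3`, this generates three values instead of a thousand, so no separate sort of the block is needed.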

Contributor Author

sure, will address

Member

Other sort expressions, such as order by number + 1 or order by (number * 3), are related to #2343.

For this feature, we keep it simple and only match on the column name.
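
The column-name match could look roughly like the sketch below: the pushdown applies only when there is a limit and the single sort key is the bare `number` column, and every other expression falls back to the regular sort. `OrderExpr`, `SortKey`, and `pushdown_direction` are hypothetical types and names for illustration, not Databend's planner API.

```rust
/// Hypothetical, simplified sort expression: a bare column or anything else.
#[derive(Debug, Clone, PartialEq)]
pub enum OrderExpr {
    Column(String),
    Other(String),
}

/// Hypothetical sort key: an expression plus its direction.
#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub expr: OrderExpr,
    pub asc: bool,
}

/// Return `Some(asc)` when top-n can be pushed into number generation:
/// a limit is present and the only sort key is the column `number`.
/// Expressions such as `number + 1` are not recognized and yield `None`.
pub fn pushdown_direction(keys: &[SortKey], limit: Option<usize>) -> Option<bool> {
    if limit.is_none() {
        return None;
    }
    match keys {
        [SortKey { expr: OrderExpr::Column(name), asc }] if name.as_str() == "number" => {
            Some(*asc)
        }
        _ => None,
    }
}
```

Matching by name is deliberately conservative: it never pushes down a sort it cannot prove is ordered the same way as the generated sequence.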

Contributor Author

@junli1026 junli1026 Nov 6, 2021

Thanks! Yes, the monotonic check is something I thought about. I assumed that logic would be implemented in the DataBlock sort, which is why I chose to use the DataBlock sort instead of just changing the DataRange in this PR.
Thanks for clarifying, will address accordingly.

BTW, @sundy-li @BohuTANG, someone with an investing background sent me a message asking about the creator of this project. I forwarded the message to you in Slack; could you take a look?

Member

Sorry for the late reply, sure :)
Thank you

Member

@sundy-li sundy-li left a comment

/LGTM

@databend-bot
Member

Wait for another reviewer approval

@sundy-li sundy-li merged commit d13042f into databendlabs:main Nov 6, 2021
Successfully merging this pull request may close these issues.

number table apply optimizer's limit and order by
5 participants