Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lookup support push down topK #2992

Closed

Conversation

MMyheart
Copy link
Contributor

No description provided.

@HarrisChu HarrisChu added ready-for-testing PR: ready for the CI test and removed ready-for-testing PR: ready for the CI test labels Sep 30, 2021
@Shylock-Hg
Copy link
Contributor

Add test cases.

@MMyheart MMyheart force-pushed the optimize/topN_push_down_in_lookup branch from 950dff9 to afa18ac Compare October 12, 2021 08:50
@Sophie-Xie Sophie-Xie added the ready-for-testing PR: ready for the CI test label Oct 12, 2021
@@ -0,0 +1,125 @@
/* Copyright (c) 2020 vesoft inc. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2021

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 40 to 65
void adjustDown(size_t parent) {
size_t child = parent * 2 + 1;
size_t size = v_.size();
while (child < size) {
if (child + 1 < size && comparator_(v_[child], v_[child + 1])) {
child += 1;
}
if (!comparator_(v_[parent], v_[child])) {
return;
}
std::swap(v_[parent], v_[child]);
parent = child;
child = parent * 2 + 1;
}
}
void adjustUp(size_t child) {
size_t parent = (child - 1) >> 1;
while (0 != child) {
if (!comparator_(v_[parent], v_[child])) {
return;
}
std::swap(v_[parent], v_[child]);
child = parent;
parent = (child - 1) >> 1;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this function means?
Is it any different from std::sort & std:: erase ?

Copy link
Contributor Author

@MMyheart MMyheart Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it needs to sort all data by std::sort, and this function gets topK better than std::sort

@bright-starry-sky
Copy link
Contributor

Good job. thanks for your contribution.
I have some ideas about this feature that we can discuss together.
For topK node, I think we'd better split it up into OrderNode and TopNode, they can be used for other queries , such as fetch and go statement. not just for lookup . WDYT?

@bright-starry-sky
Copy link
Contributor

Good job. thanks for your contribution. I have some ideas about this feature that we can discuss together. For topK node, I think we'd better split it up into OrderNode and TopNode, they can be used for other queries , such as fetch and go statement. not just for lookup . WDYT?

And it is compatible with Order-by

@Sophie-Xie Sophie-Xie added the community Source: who proposed the issue label Oct 13, 2021
@MMyheart
Copy link
Contributor Author

Good job. thanks for your contribution. I have some ideas about this feature that we can discuss together. For topK node, I think we'd better split it up into OrderNode and TopNode, they can be used for other queries , such as fetch and go statement. not just for lookup . WDYT?

Firstly, I agree to that topK node also should be used for other queries, not just for lookup. And it can be implement by next pr. Secondly, I don't think it make sense to split topK up into OrderNode and TopNode. TopK Node is designed to solve order by xx | limit x. For only limit x, it should be part of other query node and for only order by xx, there is no need to push down.

namespace storage {

template <class T>
class TopKHeap final {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could reuse the TokKHeap in graph layer, or merge it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, TopKHeap has been reused in IndexScanExecutor

@MMyheart MMyheart force-pushed the optimize/topN_push_down_in_lookup branch 2 times, most recently from 393e20e to a76065c Compare October 18, 2021 08:55
@MMyheart
Copy link
Contributor Author

Add test cases.

a test case has been added.
tests/tck/features/lookup/LookUpTopN.feature

@Shylock-Hg
Copy link
Contributor

Please check the failed case.

@Shylock-Hg
Copy link
Contributor

Please check the conflicts.

@MMyheart MMyheart force-pushed the optimize/topN_push_down_in_lookup branch 3 times, most recently from 45c48d8 to 115434d Compare October 22, 2021 07:11
| 5 | DataCollect | 9 | |
| 9 | Project | 10 | |
| 10 | TopN | 11 | |
| 11 | TagIndexFullScan | 0 | {"orderBy": {"pos": "0"}} |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should check orderBy and limit fields of TagIndexFullScan`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check orderBy and limit together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 45, there is limit check.
On line 27(here), there is orderBy check.
So, You can try to merge them together, thus , you can check orderBy and limit together.

@MMyheart MMyheart force-pushed the optimize/topN_push_down_in_lookup branch 2 times, most recently from af549ec to 27bdb7b Compare October 27, 2021 06:26
@MMyheart MMyheart force-pushed the optimize/topN_push_down_in_lookup branch from 27bdb7b to 5bee437 Compare October 27, 2021 06:35
@codecov-commenter
Copy link

Codecov Report

Merging #2992 (5bee437) into master (21a3ffb) will increase coverage by 0.09%.
The diff coverage is 90.03%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2992      +/-   ##
==========================================
+ Coverage   85.21%   85.31%   +0.09%     
==========================================
  Files        1295     1294       -1     
  Lines      118198   118308     +110     
==========================================
+ Hits       100728   100930     +202     
+ Misses      17470    17378      -92     
Impacted Files Coverage Δ
src/clients/storage/GraphStorageClient.h 100.00% <ø> (ø)
src/graph/executor/query/IndexScanExecutor.cpp 91.66% <ø> (+2.77%) ⬆️
src/graph/optimizer/rule/TopNRule.cpp 95.45% <ø> (-0.38%) ⬇️
src/storage/index/LookupBaseProcessor.h 100.00% <ø> (ø)
src/storage/index/LookupProcessor.cpp 85.54% <50.00%> (-0.88%) ⬇️
...rc/graph/optimizer/rule/PushLimitDownIndexRule.cpp 80.00% <80.00%> (ø)
src/graph/optimizer/rule/PushTopNDownIndexRule.cpp 83.83% <83.83%> (ø)
.../optimizer/rule/PushLimitSortDownIndexScanRule.cpp 96.29% <96.29%> (ø)
src/storage/exec/TopKNode.h 96.29% <96.29%> (ø)
src/clients/storage/GraphStorageClient.cpp 76.43% <100.00%> (-0.11%) ⬇️
... and 33 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21a3ffb...5bee437. Read the comment docs.

@@ -47,26 +47,6 @@ Feature: TopN rule
| 4 | Start | | |

Scenario: [1] fail to apply topn rule
When profiling query:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

I think limit offset count can be pushed down, so I delete these. Is there a problem in some scenarios?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I known, this pr just push topn to IndexScan, but the go statement implemented by GetNeighbors executor.

@@ -1639,21 +1639,21 @@ Feature: IntegerVid Go Sentence
When executing query:
"""
$a = GO FROM hash('Tony Parker') OVER like YIELD like._src as src, like._dst as dst;
GO 2 STEPS FROM $a.src OVER like YIELD $a.src as src, $a.dst, like._src AS like_src, like._dst AS like_dst
| ORDER BY $-.src,$-.like_src,$-.like_dst | OFFSET 1 LIMIT 2
GO 2 STEPS FROM $a.src OVER like YIELD $a.src as src, $a.dst as dst, like._src AS like_src, like._dst AS like_dst
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why change tests of GO statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
image

As above, this results in a change in the sorting results of two equal values.
eg:
The value list is 2、1、3、2' and the previous sorting result may be 1、2、2'、3, in other words, there's no change in the relative order of 2 and 2'. But now, it may be 1、2'、2、3, although the result has no effect on the user, but test case will be failed. So I change this.

| 0 | Start | | |
When profiling query:
"""
LOOKUP ON player | ORDER BY $-.VertexID | Limit 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the comments above, there is no need to test limit and orderBy check separately

@Shylock-Hg
Copy link
Contributor

Shylock-Hg commented Nov 17, 2021

Hello @MMyheart :
The implementation of IndexScan had refactored by volcano model, so you should push the record to TopK node one by one.

@Sophie-Xie
Copy link
Contributor

new PR #3499

@Sophie-Xie Sophie-Xie closed this Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Source: who proposed the issue ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants