[CH-186] Support RangePartitioning #524

Merged 33 commits on Nov 22, 2022

Contributor

@lgbo-ustc lgbo-ustc commented Nov 8, 2022

What changes were proposed in this pull request?

Implement a new NativePartitioning to support RangePartitioning.

What cases are supported?

-- simple case
select * from t order by x;

-- multiple columns
select * from t order by x, y desc;

-- order by expressions
select * from t order by x + 1;

Ordering by a complex data type is not supported.
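For readers unfamiliar with the technique, the following is a minimal sketch of what range partitioning does for a query such as `select * from t order by x, y desc`. It is not the code from this PR: the function names (`sort_key`, `compute_bounds`, `partition_id`, `range_partition`), the sampling size, and the dict-based rows are all assumptions made for illustration.

```python
import bisect
import random


def sort_key(row):
    # Mirrors `order by x, y desc`: x ascending, y descending.
    # Negating y only works because y is numeric in this sketch.
    return (row["x"], -row["y"])


def compute_bounds(rows, num_partitions, sample_size=100, seed=0):
    """Pick up to num_partitions - 1 boundary keys from a random sample."""
    rng = random.Random(seed)
    sample = sorted(
        sort_key(r) for r in rng.sample(rows, min(sample_size, len(rows)))
    )
    if not sample:
        return []
    # Evenly spaced quantiles of the sorted sample become the range bounds.
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def partition_id(row, bounds):
    # Binary search: partition i receives keys in (bounds[i-1], bounds[i]].
    return bisect.bisect_left(bounds, sort_key(row))


def range_partition(rows, num_partitions):
    bounds = compute_bounds(rows, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_id(row, bounds)].append(row)
    return partitions
```

Because every key in partition `i` is no greater than every key in partition `i + 1`, sorting each partition locally and concatenating the results in partition order yields a globally sorted output.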

In case the backend's function support is incomplete, the option spark.gluten.sql.columnar.rangepartitioning is provided to disable columnar range partitioning.
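As a rough illustration of turning the feature off, the sketch below renders the option as `spark-submit --conf` arguments. The option name comes from this PR; that it is boolean and that `false` disables the feature is an assumption, and the helper `spark_submit_conf_args` is hypothetical.

```python
# Option name taken from the PR description.
GLUTEN_RANGE_PARTITIONING = "spark.gluten.sql.columnar.rangepartitioning"


def spark_submit_conf_args(conf):
    """Render a dict of Spark settings as `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Assumption: the option is boolean and "false" disables the feature.
args = spark_submit_conf_args({GLUTEN_RANGE_PARTITIONING: "false"})
print(" ".join(args))
```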

How was this patch tested?

unit tests

test by CH[[189]]

@github-actions

github-actions bot commented Nov 8, 2022

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[Gluten-${ISSUES_ID}] ${detailed message}

See also:

Review comment on backends-clickhouse/pom.xml (outdated, resolved)
@zzcclp
Contributor

zzcclp commented Nov 14, 2022

retest this please

@zzcclp zzcclp requested a review from rui-mo November 14, 2022 13:01
@lgbo-ustc
Contributor Author

There is something wrong under AQE.

@lgbo-ustc lgbo-ustc force-pushed the rangepartition branch 2 times, most recently from 7bb984b to e71ff7c Compare November 16, 2022 06:41
@lgbo-ustc
Contributor Author

lgbo-ustc commented Nov 16, 2022

I disabled running range partitioning in the backend when the ordering keys contain complex expressions, for two reasons:
1. More work is needed in the backend to support evaluating complex expressions in range partitioning.
2. With AQE enabled, adding projections for the range partitioning shuffle causes some problems. But without them, the expressions are calculated twice, once in the sort and once in range partitioning.

Now a projection node is built and passed to the native splitter, and the native splitter computes the expressions.
This may cause the expressions to be computed twice, in the sort and in range partitioning, but we are not sure which overhead is higher: computing the expressions twice or transferring the results over the network.
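To make the trade-off concrete, here is a toy sketch of the two strategies for `order by x + 1`. It only counts expression evaluations and shuffled columns; it does not measure real cost and is not the native splitter's implementation. `CountingExpr`, `project_before_shuffle`, `evaluate_in_splitter`, and the single bound `[4]` are hypothetical names and values chosen for illustration.

```python
import bisect


class CountingExpr:
    """Wraps an ordering expression and counts how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return self.fn(row)


def project_before_shuffle(rows, expr):
    # Evaluate once and carry the result through the shuffle as an extra column.
    shuffled = [row + (expr(row),) for row in rows]
    ordered = sorted(shuffled, key=lambda r: r[-1])
    width = len(shuffled[0]) if shuffled else 0
    return ordered, width


def evaluate_in_splitter(rows, expr, bounds):
    # The splitter evaluates the expression only to choose a partition;
    # rows are shuffled unchanged, so the sort evaluates it a second time.
    partition_ids = [bisect.bisect_left(bounds, expr(row)) for row in rows]
    ordered = sorted(rows, key=expr)
    width = len(rows[0]) if rows else 0
    return ordered, width, partition_ids


rows = [(5,), (1,), (9,), (3,)]
expr_a = CountingExpr(lambda row: row[0] + 1)
expr_b = CountingExpr(lambda row: row[0] + 1)
_, width_a = project_before_shuffle(rows, expr_a)
_, width_b, _ = evaluate_in_splitter(rows, expr_b, bounds=[4])
print(expr_a.calls, width_a)  # one evaluation per row, one extra column shuffled
print(expr_b.calls, width_b)  # two evaluations per row, no extra column
```

Which side wins depends on how expensive the expression is relative to the bytes it adds to each shuffled row, which is the open question raised above.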

@lgbo-ustc lgbo-ustc requested review from rui-mo and zzcclp and removed request for rui-mo November 18, 2022 02:26
@lgbo-ustc lgbo-ustc force-pushed the rangepartition branch 3 times, most recently from 0fd3af1 to 8617faf Compare November 18, 2022 09:49
@zhztheplayer zhztheplayer changed the title [CH-186] support RangePartitionoing [CH-186] support RangePartitioning Nov 22, 2022
@zhztheplayer zhztheplayer changed the title [CH-186] support RangePartitioning [CH-186] Support RangePartitioning Nov 22, 2022
@zzcclp zzcclp merged commit 2e84b96 into apache:main Nov 22, 2022

4 participants