-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[enchement](mc)Optimize reading of maxcompute partition tables. #45148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
| } | ||
|
|
||
| public void setException(UserException e) { | ||
| public synchronized void setException(UserException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using synchronized ? add comment in code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that there is no need to add synchronized.
| CompletableFuture.runAsync(() -> { | ||
| try { | ||
| TableBatchReadSession tableBatchReadSession = | ||
| createTableBatchReadSession(requiredBatchPartitionSpecs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like in this new implementation, you create read session for each partition?
Is create read session a heavy operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create read session for a batch of partition . createTableBatchReadSession have network io , and increasing the number of partitions will cause network io to be very slow. I tested 1500 partitions and it took about 13 seconds.
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
### What problem does this PR solve? Problem Summary: Optimize reading of maxcompute partition tables: 1. Introduce batch mode to generate splits for Maxcompute partition tables to optimize scenarios with a large number of partitions. Control it through the variable `num_partitions_in_batch_mode`. 2. Introduce catalog parameter `mc.split_cross_partition`. The parameter is true, which is more friendly to reading partition tables, and false, which is more friendly to debug. 3. Add `-Darrow.enable_null_check_for_get=false` to be jvm to improve the efficiency of mc arrow data conversion.
…he#45148) Problem Summary: Optimize reading of maxcompute partition tables: 1. Introduce batch mode to generate splits for Maxcompute partition tables to optimize scenarios with a large number of partitions. Control it through the variable `num_partitions_in_batch_mode`. 2. Introduce catalog parameter `mc.split_cross_partition`. The parameter is true, which is more friendly to reading partition tables, and false, which is more friendly to debug. 3. Add `-Darrow.enable_null_check_for_get=false` to be jvm to improve the efficiency of mc arrow data conversion.
…) (#45246) bp #45148 ### What problem does this PR solve? Problem Summary: Optimize reading of maxcompute partition tables: 1. Introduce batch mode to generate splits for Maxcompute partition tables to optimize scenarios with a large number of partitions. Control it through the variable `num_partitions_in_batch_mode`. 2. Introduce catalog parameter `mc.split_cross_partition`. The parameter is true, which is more friendly to reading partition tables, and false, which is more friendly to debug. 3. Add `-Darrow.enable_null_check_for_get=false` to be jvm to improve the efficiency of mc arrow data conversion.
What problem does this PR solve?
Problem Summary:
Optimize reading of maxcompute partition tables:
num_partitions_in_batch_mode.mc.split_cross_partition. The parameter is true, which is more friendly to reading partition tables, and false, which is more friendly to debug.-Darrow.enable_null_check_for_get=falseto be jvm to improve the efficiency of mc arrow data conversion.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)