Add config option for coalesce_batches physical optimization rule, make optional #2791

Merged
merged 5 commits into apache:master from config-coalesce-batches
Jun 28, 2022

Conversation

andygrove
Member

@andygrove andygrove commented Jun 25, 2022

Which issue does this PR close?

Closes #2790

Rationale for this change

Give users control over optimization rules that are applied.

What changes are included in this PR?

  • Add new config options
  • Unrelated change to update the config docs generator to always sort by config key
  • Update the config docs

Are there any user-facing changes?

No

@andygrove andygrove self-assigned this Jun 25, 2022
@github-actions github-actions bot added the core Core DataFusion crate label Jun 25, 2022
for config in configs
    .config_definitions
    .iter()
    .sorted_by_key(|c| c.key.clone())
Member Author

Make the output deterministic
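
As a rough illustration of why sorting makes the generated docs deterministic, here is a minimal, self-contained sketch. `ConfigDefinition` and `generate_config_table` are hypothetical stand-ins rather than the actual DataFusion types, the default values are made up for the example, and the sketch uses the standard library's `sort_by` instead of the itertools `sorted_by_key` call shown in the diff.

```rust
/// Hypothetical, simplified stand-in for a configuration definition.
/// The real DataFusion type is assumed to carry more information.
#[derive(Debug, Clone)]
pub struct ConfigDefinition {
    pub key: String,
    pub description: String,
    pub default_value: String,
}

/// Render a Markdown table with one row per definition, sorted by key,
/// so the output does not depend on the order of registration.
pub fn generate_config_table(definitions: &[ConfigDefinition]) -> String {
    // Collect references and sort them by key (std-only equivalent of
    // itertools' `sorted_by_key`).
    let mut sorted: Vec<&ConfigDefinition> = definitions.iter().collect();
    sorted.sort_by(|a, b| a.key.cmp(&b.key));

    let mut docs = String::from("| key | default | description |\n|---|---|---|\n");
    for config in sorted {
        docs.push_str(&format!(
            "| {} | {} | {} |\n",
            config.key, config.default_value, config.description
        ));
    }
    docs
}
```

With this shape, registering the same definitions in a different order yields byte-identical docs, which keeps regenerated documentation free of spurious diffs.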

@codecov-commenter

codecov-commenter commented Jun 25, 2022

Codecov Report

Merging #2791 (aaef2ee) into master (7c60412) will increase coverage by 0.08%.
The diff coverage is 97.10%.

❗ Current head aaef2ee differs from pull request most recent head 30f9f8d. Consider uploading reports for the commit 30f9f8d to get more accurate results

@@            Coverage Diff             @@
##           master    #2791      +/-   ##
==========================================
+ Coverage   85.14%   85.22%   +0.08%     
==========================================
  Files         273      274       +1     
  Lines       48248    48621     +373     
==========================================
+ Hits        41079    41436     +357     
- Misses       7169     7185      +16     
| Impacted Files | Coverage Δ |
|---|---|
| datafusion/core/src/datasource/view.rs | 86.60% <ø> (ø) |
| datafusion/core/tests/custom_sources.rs | 83.72% <ø> (ø) |
| datafusion/core/tests/sql/explain_analyze.rs | 82.28% <ø> (ø) |
| datafusion/core/tests/sql/json.rs | 46.87% <ø> (ø) |
| datafusion/core/tests/sql/projection.rs | 96.36% <ø> (ø) |
| datafusion/core/tests/sql/udf.rs | 100.00% <ø> (ø) |
| datafusion/core/tests/user_defined_plan.rs | 87.79% <ø> (ø) |
| datafusion/expr/src/logical_plan/builder.rs | 89.51% <ø> (ø) |
| ...tafusion/optimizer/src/common_subexpr_eliminate.rs | 94.11% <ø> (ø) |
| datafusion/optimizer/src/eliminate_filter.rs | 100.00% <ø> (ø) |
| ... and 44 more | |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c60412...30f9f8d. Read the comment docs.

@alamb alamb changed the title from "Add config option for coalesce_batches physical optimization rule" to "Add config option for coalesce_batches physical optimization rule, make optional" Jun 26, 2022
@@ -27,6 +28,13 @@ pub const OPT_FILTER_NULL_JOIN_KEYS: &str = "datafusion.optimizer.filter_null_jo
/// Configuration option "datafusion.execution.batch_size"
pub const OPT_BATCH_SIZE: &str = "datafusion.execution.batch_size";

/// Configuration option "datafusion.execution.coalesce_batches"
Contributor

I really like how this new option framework is coming together ❤️

    Arc::new(AggregateStatistics::new()),
    Arc::new(HashBuildProbeOrder::new()),
];
if config.config_options.get_bool(OPT_COALESCE_BATCHES) {
Contributor

I wonder if it would be helpful to add a test to ensure the option to disable coalescing batches doesn't get broken in the future.

Member Author

Sounds good. I will get to this in the next day or two.

Member Author

I have added tests for disabling the optimizer rule and also for setting a custom batch size.
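
For readers following along, the pattern under discussion (register the rule only when the option is enabled, and test both settings) could be sketched roughly as below. This is a simplified model, not the code in this PR: `ConfigValue`, `ConfigOptions`, `physical_optimizer_rules`, the string rule names, and the default batch size of 8192 are all assumptions made for the example. Only the two option keys are taken from the diff.

```rust
use std::collections::HashMap;

/// Option keys as they appear in the diff.
pub const OPT_BATCH_SIZE: &str = "datafusion.execution.batch_size";
pub const OPT_COALESCE_BATCHES: &str = "datafusion.execution.coalesce_batches";

/// Hypothetical value type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigValue {
    Bool(bool),
    U64(u64),
}

/// Hypothetical, simplified option store keyed by option name.
#[derive(Debug, Clone)]
pub struct ConfigOptions {
    values: HashMap<String, ConfigValue>,
}

impl ConfigOptions {
    /// Create options with assumed defaults (illustrative values only).
    pub fn new() -> Self {
        let mut values = HashMap::new();
        values.insert(OPT_BATCH_SIZE.to_string(), ConfigValue::U64(8192));
        values.insert(OPT_COALESCE_BATCHES.to_string(), ConfigValue::Bool(true));
        Self { values }
    }

    pub fn set_bool(&mut self, key: &str, value: bool) {
        self.values.insert(key.to_string(), ConfigValue::Bool(value));
    }

    pub fn set_u64(&mut self, key: &str, value: u64) {
        self.values.insert(key.to_string(), ConfigValue::U64(value));
    }

    /// Returns true only if the key is present and set to `Bool(true)`.
    pub fn get_bool(&self, key: &str) -> bool {
        matches!(self.values.get(key), Some(ConfigValue::Bool(true)))
    }

    pub fn get_u64(&self, key: &str) -> Option<u64> {
        match self.values.get(key) {
            Some(ConfigValue::U64(v)) => Some(*v),
            _ => None,
        }
    }
}

/// Build the list of physical optimizer rules. Rules are modelled as
/// plain strings here; the coalesce rule is appended only when enabled.
pub fn physical_optimizer_rules(options: &ConfigOptions) -> Vec<String> {
    let mut rules = vec![
        "aggregate_statistics".to_string(),
        "hash_build_probe_order".to_string(),
    ];
    if options.get_bool(OPT_COALESCE_BATCHES) {
        // Assumption: the rule is parameterised directly by the batch size.
        let batch_size = options.get_u64(OPT_BATCH_SIZE).unwrap_or(8192);
        rules.push(format!("coalesce_batches(target_batch_size={})", batch_size));
    }
    rules
}
```

A test in the spirit of the ones mentioned above would build the rule list once with `OPT_COALESCE_BATCHES` set to `false` and check that no coalesce rule is present, then once with a custom `OPT_BATCH_SIZE` and check that the value reaches the rule.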

@Dandandan Dandandan merged commit d5a9b74 into apache:master Jun 28, 2022
@andygrove andygrove deleted the config-coalesce-batches branch January 27, 2023 18:56
Development

Successfully merging this pull request may close these issues.

Add configuration option to enable/disable CoalesceBatchesExec
5 participants