transform: plumb through OptimizerFeatures #25628

Merged

Conversation

aalexandrov
Contributor

@aalexandrov aalexandrov commented Feb 29, 2024

Make OptimizerConfig available in all parts of the MIR optimizations defined in mz_transform (local and global).

Motivation

  • This PR adds a known-desirable feature.

Part of MaterializeInc/database-issues#7541.

Tips for reviewer

To follow what is happening it's best to review this one commit at a time.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (Relying on the large body of existing tests).
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • There are no user-facing behavior changes.

@aalexandrov aalexandrov requested a review from a team February 29, 2024 00:25
@aalexandrov aalexandrov requested a review from a team as a code owner February 29, 2024 00:25
@aalexandrov aalexandrov requested review from a team and ParkMyCar February 29, 2024 00:25
@aalexandrov aalexandrov marked this pull request as draft February 29, 2024 00:25
@aalexandrov aalexandrov self-assigned this Feb 29, 2024

shepherdlybot bot commented Feb 29, 2024

Risk Score: 81 / 100 | Bug Hotspots: 1 | Resilience Coverage: 50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability
  • (Required) QA Review
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The pull request poses a high risk, with a score of 81, potentially due to the average age of the files involved, the cognitive complexity within these files, and the changes in executable lines of code. Notably, historical data indicates that pull requests with these characteristics are 107% more likely to introduce bugs compared to the repository's baseline. Additionally, the pull request modifies one file that has recently seen a high number of bug fixes, which could further contribute to its risk level. While the repository's observed bug trend is decreasing, this pull request's risk factors warrant careful review.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../src/join_implementation.rs 93

@aalexandrov aalexandrov force-pushed the cluster_specific_optimization branch 12 times, most recently from 1685183 to 625420b Compare March 6, 2024 14:12
@aalexandrov aalexandrov force-pushed the cluster_specific_optimization branch 6 times, most recently from 5f623b4 to 0c91c00 Compare March 8, 2024 08:04
Contributor

@ggevay ggevay left a comment

I wrote a few comments, but still reviewing.

@@ -398,26 +459,23 @@ pub struct Optimizer {
impl Optimizer {
/// Builds a logical optimizer that only performs logical transformations.
#[deprecated = "Create an Optimize instance and call `optimize` instead."]
pub fn logical_optimizer(ctx: &crate::typecheck::SharedContext) -> Self {
// TODO: pass TransformCtx instead
pub fn logical_optimizer(ctx: &mut TransformCtx) -> Self {
Contributor

TODO is actually resolved?

Contributor Author

Yes, will remove this.

/// Transforms can use this field to communicate information outside the result plans.
pub dataflow_metainfo: &'a mut DataflowMetainfo,
pub df_meta: &'a mut DataflowMetainfo,
}
Contributor

Could you please update the doc comment in filter.rs? Or you could maybe even just delete the TransformCtx part of that comment, because it probably has limited usefulness but always annoyingly goes out of sync.

And the same in predicate_pushdown.rs. (typecheck_ctx is missing.)

/// stage.
///
/// Used to call [`dataflow::optimize_dataflow`].
pub fn global(
Contributor

I might be missing something, but why not just put this code inside optimize_dataflow (and pass these same arguments to optimize_dataflow instead)?

Contributor Author

If you look at the sequence of commits, there is an intermediate stage where things are as you suggest, but then I pull the TransformCtx::global(...) call out of optimize_dataflow and into the Optimize implementations in mz_adapter.

Mostly this is done for consistency between the local and the global pass, and because I don't want us to touch the signature of optimize_dataflow every time the TransformCtx contents change. The Optimize implementations in mz_adapter have everything needed to construct a TransformCtx.
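
To illustrate the signature-stability argument above, here is a minimal sketch. All types are simplified stand-ins rather than the real mz_transform definitions: the field `example_flag`, the helper `run_global_pass`, and the `Vec<String>` plan representation are hypothetical. The point is only that the callee takes the context as a single parameter, so adding a field to `TransformCtx` changes the constructor call in the caller and leaves the `optimize_dataflow` signature alone.

```rust
// Simplified stand-ins for illustration; not the real mz_transform definitions.
#[derive(Debug, Default, Clone)]
pub struct OptimizerFeatures {
    /// Hypothetical feature flag.
    pub example_flag: bool,
}

#[derive(Debug, Default)]
pub struct DataflowMetainfo {
    pub notices: Vec<String>,
}

/// Context handed to the low-level optimization primitive.
pub struct TransformCtx<'a> {
    pub features: &'a OptimizerFeatures,
    pub df_meta: &'a mut DataflowMetainfo,
}

impl<'a> TransformCtx<'a> {
    /// Only this constructor (and its callers) change when the context grows.
    pub fn global(features: &'a OptimizerFeatures, df_meta: &'a mut DataflowMetainfo) -> Self {
        Self { features, df_meta }
    }
}

/// Low-level primitive: its signature mentions only the context type.
pub fn optimize_dataflow(plan: &mut Vec<String>, ctx: &mut TransformCtx<'_>) {
    if ctx.features.example_flag {
        plan.push("example_transform".to_string());
        ctx.df_meta.notices.push("example_transform applied".to_string());
    }
}

/// High-level caller (hypothetical): forms the context from what it owns.
pub fn run_global_pass(features: &OptimizerFeatures) -> (Vec<String>, DataflowMetainfo) {
    let mut plan = vec!["scan".to_string()];
    let mut df_meta = DataflowMetainfo::default();
    {
        let mut ctx = TransformCtx::global(features, &mut df_meta);
        optimize_dataflow(&mut plan, &mut ctx);
    }
    (plan, df_meta)
}
```

In this sketch, the caller already holds both the features and the metainfo, which mirrors the claim that the Optimize implementations have everything needed to build the context.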

Contributor

Ok

/// Used to call [`Optimizer::optimize`] on a
/// [`Optimizer::logical_optimizer`] in order to transform a stand-alone
/// [`MirRelationExpr`].
pub fn local(
Contributor

Same here: Why not put this into optimize_mir_local?

Contributor Author

Same reason as above. Assume that in the next three months we want to touch the definition of TransformCtx three times. I think it's less effort to modify only the five call sites in src/adapter/src/optimize, or to think about making the conversion part of some generic trait interface.

Contributor

Ok

Contributor Author

I think about it like this:

  • The Optimize implementations in src/adapter/src/optimize are a high-level interface that abstracts over the entire end-to-end optimization pipeline for our various statement types. They each own a configuration as a config: OptimizerConfig field which is passed fully formed into the Optimizer::new(...) constructor calls.
  • The various parts of the pipeline sit behind a low-level API which Optimize calls. Each primitive works by either calling a function or instantiating an object which contains a primitive-specific optimization. Examples are:
    • HirRelationExpr::lower(...) which is parameterized by a mz_sql::plan::lowering::Config.
    • Optimizer::<optimizer_type_ctor>(...) which is parameterized by a TransformCtx.
    • Optimizer::optimize(...) which is parameterized by a TransformCtx.
    • Plan::lower_dataflow(...) which is parameterized by an mz_compute_types::plan::lowering::Context.

The patterns at the moment are not quite as uniform as I would like them to be, but the gist is that each API primitive wants to have its own context type as a parameter, and the caller should be responsible for forming this parameter from its own context.
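
The layering described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual Materialize API: `ExampleOptimizer`, `LoweringConfig`, `lower`, `transform`, the `example_flag` field, and the string-based plans are all hypothetical. It shows a high-level type that owns an `OptimizerConfig` and forms a separate, primitive-specific context for each low-level step it calls.

```rust
// Toy model; every name below is a hypothetical stand-in.
#[derive(Debug, Clone, Default)]
pub struct OptimizerFeatures {
    pub example_flag: bool,
}

/// Configuration owned by the high-level optimizer.
#[derive(Debug, Clone, Default)]
pub struct OptimizerConfig {
    pub features: OptimizerFeatures,
}

/// Context type specific to the lowering primitive.
pub struct LoweringConfig {
    pub example_flag: bool,
}

impl From<&OptimizerConfig> for LoweringConfig {
    fn from(config: &OptimizerConfig) -> Self {
        Self { example_flag: config.features.example_flag }
    }
}

/// Context type specific to the transform primitive.
pub struct TransformCtx<'a> {
    pub features: &'a OptimizerFeatures,
}

impl<'a> From<&'a OptimizerConfig> for TransformCtx<'a> {
    fn from(config: &'a OptimizerConfig) -> Self {
        Self { features: &config.features }
    }
}

/// Low-level primitive 1: lowering, parameterized by its own config.
pub fn lower(hir: &str, config: &LoweringConfig) -> String {
    if config.example_flag {
        format!("mir+({})", hir)
    } else {
        format!("mir({})", hir)
    }
}

/// Low-level primitive 2: transformation, parameterized by its own context.
pub fn transform(mir: String, ctx: &TransformCtx<'_>) -> String {
    if ctx.features.example_flag {
        format!("opt+[{}]", mir)
    } else {
        format!("opt[{}]", mir)
    }
}

/// High-level interface: owns the config and forms each primitive's context.
pub struct ExampleOptimizer {
    config: OptimizerConfig,
}

impl ExampleOptimizer {
    pub fn new(config: OptimizerConfig) -> Self {
        Self { config }
    }

    pub fn optimize(&self, hir: &str) -> String {
        let lowering = LoweringConfig::from(&self.config);
        let mir = lower(hir, &lowering);
        let ctx = TransformCtx::from(&self.config);
        transform(mir, &ctx)
    }
}
```

Using `From` for the conversions is one possible reading of the "generic trait interface" idea mentioned earlier in the thread; the PR itself does not confirm that design.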

Contributor

@ggevay ggevay left a comment

LGTM

@aalexandrov aalexandrov marked this pull request as ready for review March 8, 2024 17:28
@aalexandrov
Contributor Author

aalexandrov commented Mar 8, 2024

Thanks for the reviews. @teskje / @ggevay I'm certainly open to suggestions on how to make this more ergonomic and further reduce the amount of boilerplate code for context forming. I went for regularity at the expense of some code duplication here, hoping that such ideas might emerge naturally in the next 1-2 months.

@aalexandrov aalexandrov changed the title [DNM]: OptimizerFeatures plumbing OptimizerFeatures plumbing into mz_transform Mar 8, 2024
@aalexandrov aalexandrov changed the title OptimizerFeatures plumbing into mz_transform transform: plumb through OptimizerFeatures Mar 8, 2024
@aalexandrov aalexandrov mentioned this pull request Mar 8, 2024
5 tasks
- Call `TransformCtx::global(...)` in `optimize_dataflow`.
- Pass the result as a `optimize_dataflow_relations` parameter.
- Make the `Optimize` implementations in `mz_adapter` responsible for
  constructing a `TransformCtx` instance.
- Pass that instance as a parameter to the `optimize_dataflow`
  function in `mz_transform`.
Pull the `TransformCtx::local(...)` call one level up from
`Optimizer::logical_optimizer` to the caller (`optimize_mir_local`).
- Make the `Optimize` implementations in `mz_adapter` responsible for
  constructing a `TransformCtx` instance.
- Pass that instance as a parameter to the `Optimizer::optimize`
  method in `mz_transform`.
Share the same `DataflowMetainfo` between the `TransformCtx::local(...)`
and the `TransformCtx::global(...)` calls.
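
As a rough illustration of the last commit message, the sketch below shares one `DataflowMetainfo` between a local and a global context by lending it mutably to each in turn. The types are simplified stand-ins, and the `stage` field, `run_pass`, and `optimize_end_to_end` are hypothetical names introduced only for this example.

```rust
// Simplified stand-ins; not the real mz_transform definitions.
#[derive(Debug, Default)]
pub struct DataflowMetainfo {
    pub notices: Vec<String>,
}

pub struct TransformCtx<'a> {
    /// Hypothetical marker for which stage built this context.
    pub stage: &'static str,
    pub df_meta: &'a mut DataflowMetainfo,
}

impl<'a> TransformCtx<'a> {
    pub fn local(df_meta: &'a mut DataflowMetainfo) -> Self {
        Self { stage: "local", df_meta }
    }

    pub fn global(df_meta: &'a mut DataflowMetainfo) -> Self {
        Self { stage: "global", df_meta }
    }
}

/// Hypothetical pass that records a notice in the shared metainfo.
fn run_pass(ctx: &mut TransformCtx<'_>) {
    ctx.df_meta.notices.push(format!("{} pass ran", ctx.stage));
}

/// Both stages write into the same DataflowMetainfo instance.
pub fn optimize_end_to_end() -> DataflowMetainfo {
    let mut df_meta = DataflowMetainfo::default();
    {
        // The mutable borrow ends with this block, so it can be lent again below.
        let mut ctx = TransformCtx::local(&mut df_meta);
        run_pass(&mut ctx);
    }
    {
        let mut ctx = TransformCtx::global(&mut df_meta);
        run_pass(&mut ctx);
    }
    df_meta
}
```

Because the two contexts are built one after the other, notices from both stages accumulate in the same value without any shared-ownership wrapper.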
@aalexandrov aalexandrov force-pushed the cluster_specific_optimization branch from 0c91c00 to d33bb48 Compare March 8, 2024 18:03
@aalexandrov aalexandrov enabled auto-merge March 8, 2024 18:05
@aalexandrov aalexandrov added A-optimization Area: query optimization and transformation A-compute Area: compute labels Mar 8, 2024
@aalexandrov aalexandrov merged commit 310850c into MaterializeInc:main Mar 8, 2024
72 of 73 checks passed
@aalexandrov aalexandrov deleted the cluster_specific_optimization branch March 8, 2024 19:06