-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce dev-docs #14346
base: master
Are you sure you want to change the base?
Introduce dev-docs #14346
Conversation
cc @bziobrowski |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #14346 +/- ##
============================================
+ Coverage 61.75% 63.69% +1.94%
- Complexity 207 1470 +1263
============================================
Files 2436 2660 +224
Lines 133233 145837 +12604
Branches 20636 22313 +1677
============================================
+ Hits 82274 92896 +10622
- Misses 44911 46078 +1167
- Partials 6048 6863 +815
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Once the query is validated, the `SqlNode` is converted to a `RelNode` using `QueryEnvironment.toRelation` method. | ||
This is done by the `SqlToRelConverter` class, which is a Calcite class that converts `SqlNode`s to `RelNode`s. | ||
|
||
Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language. | |
Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language or its syntax. | |
The `RelNode`s represent the relational algebra. |
If Apache Pinot were following the Calcite architecture, this phase would optimize the logical `RelNode`s into | ||
physical `RelNode`s. | ||
But Pinot just partially follows this model. | ||
During optimization apply the rules defined in `PinotQueryRuleSets`, which transform the `RelNode`s in several ways: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rewrite start of sentence above to :
During optimization it applies the rules
?
the plans is in `pinot-common/src/main/proto/plan.proto`. | ||
|
||
## Multi-stage operators | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to include name of the class responsible for the actions below.
## Multi-stage operators | ||
|
||
The plan in protobuf format is received by the server and deserialized into `Worker.StagePlan`, generated by protobuf. | ||
These objects are then transformed back into `PlanNode`s using `QueryPlanSerDeUtils` and then transformed into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is MSQ QueryRunner the org.apache.pinot.query.runtime.QueryRunner class ?
What about SSQE QueryRunner ? IJ shows QueryRunner in pinot-tools but that seems to be something different.
Anyway - if simple name is the same then it'd be good to add relevant packages names somewhere .
For example joins and window functions are not supported by SSQ. | ||
See `PlanNodeToOpChain` to learn more about the SSQ boundary. | ||
This subplan is then transformed into a single `LeafStageTransferableBlockOperator`, which uses SSQ to execute that | ||
part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ. | |
part of the query. This is done mainly in order to not have to rewrite all the optimizations we have in SSQ. |
or
part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ. | |
part of the query. This is done mainly in order to avoid having to rewrite all the optimizations we have in SSQ. |
@@ -0,0 +1,48 @@ | |||
# Multi stage query execution | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it point to tree-lifecycle.md
?
@@ -0,0 +1,48 @@ | |||
# Multi stage query execution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to:
- decide on a single term and acronym per concept, e.g. multi stage query engine/MSQE vs multi stage query/MSQ, single stage query/SSQ vs single stage query engine/SSQE
- define them somewhere
At the moment they're used assuming reader knows what they mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are totally right. We need to more discipline to use always the same acronym.
I don't know if you have a chance to run mkdocs, but this PR includes a list of abbreviations that are automatically used by mkdocs to generate this tooltips:
With your comment I just found that the abbreviations are incorrectly indicated as [MQS]
instead of [MSQ]
This PR adds a template for dev documentation.
I've recently read the Velox Developer Guide and I really think that it would be super useful to have something like that. We are far away from it, but this PR includes a couple of documents explaining Multi-stage query engine and all the machinery required to write that doc using MkDocs as well as some tips and codestyle.
The main thing this is not doing is to actually publish these pages. It should be trivial to publish this documentation in GitHub Pages, but before doing that I think it is better to see if other committers actually like the proposed mechanism.
This can also be a first step on a possible migration of the public user documentation into something that is easier to write for committers.