Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce dev-docs #14346

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Introduce dev-docs #14346

wants to merge 4 commits into from

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Oct 31, 2024

This PR adds a template for dev documentation.

I've recently read the Velox Developer Guide and I really think that it would be super useful to have something like that. We are far away from it, but this PR includes a couple of documents explaining Multi-stage query engine and all the machinery required to write that doc using MkDocs as well as some tips and codestyle.

The main thing this is not doing is to actually publish these pages. It should be trivial to publish this documentation in GitHub Pages, but before doing that I think it is better to see if other committers actually like the proposed mechanism.

This can also be a first step on a possible migration of the public user documentation into something that is easier to write for committers.

@gortiz
Copy link
Contributor Author

gortiz commented Oct 31, 2024

cc @bziobrowski

@codecov-commenter
Copy link

codecov-commenter commented Oct 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.69%. Comparing base (59551e4) to head (d167f62).
Report is 1268 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14346      +/-   ##
============================================
+ Coverage     61.75%   63.69%   +1.94%     
- Complexity      207     1470    +1263     
============================================
  Files          2436     2660     +224     
  Lines        133233   145837   +12604     
  Branches      20636    22313    +1677     
============================================
+ Hits          82274    92896   +10622     
- Misses        44911    46078    +1167     
- Partials       6048     6863     +815     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 34.16% <ø> (-27.55%) ⬇️
java-21 63.68% <ø> (+2.05%) ⬆️
skip-bytebuffers-false 63.68% <ø> (+1.93%) ⬆️
skip-bytebuffers-true 63.65% <ø> (+35.92%) ⬆️
temurin 63.69% <ø> (+1.94%) ⬆️
unittests 63.69% <ø> (+1.94%) ⬆️
unittests1 55.36% <ø> (+8.47%) ⬆️
unittests2 34.17% <ø> (+6.44%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gortiz gortiz mentioned this pull request Nov 6, 2024
Once the query is validated, the `SqlNode` is converted to a `RelNode` using `QueryEnvironment.toRelation` method.
This is done by the `SqlToRelConverter` class, which is a Calcite class that converts `SqlNode`s to `RelNode`s.

Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language.
Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language or its syntax.
The `RelNode`s represent the relational algebra.

If Apache Pinot were following the Calcite architecture, this phase would optimize the logical `RelNode`s into
physical `RelNode`s.
But Pinot just partially follows this model.
During optimization apply the rules defined in `PinotQueryRuleSets`, which transform the `RelNode`s in several ways:
Copy link
Contributor

@bziobrowski bziobrowski Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rewrite start of sentence above to :
During optimization it applies the rules
?

the plans is in `pinot-common/src/main/proto/plan.proto`.

## Multi-stage operators

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to include name of the class responsible for the actions below.

## Multi-stage operators

The plan in protobuf format is received by the server and deserialized into `Worker.StagePlan`, generated by protobuf.
These objects are then transformed back into `PlanNode`s using `QueryPlanSerDeUtils` and then transformed into
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is MSQ QueryRunner the org.apache.pinot.query.runtime.QueryRunner class ?
What about SSQE QueryRunner ? IJ shows QueryRunner in pinot-tools but that seems to be something different.
Anyway - if simple name is the same then it'd be good to add relevant packages names somewhere .

For example joins and window functions are not supported by SSQ.
See `PlanNodeToOpChain` to learn more about the SSQ boundary.
This subplan is then transformed into a single `LeafStageTransferableBlockOperator`, which uses SSQ to execute that
part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ.
part of the query. This is done mainly in order to not have to rewrite all the optimizations we have in SSQ.

or

Suggested change
part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ.
part of the query. This is done mainly in order to avoid having to rewrite all the optimizations we have in SSQ.

@@ -0,0 +1,48 @@
# Multi stage query execution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it point to tree-lifecycle.md ?

@@ -0,0 +1,48 @@
# Multi stage query execution
Copy link
Contributor

@bziobrowski bziobrowski Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to:

  • decide on a single term and acronym per concept, e.g. multi stage query engine/MSQE vs multi stage query/MSQ, single stage query/SSQ vs single stage query engine/SSQE
  • define them somewhere
    At the moment they're used assuming reader knows what they mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are totally right. We need to more discipline to use always the same acronym.

I don't know if you have a chance to run mkdocs, but this PR includes a list of abbreviations that are automatically used by mkdocs to generate this tooltips:

image

On hover:
image

With your comment I just found that the abbreviations are incorrectly indicated as [MQS] instead of [MSQ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants