Introduce dev-docs #14346

Open

gortiz wants to merge 4 commits into apache:master from gortiz:dev-doc

+577 −1

Contributor

gortiz commented Oct 31, 2024

This PR adds a template for dev documentation.

I recently read the Velox Developer Guide and I think it would be very useful to have something like it. We are far from that, but this PR includes a couple of documents explaining the multi-stage query engine, all the machinery required to write those docs using MkDocs, and some tips and code style guidance.

The main thing this PR does not do is actually publish these pages. It should be trivial to publish this documentation on GitHub Pages, but before doing that I think it is better to see whether other committers like the proposed mechanism.

This can also be a first step toward a possible migration of the public user documentation into something that is easier for committers to write.


          first commit of dev docs

d167f62

gortiz force-pushed the dev-doc branch from 3cbd638 to d167f62 Compare

October 31, 2024 15:43

gortiz requested review from Jackie-Jiang, npawar and yashmayya

October 31, 2024 15:44

Contributor Author

gortiz commented Oct 31, 2024

cc @bziobrowski

codecov-commenter commented Oct 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.69%. Comparing base (59551e4) to head (d167f62).
Report is 1268 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #14346      +/-   ##
============================================
+ Coverage     61.75%   63.69%   +1.94%     
- Complexity      207     1470    +1263     
============================================
  Files          2436     2660     +224     
  Lines        133233   145837   +12604     
  Branches      20636    22313    +1677     
============================================
+ Hits          82274    92896   +10622     
- Misses        44911    46078    +1167     
- Partials       6048     6863     +815

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | `100.00% <ø> (+99.99%)` ⬆️ |
| integration | `100.00% <ø> (+99.99%)` ⬆️ |
| integration1 | `100.00% <ø> (+99.99%)` ⬆️ |
| integration2 | `0.00% <ø> (ø)` |
| java-11 | `34.16% <ø> (-27.55%)` ⬇️ |
| java-21 | `63.68% <ø> (+2.05%)` ⬆️ |
| skip-bytebuffers-false | `63.68% <ø> (+1.93%)` ⬆️ |
| skip-bytebuffers-true | `63.65% <ø> (+35.92%)` ⬆️ |
| temurin | `63.69% <ø> (+1.94%)` ⬆️ |
| unittests | `63.69% <ø> (+1.94%)` ⬆️ |
| unittests1 | `55.36% <ø> (+8.47%)` ⬆️ |
| unittests2 | `34.17% <ø> (+6.44%)` ⬆️ |

Flags with carried forward coverage won't be shown.

Jackie-Jiang added the documentation label

gortiz added 2 commits

November 6, 2024 12:33


          Fix broken link in execution.md

1efe4c4


          Add a page for timestamp indexes

gortiz mentioned this pull request

Pinot dev docs #14401

Open

bziobrowski reviewed

docs/dev/query/msq/tree-lifecycle.md

+              Once the query is validated, the `SqlNode` is converted to a `RelNode` using `QueryEnvironment.toRelation` method.
+              This is done by the `SqlToRelConverter` class, which is a Calcite class that converts `SqlNode`s to `RelNode`s.
+              Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language.

Contributor

bziobrowski Nov 22, 2024

Suggested change

-              Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language.
+              Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language or its syntax.
+              The `RelNode`s represent the relational algebra.

bziobrowski reviewed

docs/dev/query/msq/tree-lifecycle.md

+              If Apache Pinot were following the Calcite architecture, this phase would optimize the logical `RelNode`s into
+              physical `RelNode`s.
+              But Pinot just partially follows this model.
+              During optimization apply the rules defined in `PinotQueryRuleSets`, which transform the `RelNode`s in several ways:

Contributor

bziobrowski Nov 22, 2024

Maybe rewrite the start of the sentence above to: "During optimization it applies the rules"?

bziobrowski reviewed

docs/dev/query/msq/tree-lifecycle.md

		the plans is in `pinot-common/src/main/proto/plan.proto`.

		## Multi-stage operators

Contributor

bziobrowski Nov 22, 2024

I think it'd be good to include the name of the class responsible for the actions below.

bziobrowski reviewed

docs/dev/query/msq/tree-lifecycle.md

+              ## Multi-stage operators
+              The plan in protobuf format is received by the server and deserialized into `Worker.StagePlan`, generated by protobuf.
+              These objects are then transformed back into `PlanNode`s using `QueryPlanSerDeUtils` and then transformed into

Contributor

bziobrowski Nov 22, 2024

Is the MSQ QueryRunner the `org.apache.pinot.query.runtime.QueryRunner` class?
What about the SSQE QueryRunner? IntelliJ shows a QueryRunner in pinot-tools, but that seems to be something different.
Anyway, if the simple name is the same, it'd be good to add the relevant package names somewhere.

bziobrowski reviewed

docs/dev/query/msq/tree-lifecycle.md

+              For example joins and window functions are not supported by SSQ.
+              See `PlanNodeToOpChain` to learn more about the SSQ boundary.
+              This subplan is then transformed into a single `LeafStageTransferableBlockOperator`, which uses SSQ to execute that
+              part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ.

Contributor

bziobrowski Nov 22, 2024

Suggested change

-              part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ.
+              part of the query. This is done mainly in order to not have to rewrite all the optimizations we have in SSQ.

or

Suggested change

-              part of the query. This is done mainly in order to do not have to rewrite all the optimizations we have in SSQ.
+              part of the query. This is done mainly in order to avoid having to rewrite all the optimizations we have in SSQ.

bziobrowski reviewed

docs/dev/query/msq/execution.md

		@@ -0,0 +1,48 @@
		# Multi stage query execution

Contributor

bziobrowski Nov 22, 2024

Shouldn't it point to tree-lifecycle.md?

bziobrowski reviewed

docs/dev/query/msq/execution.md

		@@ -0,0 +1,48 @@
		# Multi stage query execution

Contributor

bziobrowski Nov 22, 2024

I think it would be good to:

- decide on a single term and acronym per concept, e.g. multi stage query engine/MSQE vs multi stage query/MSQ, single stage query/SSQ vs single stage query engine/SSQE
- define them somewhere

At the moment they're used assuming the reader knows what they mean.

Contributor Author

gortiz Nov 22, 2024

You are totally right. We need more discipline to always use the same acronym.

I don't know if you had a chance to run mkdocs, but this PR includes a list of abbreviations that mkdocs automatically uses to generate tooltips like these:

On hover:

Thanks to your comment, I just found that the abbreviations are incorrectly defined as [MQS] instead of [MSQ].
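
A mix-up like [MQS] vs [MSQ] is the kind of slip a small check could catch. Here is a minimal Python sketch, not part of this PR, assuming the abbreviations are kept in a file that uses the Python-Markdown `abbr` syntax (`*[ABBR]: expansion`) and that any standalone run of three or more capital letters in a page counts as an acronym. The function names are hypothetical.

```python
import re

# Matches Python-Markdown `abbr` definitions such as "*[MSQ]: Multi-stage query".
ABBR_DEF = re.compile(r"^\*\[([^\]]+)\]:\s*(.+)$", re.MULTILINE)
# Heuristic (an assumption): a standalone run of 3+ capital letters is an acronym.
ACRONYM = re.compile(r"\b[A-Z]{3,}\b")


def defined_abbreviations(abbr_text):
    """Return {abbreviation: expansion} parsed from an abbreviations file."""
    return {m.group(1): m.group(2).strip() for m in ABBR_DEF.finditer(abbr_text)}


def undefined_acronyms(doc_text, abbr_text):
    """Return acronyms used in doc_text that have no definition in abbr_text."""
    used = set(ACRONYM.findall(doc_text))
    return sorted(used - set(defined_abbreviations(abbr_text)))


def unused_definitions(doc_text, abbr_text):
    """Return defined abbreviations that never appear in doc_text."""
    used = set(ACRONYM.findall(doc_text))
    return sorted(set(defined_abbreviations(abbr_text)) - used)
```

Run against a page that mentions MSQ and an abbreviations file that defines MQS, `undefined_acronyms` would report `MSQ` and `unused_definitions` would report `MQS`, which points straight at the typo.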


          Fixed abbreviations

a278480
