The purpose of this RFC is to consolidate the PPL specification (ANTLR grammar) and the query-to-AST construction used by the different PPL execution engines into a single repository.
This repository will maintain the most up-to-date vocabulary and documentation and will serve as the reference for every downstream engine.
A single grammar location provides a simpler and more consistent way to evolve the language, and it moves the responsibility for keeping a downstream engine up to date onto the engine implementing the spec rather than onto the grammar and language maintainers.
Implemented solution
Our goal is to remove the PPL grammar and AST structure from each of the downstream engines and consolidate them into a single artifact that any existing or future physical execution engine can use.
The sole responsibility of each execution engine would then be to translate the PPL AST into that engine's logical plan, or directly into its physical plan when the engine has no logical layer (as is the case for OpenSearch).
In the Spark PPL use case, for example, we implemented CatalystQueryPlanVisitor, a PPL AST traverser that walks the PPL AST and transforms it into a Catalyst logical plan, which is then submitted to Spark to generate the physical plan and execute the query.
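To make this division of labor concrete, here is a minimal sketch of the visitor pattern described above, assuming a heavily simplified AST. All names (`Node`, `AstVisitor`, `Relation`, `Filter`, `Fields`, `SqlStringVisitor`) are hypothetical and are not the actual PPL library API. To keep the sketch runnable without Spark, the engine-side visitor emits a Spark-SQL-like string instead of a Catalyst logical plan.

```java
import java.util.List;

// Hypothetical, heavily simplified shared PPL AST (the real one would live in the PPL library).
interface Node {
    <R> R accept(AstVisitor<R> visitor);
}

// Visitor contract shipped with the shared AST; each engine provides its own implementation.
interface AstVisitor<R> {
    R visitRelation(Relation node);
    R visitFilter(Filter node);
    R visitFields(Fields node);
}

// `source = <table>`
record Relation(String table) implements Node {
    public <R> R accept(AstVisitor<R> v) { return v.visitRelation(this); }
}

// `where <condition>`; the condition is kept as raw text to keep the sketch small.
record Filter(String condition, Node child) implements Node {
    public <R> R accept(AstVisitor<R> v) { return v.visitFilter(this); }
}

// `fields a, b, ...`
record Fields(List<String> names, Node child) implements Node {
    public <R> R accept(AstVisitor<R> v) { return v.visitFields(this); }
}

// Hypothetical engine-side translator: walks the shared AST bottom-up and
// produces a Spark-SQL-like query string instead of a real Catalyst plan.
class SqlStringVisitor implements AstVisitor<String> {
    public String visitRelation(Relation node) {
        return "SELECT * FROM " + node.table();
    }

    public String visitFilter(Filter node) {
        return "SELECT * FROM (" + node.child().accept(this) + ") WHERE " + node.condition();
    }

    public String visitFields(Fields node) {
        return "SELECT " + String.join(", ", node.names())
                + " FROM (" + node.child().accept(this) + ")";
    }
}
```

For a query such as `source = people | where age > 30 | fields name, age`, the shared front end would build `Fields(Filter(Relation))`, and only the visitor changes from engine to engine. A real Spark translator would return Catalyst nodes (for example `Project`, `Filter` and `UnresolvedRelation`) rather than strings.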
Advantages of the selected approach:
Today each engine has to maintain its own version of the PPL ANTLR grammar and documentation, and these versions tend to diverge from one another because of each engine's specifics.
Once these components are extracted into a single location, the shared dependency becomes immutable from the engines' point of view, forcing engine implementations to follow the grammar more closely. In the rare cases where divergence is needed, the engine adds a distinct UDF to accommodate the difference instead of changing the grammar itself.
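The UDF escape hatch can be sketched as follows, assuming the shared grammar parses any `name(args...)` call into a generic function-call node and leaves name resolution to each engine. `FunctionCall` and `FunctionRegistry` are hypothetical names used for illustration only, not part of the PPL library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shared AST node: the grammar accepts any function name, so an
// engine-specific function needs no grammar change.
record FunctionCall(String name, List<String> args) {}

// Hypothetical per-engine registry mapping function names to translators that
// produce the engine's own expression text. Unknown names are rejected when the
// engine resolves the call, not when the query is parsed.
class FunctionRegistry {
    private final Map<String, Function<List<String>, String>> translators = new HashMap<>();

    FunctionRegistry register(String name, Function<List<String>, String> translator) {
        translators.put(name.toLowerCase(Locale.ROOT), translator);
        return this;
    }

    String translate(FunctionCall call) {
        Function<List<String>, String> translator =
                translators.get(call.name().toLowerCase(Locale.ROOT));
        if (translator == null) {
            throw new UnsupportedOperationException(
                    "Function not supported by this engine: " + call.name());
        }
        return translator.apply(call.args());
    }
}
```

With this split, a Spark-only UDF registered on the Spark registry resolves there, while an engine that does not register it rejects the same call at resolution time; in neither case does the shared grammar change.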
Advantages:
Reuse: existing PPL code is tested and documented in one location and released as its own artifact.
Simpler development: engines rely on a well-known, structured codebase.
Long-term support: maintaining the grammar and documentation in a single location simplifies long-term support and language evolution.
Single place of maintenance: by reusing the PPL logical model, which relies on the PPL ANTLR parser, we can use a single repository to maintain and develop the PPL language without constantly merging changes from upstream.
The following diagram shows the high-level architecture of the selected solution:
The logical architecture shows the following artifacts:
Libraries:
PPL (the PPL core, protocol, parser, and logical plan utilities)
SQL (the SQL core, protocol, and parser; depends on PPL for the logical plan utilities)
Drivers:
PPL OpenSearch Driver (depends on OpenSearch core)
PPL Spark Driver (depends on Spark core)
PPL Prometheus Driver (translates PPL directly to PromQL)
SQL OpenSearch Driver (depends on OpenSearch core)
Tasks:
Extract the PPL logical component from the SQL plugin into a (non-plugin) library and publish the library to Maven.
Separate the PPL and SQL drivers inside the OpenSearch PPL client to better distinguish between them.
Create a thin PPL client capable of interacting with a PPL driver regardless of which driver it is (Spark, OpenSearch, Prometheus).
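An engine-agnostic thin client could look like the sketch below. `PplDriver`, `StubDriver` and `PplClient` are hypothetical names; a real driver would parse the query with the shared PPL library and hand the resulting AST to its engine, which the stub only simulates by echoing the query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical driver contract: each driver (OpenSearch, Spark, Prometheus) would use the
// shared PPL parser/AST and return an engine-specific result.
interface PplDriver {
    String name();
    String execute(String pplQuery);
}

// Stand-in driver for illustration only: it echoes the query instead of running it.
record StubDriver(String name) implements PplDriver {
    public String execute(String pplQuery) {
        return name + " would run: " + pplQuery;
    }
}

// Hypothetical thin client: depends only on the driver contract, never on a concrete engine.
class PplClient {
    private final Map<String, PplDriver> drivers = new LinkedHashMap<>();

    PplClient register(PplDriver driver) {
        drivers.put(driver.name(), driver);
        return this;
    }

    String query(String driverName, String pplQuery) {
        PplDriver driver = drivers.get(driverName);
        if (driver == null) {
            throw new IllegalArgumentException("No PPL driver registered under: " + driverName);
        }
        return driver.execute(pplQuery);
    }
}
```

Because the client only sees the driver interface, supporting a new engine amounts to registering another driver, without changes to the client or the grammar.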
Sub-Tasks
This project will be composed of sub-tasks for an incremental and continuous process:
Do you have any additional context?
PPL on Spark
PPL on OpenSearch
PPL Spark ANTLR grammar
PPL OpenSearch ANTLR grammar
PPL Spark implementation issue