Description
Is your feature request related to a problem?
Currently, the Calcite-based PPL engine lives within the OpenSearch SQL/PPL plugin and cannot be reused outside of that context. This limits its applicability in other environments, where the same parsing and logical planning capabilities would be valuable.
Use cases:
- [FEATURE] PPL Unification in Spark opensearch-spark#1136
- [FEATURE] Standalone PPL streams & library #627
- PPL CLI
What solution would you like?
Granularity: Modular Publishing vs. Fat JAR
We considered two approaches for packaging and publishing the Calcite-based PPL engine for external use:
- Option A: Modular Publishing
  - Description: Publish each internal module (ppl, core, opensearch, etc.) as an independent Maven artifact.
  - Pros: Enables better reuse and flexibility: consumers (e.g., Spark) can depend only on the components they need.
  - Cons: Requires publishing all relevant internal modules.
- Option B: Fat JAR
  - Description: Bundle all internal modules and dependencies into a single artifact.
  - Pros: Simplifies consumption (e.g., by the PPL CLI) with a single plug-and-play artifact.
  - Cons: Tightly couples all components, increases artifact size, and reduces modularity.
The New API Module
We propose introducing a new api module as a high-level integration layer for the Calcite-based engine and as the primary entry point for external consumers. Its goals are:
- Expose a unified API: The current Calcite engine is tightly coupled with OpenSearch internals (e.g., DataSourceService, Settings), making reuse difficult. The new module will provide a clean, reusable interface for external consumers without exposing low-level implementation details.
- Guarantee interoperability: Integration tests will validate the contract between the Calcite engine and external consumers (e.g., Spark, CLI), ensuring correctness and long-term compatibility.
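To make the intent of the api module concrete, here is a minimal, self-contained sketch of what an engine-agnostic entry point could look like. Every name in it (UnifiedApiSketch, Catalog, LogicalPlan, Planner) is hypothetical and chosen for illustration only; it is not the PoC's actual interface, and the toy "source = <table>" handling stands in for the real PPL parser and Calcite planning.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch only: none of these names are taken from the SQL/PPL plugin. */
final class UnifiedApiSketch {

  /** Pluggable schema source, e.g. backed by OpenSearch mappings or a Spark catalog. */
  interface Catalog {
    Optional<List<String>> columnsOf(String table);
  }

  /** Engine-neutral planning result; consumers see this instead of OpenSearch internals. */
  static final class LogicalPlan {
    final String table;
    final List<String> columns;

    LogicalPlan(String table, List<String> columns) {
      this.table = table;
      this.columns = columns;
    }

    @Override
    public String toString() {
      return "Scan(table=" + table + ", columns=" + columns + ")";
    }
  }

  /** Entry point that depends only on the Catalog abstraction. */
  static final class Planner {
    private final Catalog catalog;

    Planner(Catalog catalog) {
      this.catalog = catalog;
    }

    /** Handles only the toy form "source = <table>"; a real planner would delegate to PPL parsing and Calcite. */
    LogicalPlan plan(String query) {
      String[] parts = query.trim().split("\\s*=\\s*", 2);
      if (parts.length != 2 || !parts[0].equals("source")) {
        throw new IllegalArgumentException("Unsupported query in this sketch: " + query);
      }
      String table = parts[1];
      List<String> columns = catalog.columnsOf(table)
          .orElseThrow(() -> new IllegalArgumentException("Unknown table: " + table));
      return new LogicalPlan(table, columns);
    }
  }

  /** In-memory catalog standing in for an OpenSearch- or Spark-backed implementation. */
  static Catalog inMemoryCatalog(Map<String, List<String>> tables) {
    Map<String, List<String>> copy = new LinkedHashMap<>(tables);
    return table -> Optional.ofNullable(copy.get(table));
  }

  public static void main(String[] args) {
    Catalog catalog = inMemoryCatalog(Map.of("logs", List.of("timestamp", "status")));
    LogicalPlan plan = new Planner(catalog).plan("source = logs");
    System.out.println(plan);
  }
}
```

The point of the sketch is the dependency direction: the planner sees only a small catalog abstraction, so a Spark or CLI consumer could supply its own implementation without pulling in DataSourceService or Settings.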
Tasks:
- Publish internal modules separately for downstream reuse #3763
  - Add a Gradle task to build a shaded (fat) JAR that bundles all required transitive dependencies for easy external consumption.
  - Publish the packaged library to a Maven repository so it can be consumed by Spark or other tools.
- Add unified query API for external integration #3783
  - Create a new submodule that acts as a thin abstraction layer over existing Calcite engine internals to isolate and simplify external usage.
  - Define a unified interface for interacting with the Calcite engine, with pluggable schema (OpenSearch or Spark catalog) and runtime (local or SparkSQL) support.
  - Add integration tests for basic use cases to serve as a contract between the Calcite engine and downstream consumers such as Spark or the CLI.
- [Backport 2.19-dev] Publish internal modules separately for downstream reuse #4723
  - Backport to the 2.x branch once the Calcite engine backport to 2.x is complete.
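As an illustration of how integration tests could act as a contract across pluggable runtimes, the following sketch defines one reusable check that any runtime implementation would have to pass. The names (ContractTestSketch, QueryRuntime, localRuntime, verifyProjectionContract) are hypothetical, and the in-memory runtime is an assumption standing in for a local or SparkSQL backend; the real tests would run PPL queries through the Calcite engine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only: a shared contract check for pluggable runtimes. */
final class ContractTestSketch {

  /** Hypothetical execution backend, e.g. local evaluation or SparkSQL. */
  interface QueryRuntime {
    List<Map<String, Object>> execute(String table, List<String> columns);
  }

  /** Local runtime that projects the requested columns from in-memory rows. */
  static QueryRuntime localRuntime(Map<String, List<Map<String, Object>>> data) {
    return (table, columns) -> {
      List<Map<String, Object>> result = new ArrayList<>();
      for (Map<String, Object> row : data.getOrDefault(table, List.of())) {
        // LinkedHashMap keeps the requested column order in the output row.
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String column : columns) {
          projected.put(column, row.get(column));
        }
        result.add(projected);
      }
      return result;
    };
  }

  /**
   * Contract: a projection returns one output row per input row, and each
   * output row contains exactly the requested columns in the requested order.
   */
  static void verifyProjectionContract(
      QueryRuntime runtime, String table, List<String> columns, int expectedRows) {
    List<Map<String, Object>> rows = runtime.execute(table, columns);
    if (rows.size() != expectedRows) {
      throw new AssertionError("Expected " + expectedRows + " rows but got " + rows.size());
    }
    for (Map<String, Object> row : rows) {
      if (!new ArrayList<>(row.keySet()).equals(columns)) {
        throw new AssertionError("Unexpected columns: " + row.keySet());
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> row = Map.of("timestamp", 1, "status", 200, "host", "a");
    Map<String, List<Map<String, Object>>> data = Map.of("logs", List.of(row));
    // The same check can be pointed at any other QueryRuntime implementation.
    verifyProjectionContract(localRuntime(data), "logs", List.of("status", "timestamp"), 1);
    System.out.println("projection contract holds for the local runtime");
  }
}
```

Keeping the check independent of any one runtime means a downstream consumer could run the same assertions against its own backend to detect a contract break early.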
What alternatives have you considered?
- Using the full SQL/PPL plugin artifact directly: One option is to consume the existing OpenSearch SQL/PPL plugin artifact as-is. However, this approach brings in many unrelated dependencies—such as REST handlers, transport layers, and plugin wiring—which significantly increase the artifact’s size and complexity.
Do you have any additional context?
- Current Calcite code: https://github.com/opensearch-project/sql/blob/main/core/src/main/java/org/opensearch/sql/executor/QueryService.java#L99C17-L108C29
- Unified API proof of concept (PoC), in progress:
- API: https://github.com/dai-chen/sql-1/blob/export-ppl-calcite-library-without-opensearch/unified-engine-api/src/main/java/org/opensearch/sql/unified/api/UnifiedQueryPlanner.java
- Example usage: https://github.com/dai-chen/sql-1/blob/export-ppl-calcite-library-without-opensearch/integ-test/src/test/java/org/opensearch/sql/calcite/standalone/CalciteUnifiedEngineIT.java