[FEATURE] Export PPL-Calcite engine as reusable library #3734

@dai-chen

Is your feature request related to a problem?

Currently, the Calcite-based PPL engine lives within the OpenSearch SQL/PPL plugin and cannot be reused outside of that context. This limits its applicability in other environments, where the same parsing and logical planning capabilities would be valuable.

Use cases:

  1. [FEATURE] PPL Unification in Spark opensearch-spark#1136
  2. [FEATURE] Standalone PPL streams & library #627
  3. PPL CLI

What solution would you like?

Granularity: Modular Publishing vs. Fat JAR

We considered two approaches for packaging and publishing the Calcite-based PPL engine for external use:

  • Option A: Modular Publishing

    • Description: Publish each internal module (ppl, core, opensearch, etc.) as an independent Maven artifact.
    • Pros: Enables better reuse and flexibility, since consumers (e.g., Spark) can depend only on the components they need.
    • Cons: Requires publishing all relevant internal modules.
  • Option B: Fat JAR

    • Description: Bundle all internal modules and dependencies into a single artifact.
    • Pros: Simplifies consumption (e.g., by the PPL CLI) with a single plug-and-play artifact.
    • Cons: Tightly couples all components, increases artifact size, and reduces modularity.

The New API Module

We propose introducing a new api module as a high-level integration layer for the Calcite-based engine and as the primary entry point for external consumers.

  • Expose a unified API: The current Calcite engine is tightly coupled with OpenSearch internals (e.g., DataSourceService, Settings), making reuse difficult. The new module will provide a clean, reusable interface for external consumers without exposing low-level implementation details.

  • Guarantee interoperability: Integration tests will validate the contract between the Calcite engine and external consumers (e.g., Spark, CLI), ensuring correctness and long-term compatibility.
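
To make the shape of such an entry point concrete, here is a minimal sketch of what a facade in the api module could look like. Every name in it (CatalogProvider, PplPlanner, PlannedQuery, ToyPplPlanner) is hypothetical and not taken from the plugin. The "planner" is a toy that only splits the pipeline on `|` so the example runs without Calcite on the classpath. The point is the boundary: the consumer supplies schema information through a small interface instead of OpenSearch internals such as DataSourceService or Settings, and a consumer-side check exercises the contract the way an integration test might.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: none of these types exist in the SQL/PPL plugin today.

/** Consumer-side contract check, the kind of assertion an integration test could make. */
final class UnifiedApiSketchDemo {
    public static void main(String[] args) {
        Map<String, String> logsFields = Map.of("status", "int", "host", "string");
        CatalogProvider catalog =
            table -> "logs".equals(table) ? logsFields : Collections.<String, String>emptyMap();
        PplPlanner planner = new ToyPplPlanner(catalog);

        PlannedQuery planned = planner.plan("source=logs | where status = 500 | fields host");
        if (!planned.sourceTable.equals("logs") || planned.stages.size() != 2) {
            throw new AssertionError("contract violated: " + planned.stages);
        }
        System.out.println(planned.sourceTable + " -> " + planned.stages);
    }
}

/** Schema lookup supplied by the consumer, standing in for plugin internals. */
interface CatalogProvider {
    /** Returns field name to type name for a table, or an empty map if the table is unknown. */
    Map<String, String> fieldsOf(String table);
}

/** Engine-neutral planning result; a real module would more likely expose a Calcite plan. */
final class PlannedQuery {
    final String sourceTable;
    final List<String> stages;

    PlannedQuery(String sourceTable, List<String> stages) {
        this.sourceTable = sourceTable;
        // Defensive copy so callers cannot mutate the result.
        this.stages = Collections.unmodifiableList(new ArrayList<>(stages));
    }
}

/** The single entry point an external consumer (Spark, CLI) would depend on. */
interface PplPlanner {
    PlannedQuery plan(String pplQuery);
}

/** Toy stand-in that only splits the pipeline on '|'. It is not a PPL parser. */
final class ToyPplPlanner implements PplPlanner {
    private final CatalogProvider catalog;

    ToyPplPlanner(CatalogProvider catalog) {
        this.catalog = Objects.requireNonNull(catalog, "catalog");
    }

    @Override
    public PlannedQuery plan(String pplQuery) {
        String[] parts = pplQuery.trim().split("\\s*\\|\\s*");
        if (!parts[0].startsWith("source=")) {
            throw new IllegalArgumentException("query must start with source=<table>");
        }
        String table = parts[0].substring("source=".length()).trim();
        if (catalog.fieldsOf(table).isEmpty()) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        // Everything after the source clause is kept as an opaque pipeline stage.
        return new PlannedQuery(table, Arrays.asList(parts).subList(1, parts.length));
    }
}
```

In this sketch the consumer depends only on PplPlanner and CatalogProvider, so the implementation behind them could change without breaking Spark or the CLI as long as the contract check keeps passing.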

Tasks:

What alternatives have you considered?

  • Using the full SQL/PPL plugin artifact directly: One option is to consume the existing OpenSearch SQL/PPL plugin artifact as-is. However, this approach brings in many unrelated dependencies (such as REST handlers, transport layers, and plugin wiring), which significantly increase the artifact’s size and complexity.

Do you have any additional context?

Metadata

Labels: PPL (Piped processing language), calcite (calcite migration related), enhancement (New feature or request)

Status: Done