generated from amazon-archives/__template_Custom
-
Notifications
You must be signed in to change notification settings - Fork 181
Closed
Labels
PPLPiped processing languagePiped processing languageRFCRequest For CommentsRequest For Commentscalcitecalcite migration releatedcalcite migration releatedv3.4.0
Description
Problem Statement
Users need a first-class way to execute multiple independent subsearches and view the combined results in strict time order. Achieving this today requires manual composition (e.g., append + explicit sort) and careful command selection, which is error-prone, harder to optimize, and easy to
misuse (e.g., mixing non-streaming operations that stall pipelines).
Current State
- PPL provides
append, which concatenates results sequentially in the order subsearches are written. appendcommand can only support one single subsearch- There is no built-in interleaving by timestamp; users must add a
sortas a separate step.
Short-Term Goal
Current multisearch behavior:
- Uses UNION ALL + ORDER BY @timestamp DESC approach for result combination
- Supports all PPL commands in subsearches (no artificial streaming restrictions)
- Provides timestamp-based interleaving when @timestamp field is available
- Falls back to sequential concatenation when timestamp field is missing
Long-Term Goals
- Provide a first-class generating command that:
- Runs two or more subsearches and globally interleaves results by timestamp (DESC).
- Implements concurrent subsearch execution similar to SPL's streaming architecture for improved performance.
- Provides real-time result merging capabilities for live data scenarios.
Current Implementation
The multisearch command is now available with the following syntax:
source=<index> | multisearch [ <subsearch> ] [ <subsearch> ] ... [ <subsearch> ]
Key semantics:
- Position: Can be used after a source command (not necessarily first in pipeline).
- Cardinality: Requires ≥ 2 subsearches.
- Command support: All PPL commands are now supported in subsearches.
- Result combination: UNION ALL of all subsearch outputs, followed by ORDER BY @timestamp DESC.
- Time field: Uses @timestamp by default for ordering results.
Examples
-- Age group analysis with stats (now supported)
source=accounts | multisearch
[source=accounts | where age < 30 | eval age_group = 'young']
[source=accounts | where age >= 30 | eval age_group = 'adult']
| stats count by age_group
-- Time interleaving across two indices
source=logs | multisearch
[source=service_logs | where category IN ('A', 'B')]
[source=metrics_logs | where category IN ('E', 'F')]
| head 5
-- Complex aggregations in subsearches (now supported)
source=data | multisearch
[source=sales | stats sum(revenue) by region | eval type = 'sales']
[source=costs | stats sum(expense) by region | eval type = 'costs']
| sort region
Next Steps for Long-Term Implementation
- Concurrent Execution Architecture: Investigate implementing SPL-style concurrent subsearch execution using Calcite's parallel query capabilities.
- Real-time Merging: Explore streaming result merging for live data scenarios where results arrive continuously.
- Performance Optimization: Benchmark current UNION ALL + SORT approach vs potential concurrent execution benefits.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
PPLPiped processing languagePiped processing languageRFCRequest For CommentsRequest For Commentscalcitecalcite migration releatedcalcite migration releatedv3.4.0
Type
Projects
Status
New
Status
Done
Status
Done