Description
Query Information
PPL Command/Query:
source=kube-state-metrics | bin `monitoring.metrics.beat.handles.open` bins=5
source=kube-state-metrics | bin `monitoring.metrics.beat.cpu.system.ticks` span=1000
Expected Result:
The bin command should create binned ranges for the nested field values and include them in the output, similar to how it works with flat (non-nested) fields. The binned values should appear in the result set either as a new column or by replacing the original field value.
Actual Result:
When using the bin command on nested/struct fields without an explicit fields projection, the binned values are not included in the output. The query returns the original nested field values unchanged, making it appear as if the bin command is being ignored entirely.
However, adding | fields <nested.field> after the bin command makes the binned values appear correctly.
Dataset Information
Dataset/Schema Type
- Custom (details below)
Index Mapping
{
  "mappings": {
    "properties": {
      "monitoring": {
        "properties": {
          "metrics": {
            "properties": {
              "beat": {
                "properties": {
                  "handles": {
                    "properties": {
                      "open": { "type": "long" }
                    }
                  },
                  "cpu": {
                    "properties": {
                      "system": {
                        "properties": {
                          "ticks": { "type": "long" }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Sample Data
{
  "monitoring": {
    "metrics": {
      "beat": {
        "handles": {
          "open": 66
        },
        "cpu": {
          "system": {
            "ticks": 11403
          }
        }
      }
    }
  }
}
Bug Description
Issue Summary:
The PPL bin command fails to produce binned output for nested/struct fields when used without an explicit fields projection. The binned expression is not properly integrated into the query result, causing the original field values to be returned instead of the binned ranges.
Steps to Reproduce:
- Create an index with nested field structure (e.g., nested.value where nested is an object with a value field)
- Insert documents with numeric values in the nested field
- Run a PPL query: source=<index> | bin nested.value span=10
- Observe that the output contains the original nested.value values, not binned ranges
- Compare with flat field: source=<index> | bin flat_value span=10 - this works correctly and shows binned ranges
Comparison:
Flat field (WORKS):
source=test-bin-nested | bin flat_value span=10
Result includes binned column: ["0-10", "10-20", "20-30", ...]
Nested field (BROKEN):
source=test-bin-nested | bin nested.value span=10
Result shows original values: [5, 15, 25, ...] - no binned output
Nested field with workaround (WORKS):
source=test-bin-nested | bin nested.value span=10 | fields nested.value
Result shows binned values: ["0-10", "10-20", "20-30", ...]
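For reference, the labels shown in the comparison above are consistent with simple fixed-width bucketing. The following is only a sketch of that arithmetic: spanBucket is a hypothetical helper written for this report, not the plugin's SPAN_BUCKET implementation, and the "lower-upper" label format is assumed from the sample results above.

```java
class SpanBucketSketch {
    // Hypothetical helper: maps a value to a "lower-upper" label for a fixed span.
    // Assumes buckets are aligned to multiples of the span, starting at zero.
    static String spanBucket(long value, long span) {
        if (span <= 0) {
            throw new IllegalArgumentException("span must be positive");
        }
        // floorDiv keeps negative values in the bucket below zero instead of rounding toward it.
        long lower = Math.floorDiv(value, span) * span;
        return lower + "-" + (lower + span);
    }

    public static void main(String[] args) {
        for (long v : new long[] {5, 15, 25}) {
            System.out.println(v + " -> " + spanBucket(v, 10));
        }
    }
}
```

Under these assumptions, the sample document's ticks value of 11403 with span=1000 would fall in the bucket labeled 11000-12000.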
Impact:
This bug affects users working with structured datasets that use nested field mappings. The bin command is essential for creating histograms and time-series visualizations, and this limitation forces users to do one of the following:
- Flatten their data structure (losing semantic organization)
- Use the workaround of explicitly projecting fields (which is non-intuitive and undocumented)
- Avoid using bin with nested fields entirely
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT (reproduced on development build)
Additional Details:
The issue was confirmed through execution plan analysis using POST _plugins/_ppl/_explain:
- For flat fields: The logical plan correctly includes the SPAN_BUCKET($field, span) function
- For nested fields: The logical plan is missing the SPAN_BUCKET function entirely, showing only the original field projection
Tentative Root Cause Analysis
This is a preliminary analysis and requires further investigation.
The root cause appears to be in the projectPlusOverriding method in CalciteRelNodeVisitor.java (lines 844-867). This method is responsible for adding the binned expression to the query plan while handling field name conflicts.
The issue occurs at lines 847-851:
List<String> originalFieldNames = context.relBuilder.peek().getRowType().getFieldNames();
List<RexNode> toOverrideList =
    originalFieldNames.stream()
        .filter(newNames::contains)
        .map(a -> (RexNode) context.relBuilder.field(a))
        .toList();
For nested fields like nested.value:
- The originalFieldNames list contains only top-level field names (e.g., ["flat_value", "nested"])
- The newNames list contains the full nested path (e.g., ["nested.value"])
- The filter newNames::contains never matches because "nested" ≠ "nested.value"
- As a result, toOverrideList is empty, and the binned expression is added as a new field but never properly integrated into the output
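The suspected mismatch can be reproduced in isolation. The sketch below mirrors only the filter step of the quoted snippet, using plain strings in place of Calcite RexNode objects so it runs without dependencies. The class and method names are made up for illustration, and the field lists are the example values from the bullets above, not values captured from a real plan.

```java
import java.util.List;

class OverrideFilterDemo {
    // Same filter as in the quoted snippet, minus the mapping to RexNode.
    static List<String> namesToOverride(List<String> originalFieldNames, List<String> newNames) {
        return originalFieldNames.stream()
            .filter(newNames::contains)
            .toList();
    }

    public static void main(String[] args) {
        // Assumed row type: only top-level names are present.
        List<String> rowType = List.of("flat_value", "nested");

        // Flat field: the name matches exactly, so the original column is selected for override.
        System.out.println(namesToOverride(rowType, List.of("flat_value")));

        // Nested field: "nested" is not equal to "nested.value", so nothing is selected.
        System.out.println(namesToOverride(rowType, List.of("nested.value")));
    }
}
```

The first call yields a one-element list and the second an empty list, which matches the behavior described in this analysis if the row type really does expose only top-level names.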
This explains why:
- The bin command appears to do nothing for nested fields (the binned expression exists but isn't visible)
- Using | fields nested.value works (it explicitly projects the binned expression)
- Flat fields work correctly (the field name matches exactly)
Tentative Proposed Fix
This is a preliminary analysis and requires further investigation.
The fix should modify the projectPlusOverriding method to handle nested field names correctly. One approach:
- Extract the top-level field name from nested paths (e.g., "nested.value" → "nested")
- Check if the top-level field exists in originalFieldNames
- If it exists, mark it for override to ensure the binned expression replaces the nested field value
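A rough sketch of the matching logic in the steps above, again on plain strings rather than Calcite types. topLevelName and namesToOverride are hypothetical helpers written for this report and are not existing methods in the plugin.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NestedOverrideSketch {
    // Hypothetical helper: "nested.value" -> "nested"; a name without dots is returned unchanged.
    static String topLevelName(String fieldPath) {
        int dot = fieldPath.indexOf('.');
        return dot < 0 ? fieldPath : fieldPath.substring(0, dot);
    }

    // Selects original columns whose name equals a new name exactly,
    // or is the top-level parent of a dotted new name.
    static List<String> namesToOverride(List<String> originalFieldNames, List<String> newNames) {
        Set<String> candidates = newNames.stream()
            .flatMap(n -> Stream.of(n, topLevelName(n)))
            .collect(Collectors.toSet());
        return originalFieldNames.stream()
            .filter(candidates::contains)
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(
            namesToOverride(List.of("flat_value", "nested"), List.of("nested.value")));
    }
}
```

One caveat with this approach: overriding the whole top-level column could also drop sibling sub-fields of the same object (for example, other fields under monitoring), so the real fix may need to replace only the targeted path.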
Alternatively, the field resolution logic in BinUtils.extractFieldName() could be enhanced to work with Calcite's field access patterns for nested structures.
A more robust solution might involve:
- Modifying how nested fields are represented in the Calcite plan
- Ensuring that context.relBuilder.field(a) can properly resolve nested field paths
- Updating the rename logic to handle dotted field names
The fix should be validated against:
- Simple nested fields (one level: nested.value)
- Deeply nested fields (multiple levels: a.b.c.value)
- Mixed scenarios (some flat, some nested fields in the same query)
- All bin variants (span, bins, minspan, range)
Workaround
Add an explicit fields projection after the bin command:
source=kube-state-metrics | bin `monitoring.metrics.beat.handles.open` bins=5 | fields `monitoring.metrics.beat.handles.open`
This forces the binned expression to be projected into the output, making the binned ranges visible.