
[BUG] PPL bin command fails to produce output for nested/struct fields #4482

@alexey-temnikov

Query Information

PPL Command/Query:

source=kube-state-metrics | bin `monitoring.metrics.beat.handles.open` bins=5
source=kube-state-metrics | bin `monitoring.metrics.beat.cpu.system.ticks` span=1000

Expected Result:
The bin command should create binned ranges for the nested field values and include them in the output, similar to how it works with flat (non-nested) fields. The binned values should appear in the result set either as a new column or by replacing the original field value.

Actual Result:
When using the bin command on nested/struct fields without an explicit fields projection, the binned values are not included in the output. The query returns the original nested field values unchanged, making it appear as if the bin command is being ignored entirely.

However, when adding | fields <nested.field> after the bin command, the binned values do appear correctly.

Dataset Information

Dataset/Schema Type

  • Custom (details below)

Index Mapping

{
  "mappings": {
    "properties": {
      "monitoring": {
        "properties": {
          "metrics": {
            "properties": {
              "beat": {
                "properties": {
                  "handles": {
                    "properties": {
                      "open": { "type": "long" }
                    }
                  },
                  "cpu": {
                    "properties": {
                      "system": {
                        "properties": {
                          "ticks": { "type": "long" }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Sample Data

{
  "monitoring": {
    "metrics": {
      "beat": {
        "handles": {
          "open": 66
        },
        "cpu": {
          "system": {
            "ticks": 11403
          }
        }
      }
    }
  }
}

Bug Description

Issue Summary:
The PPL bin command fails to produce binned output for nested/struct fields when used without an explicit fields projection. The binned expression is not properly integrated into the query result, causing the original field values to be returned instead of the binned ranges.

Steps to Reproduce:

  1. Create an index with nested field structure (e.g., nested.value where nested is an object with a value field)
  2. Insert documents with numeric values in the nested field
  3. Run a PPL query: source=<index> | bin nested.value span=10
  4. Observe that the output contains the original nested.value values, not binned ranges
  5. Compare with flat field: source=<index> | bin flat_value span=10 - this works correctly and shows binned ranges

Comparison:

Flat field (WORKS):

source=test-bin-nested | bin flat_value span=10

Result includes binned column: ["0-10", "10-20", "20-30", ...]

Nested field (BROKEN):

source=test-bin-nested | bin nested.value span=10

Result shows original values: [5, 15, 25, ...] - no binned output

Nested field with workaround (WORKS):

source=test-bin-nested | bin nested.value span=10 | fields nested.value

Result shows binned values: ["0-10", "10-20", "20-30", ...]

Impact:
This bug affects users working with structured datasets that use nested field mappings. The bin command is essential for creating histograms and time-series visualizations, and this limitation forces users to do one of the following:

  • Flatten their data structure (losing semantic organization)
  • Use the workaround of explicitly projecting fields (which is non-intuitive and undocumented)
  • Avoid using bin with nested fields entirely

Environment Information

OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT (reproduced on development build)

Additional Details:
The issue was confirmed through execution plan analysis using POST _plugins/_ppl/_explain:

  • For flat fields: The logical plan correctly includes SPAN_BUCKET($field, span) function
  • For nested fields: The logical plan is missing the SPAN_BUCKET function entirely, showing only the original field projection
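To repeat the plan comparison, the explain request can be built as in the following minimal sketch. It assumes a local cluster at http://localhost:9200 with no authentication; the class and method names (ExplainRequest, explainBody, explainRequest) are hypothetical, and the request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ExplainRequest {
    // Wraps a PPL query in the JSON body expected by the PPL endpoints.
    // Escapes only backslashes and double quotes, which is enough for this sketch.
    static String explainBody(String ppl) {
        String escaped = ppl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\": \"" + escaped + "\"}";
    }

    // Builds (but does not send) a POST to _plugins/_ppl/_explain.
    // baseUrl is an assumption, e.g. a local development cluster.
    static HttpRequest explainRequest(String baseUrl, String ppl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_plugins/_ppl/_explain"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(explainBody(ppl)))
            .build();
    }

    public static void main(String[] args) {
        String ppl = "source=test-bin-nested | bin nested.value span=10";
        HttpRequest request = explainRequest("http://localhost:9200", ppl);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(explainBody(ppl));
    }
}
```

Building the same request for the flat_value query and comparing the two returned plans should show whether SPAN_BUCKET is present in each.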

Tentative Root Cause Analysis

This is a preliminary analysis and requires further investigation.

The root cause appears to be in the projectPlusOverriding method in CalciteRelNodeVisitor.java (lines 844-867). This method is responsible for adding the binned expression to the query plan while handling field name conflicts.

The issue occurs at lines 847-851:

List<String> originalFieldNames = context.relBuilder.peek().getRowType().getFieldNames();
List<RexNode> toOverrideList =
    originalFieldNames.stream()
        .filter(newNames::contains)
        .map(a -> (RexNode) context.relBuilder.field(a))
        .toList();

For nested fields like nested.value:

  • The originalFieldNames list contains only top-level field names (e.g., ["flat_value", "nested"])
  • The newNames list contains the full nested path (e.g., ["nested.value"])
  • The filter newNames::contains never matches because "nested" ≠ "nested.value"
  • As a result, toOverrideList is empty, and the binned expression is added as a new field but never properly integrated into the output

This explains why:

  1. The bin command appears to do nothing for nested fields (the binned expression exists but isn't visible)
  2. Using | fields nested.value works (it explicitly projects the binned expression)
  3. Flat fields work correctly (the field name matches exactly)
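The name mismatch described above can be reproduced outside Calcite with plain string lists. The sketch below is an illustration only: the class and method names (NestedNameMismatch, namesToOverride) are hypothetical, and it mirrors just the filter step of the quoted snippet, not the real projectPlusOverriding method.

```java
import java.util.List;

public class NestedNameMismatch {
    // Same filter as in the quoted snippet, but returning names instead of RexNodes.
    static List<String> namesToOverride(List<String> originalFieldNames, List<String> newNames) {
        return originalFieldNames.stream()
            .filter(newNames::contains)
            .toList();
    }

    public static void main(String[] args) {
        // Row type as described in the analysis: only top-level names are present.
        List<String> rowType = List.of("flat_value", "nested");

        // Flat field: exact name match, so the field is marked for override.
        System.out.println(namesToOverride(rowType, List.of("flat_value")));   // [flat_value]

        // Nested field: "nested" never equals "nested.value", so nothing is overridden.
        System.out.println(namesToOverride(rowType, List.of("nested.value"))); // []
    }
}
```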

Tentative Proposed Fix

This is a preliminary analysis and requires further investigation.

The fix should modify the projectPlusOverriding method to handle nested field names correctly. One approach:

  1. Extract the top-level field name from nested paths (e.g., "nested.value" → "nested")
  2. Check if the top-level field exists in originalFieldNames
  3. If it exists, mark it for override to ensure the binned expression replaces the nested field value
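The three steps can be sketched on plain strings as follows. This is a hedged illustration, not a patch: TopLevelOverride, topLevelName, and namesToOverride are hypothetical names, and the sketch assumes that a dot in a field name always separates path segments.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TopLevelOverride {
    // Step 1: "nested.value" -> "nested"; a flat name is returned unchanged.
    static String topLevelName(String path) {
        int dot = path.indexOf('.');
        return dot < 0 ? path : path.substring(0, dot);
    }

    // Steps 2 and 3: keep every original field whose name is the top-level
    // segment of one of the new names, so it can be marked for override.
    static List<String> namesToOverride(List<String> originalFieldNames, List<String> newNames) {
        Set<String> topLevel = newNames.stream()
            .map(TopLevelOverride::topLevelName)
            .collect(Collectors.toSet());
        return originalFieldNames.stream()
            .filter(topLevel::contains)
            .toList();
    }

    public static void main(String[] args) {
        List<String> rowType = List.of("flat_value", "nested");
        System.out.println(namesToOverride(rowType, List.of("nested.value"))); // [nested]
        System.out.println(namesToOverride(rowType, List.of("flat_value")));   // [flat_value]
    }
}
```

In this sketch the whole top-level field is marked for override, which would also affect sibling sub-fields of the same struct. A real fix would likely need to preserve those siblings.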

Alternatively, the field resolution logic in BinUtils.extractFieldName() could be enhanced to work with Calcite's field access patterns for nested structures.

A more robust solution might involve:

  • Modifying how nested fields are represented in the Calcite plan
  • Ensuring that context.relBuilder.field(a) can properly resolve nested field paths
  • Updating the rename logic to handle dotted field names

The fix should be validated against:

  • Simple nested fields (one level: nested.value)
  • Deeply nested fields (multiple levels: a.b.c.value)
  • Mixed scenarios (some flat, some nested fields in the same query)
  • All bin variants (span, bins, minspan, range)

Workaround

Add an explicit fields projection after the bin command:

source=kube-state-metrics | bin `monitoring.metrics.beat.handles.open` bins=5 | fields `monitoring.metrics.beat.handles.open`

This forces the binned expression to be projected into the output, making the binned ranges visible.

Metadata

Labels

  • PPL (Piped processing language)
  • bug (Something isn't working)
  • calcite (calcite migration related)

Projects

Status

Done
