[RFC]: Temporal Routing Processors #18920

@atris

Is your feature request related to a problem? Please describe

Motivation

A large number of OpenSearch workloads are time‑based: logs, metrics, audits, transactional records, compliance archives. Today this is handled either with time‑based indices managed by lifecycle policies (ILM; Index State Management, ISM, in OpenSearch) or with manual _routing set by each client. Lifecycle management handles the index level well, but within a single index there is no built‑in way to keep temporal locality at the shard level.

Without temporal routing:

  • Hot writes are scattered across all primaries.
  • Range queries fan out to every shard.
  • Cache hit rates are lower, and query latencies are higher.

The goal is to add a pair of processors that make time‑based shard locality an opt‑in, centrally managed feature. No client changes, no duplicated routing logic.

Describe the solution you'd like

Overview

Two processors:

  1. TemporalRoutingProcessor – Ingest pipeline

    • Reads a date/time field from the document.
    • Rounds to a configured granularity (hour, day, week, month).
    • Builds a routing key from the rounded value (optionally hashed).
    • Sets _routing. An existing value is overwritten unless override_existing is false.
  2. TemporalRoutingSearchProcessor – Search pipeline

    • Examines the query for a range or term on the configured date/time field.
    • Determines the bucket(s) matching the query range.
    • Sets SearchRequest.routing() before shard resolution.

Configuration

| Field | Type | Required | Notes |
|---|---|---|---|
| timestamp_field | string | yes | Field containing the date/time value. |
| granularity | string | yes | One of: hour, day, week, month. |
| format | string | no | Date format; defaults to strict_date_optional_time. |
| override_existing | boolean | no | Default true. Overwrites existing routing. |
| hash_bucket | boolean | no | Default false. Hashes bucket string before assigning _routing. |
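
To make the defaults and validation concrete, here is a minimal sketch of how a processor factory might read these options from the raw pipeline configuration. The class name TemporalRoutingConfig and the fromMap helper are hypothetical and exist only for illustration; a real factory would more likely use the config-reading utilities already present in the ingest module.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for the options listed above; not an existing OpenSearch class. */
final class TemporalRoutingConfig {
    enum Granularity { HOUR, DAY, WEEK, MONTH }

    final String timestampField;
    final Granularity granularity;
    final String format;
    final boolean overrideExisting;
    final boolean hashBucket;

    private TemporalRoutingConfig(String timestampField, Granularity granularity, String format,
                                  boolean overrideExisting, boolean hashBucket) {
        this.timestampField = timestampField;
        this.granularity = granularity;
        this.format = format;
        this.overrideExisting = overrideExisting;
        this.hashBucket = hashBucket;
    }

    /** Reads the raw processor config, applying the documented defaults. */
    static TemporalRoutingConfig fromMap(Map<String, Object> raw) {
        String field = requireString(raw, "timestamp_field");
        String granularityName = requireString(raw, "granularity");
        Granularity granularity;
        try {
            granularity = Granularity.valueOf(granularityName.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "[granularity] must be one of hour, day, week, month but was [" + granularityName + "]");
        }
        // Optional settings fall back to the defaults from the table.
        String format = (String) raw.getOrDefault("format", "strict_date_optional_time");
        boolean overrideExisting = (Boolean) raw.getOrDefault("override_existing", true);
        boolean hashBucket = (Boolean) raw.getOrDefault("hash_bucket", false);
        return new TemporalRoutingConfig(field, granularity, format, overrideExisting, hashBucket);
    }

    private static String requireString(Map<String, Object> raw, String key) {
        Object value = raw.get(key);
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalArgumentException("[" + key + "] is required");
        }
        return (String) value;
    }
}
```

Rejecting an unknown granularity when the pipeline is created, rather than when the first document arrives, keeps configuration mistakes out of the indexing path.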

Example – Ingest

PUT _ingest/pipeline/temporal-routing
{
  "processors": [
    {
      "temporal_routing": {
        "timestamp_field": "created_at",
        "granularity": "month"
      }
    }
  ]
}

A document:

{
  "created_at": "2025-07-29T10:45:00Z"
}

→ Indexed with _routing = "2025-07".


Example – Search

PUT _search/pipeline/temporal-routing-search
{
  "processors": [
    {
      "temporal_routing_search": {
        "timestamp_field": "created_at",
        "granularity": "month",
        "source": "query"
      }
    }
  ]
}

Query:

{
  "range": {
    "created_at": {
      "gte": "2025-07-01",
      "lt": "2025-07-31"
    }
  }
}

→ Routed to "2025-07" bucket only.


Implementation Details

Ingest Side – TemporalRoutingProcessor

  • Package: org.opensearch.ingest.common
  • Extends: AbstractProcessor
  • Logic:
    1. Extract field value. Fail or skip if missing (based on config).
    2. Parse with DateFormatter.
    3. Truncate to configured granularity:
      • hour: yyyy-MM-dd-HH
      • day: yyyy-MM-dd
      • week: YYYY-'W'ww (week‑based year, so that dates around New Year land in the correct week bucket)
      • month: yyyy-MM
    4. Optionally hash bucket string if hash_bucket=true.
    5. Set _routing.
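
The steps above can be sketched in plain java.time as follows. This is an illustration under stated assumptions, not the proposed implementation: the class and method names are hypothetical, parsing is limited to ISO‑8601 timestamps with an offset (the real processor would use the configured DateFormatter), timestamps are normalised to UTC, weeks follow ISO‑8601 numbering, and the hash is a truncated SHA‑256 because the RFC does not name a hash function.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.Locale;

/** Hypothetical sketch of the bucket-key derivation; names are illustrative only. */
final class TemporalBucketSketch {
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH", Locale.ROOT);
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ROOT);
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyy-MM", Locale.ROOT);

    static String bucketKey(String timestamp, String granularity, boolean hashBucket) {
        // Step 2: parse and normalise to UTC so every node computes the same bucket.
        OffsetDateTime ts = OffsetDateTime.parse(timestamp).withOffsetSameInstant(ZoneOffset.UTC);

        // Step 3: truncate by formatting only the fields the granularity needs.
        String bucket;
        switch (granularity) {
            case "hour":
                bucket = HOUR.format(ts);
                break;
            case "day":
                bucket = DAY.format(ts);
                break;
            case "week": {
                // ISO week fields avoid pairing the calendar year with a week of the next year.
                LocalDate date = ts.toLocalDate();
                bucket = String.format(Locale.ROOT, "%04d-W%02d",
                    date.get(IsoFields.WEEK_BASED_YEAR), date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
                break;
            }
            case "month":
                bucket = MONTH.format(ts);
                break;
            default:
                throw new IllegalArgumentException("unsupported granularity: " + granularity);
        }

        // Step 4: optional hashing (assumption: first 8 bytes of SHA-256, hex encoded).
        return hashBucket ? sha256Prefix(bucket) : bucket;
    }

    static String sha256Prefix(String bucket) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(bucket.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format(Locale.ROOT, "%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

For the example document, bucketKey("2025-07-29T10:45:00Z", "month", false) yields "2025-07", matching the ingest example. Step 5 would then assign the returned key to the document's _routing metadata.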

Search Side – TemporalRoutingSearchProcessor

  • Package: org.opensearch.search.pipeline
  • Implements: SearchRequestProcessor
  • Logic:
    1. If source=query, walk parsed query tree to find term or range on target field.
    2. Determine matching bucket(s). If range spans multiple buckets, collect them all.
    3. Build routing key(s), hash if configured.
    4. Set SearchRequest.routing().
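
Steps 2 to 4 can be illustrated for month granularity with the sketch below. It assumes the range bounds have already been extracted from the query as dates, with gte inclusive and lt exclusive; the class and method names are hypothetical, and other granularities and bound types (gt, lte, missing bounds) are left out for brevity. The buckets are joined with commas because SearchRequest.routing(String) accepts a comma‑separated list of routing values.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of range-to-bucket expansion for month granularity. */
final class TemporalRangeRoutingSketch {

    /** Returns every month bucket touched by the half-open range [gteInclusive, ltExclusive). */
    static List<String> monthBuckets(LocalDate gteInclusive, LocalDate ltExclusive) {
        List<String> buckets = new ArrayList<>();
        if (!gteInclusive.isBefore(ltExclusive)) {
            return buckets; // empty range: nothing to route to
        }
        YearMonth first = YearMonth.from(gteInclusive);
        // The upper bound is exclusive, so the last matching day is the day before it.
        YearMonth last = YearMonth.from(ltExclusive.minusDays(1));
        for (YearMonth month = first; !month.isAfter(last); month = month.plusMonths(1)) {
            buckets.add(month.toString()); // YearMonth prints as yyyy-MM, e.g. 2025-07
        }
        return buckets;
    }

    /** Joins bucket keys into the comma-separated form used for search routing. */
    static String toRouting(List<String> buckets) {
        return String.join(",", buckets);
    }
}
```

With the bounds from the search example, monthBuckets(LocalDate.parse("2025-07-01"), LocalDate.parse("2025-07-31")) returns only "2025-07". A wide range produces one key per month, so an implementation may want an upper limit on the number of buckets beyond which it leaves routing unset and lets the query fan out as usual; that limit is not part of this proposal as written.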

Registration

Ingest

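@Override
public Map<String, Processor.Factory> getProcessors(Processor.Parameters parameters) {
    // Existing registrations are omitted here for brevity.
    Map<String, Processor.Factory> processors = new HashMap<>();
    processors.put("temporal_routing", new TemporalRoutingProcessor.Factory());
    return processors;
}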

Search

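@Override
public Map<String, Processor.Factory<SearchRequestProcessor>> getRequestProcessors(Parameters parameters) {
    // Existing registrations are omitted here for brevity.
    Map<String, Processor.Factory<SearchRequestProcessor>> processors = new HashMap<>();
    processors.put("temporal_routing_search", new TemporalRoutingSearchProcessor.Factory());
    return processors;
}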

Testing

Unit

  • All granularities.
  • Missing field.
  • Invalid date formats.
  • override_existing=false.
  • Hash vs non‑hash outputs.

Integration

  • Index docs via pipeline, verify _routing.
  • Search via pipeline, verify targeted shards.
  • Multi‑bucket ranges.

Notes

  • This does not replace index lifecycle management (ILM; ISM in OpenSearch). Lifecycle policies operate on whole indices, whereas this provides shard‑level locality inside a single index.
  • Can be combined with ACL or hierarchical routing via a composite processor.
  • Per‑document processing cost is small: one date parse, one truncation, and an optional hash.

Related component

No response

Describe alternatives you've considered

No response

Additional context

No response

Co‑authored by @abhishekpsingh
