-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Description
Is your feature request related to a problem? Please describe
Motivation
A large number of OpenSearch workloads are time‑based: logs, metrics, audits, transactional records, compliance archives. The current approach pushes this into time‑based indices (ILM) or leaves it to manual _routing. ILM handles index lifecycle well, but within a single index there is no built‑in way to keep temporal locality at the shard level.
Without temporal routing:
- Hot writes are scattered across all primaries.
- Range queries fan out to every shard.
- Cache hit rates are lower, and query latencies are higher.
The goal is to add a pair of processors that make time‑based shard locality an opt‑in, centrally managed feature. No client changes, no duplicated routing logic.
Describe the solution you'd like
Overview
Two processors:
-
TemporalRoutingProcessor– Ingest pipeline- Reads a date/time field from the document.
- Rounds to a configured granularity (
hour,day,week,month). - Builds a routing key from the rounded value (optionally hashed).
- Sets
_routingif not already set, or overrides if configured.
-
TemporalRoutingSearchProcessor– Search pipeline- Examines the query for a
rangeortermon the configured date/time field. - Determines the bucket(s) matching the query range.
- Sets
SearchRequest.routing()before shard resolution.
- Examines the query for a
Configuration
| Field | Type | Required | Notes |
|---|---|---|---|
timestamp_field |
string | yes | Field containing the date/time value. |
granularity |
string | yes | One of: hour, day, week, month. |
format |
string | no | Date format; defaults to strict_date_optional_time. |
override_existing |
boolean | no | Default true. Overwrites existing routing. |
hash_bucket |
boolean | no | Default false. Hashes bucket string before assigning _routing. |
Example – Ingest
PUT _ingest/pipeline/temporal-routing
{
"processors": [
{
"temporal_routing": {
"timestamp_field": "created_at",
"granularity": "month"
}
}
]
}A document:
{
"created_at": "2025-07-29T10:45:00Z"
}→ _routing = "2025-07"
Example – Search
PUT _search/pipeline/temporal-routing-search
{
"processors": [
{
"temporal_routing_search": {
"timestamp_field": "created_at",
"granularity": "month",
"source": "query"
}
}
]
}Query:
{
"range": {
"created_at": {
"gte": "2025-07-01",
"lt": "2025-07-31"
}
}
}→ Routed to "2025-07" bucket only.
Implementation Details
Ingest Side – TemporalRoutingProcessor
- Package:
org.opensearch.ingest.common - Extends:
AbstractProcessor - Logic:
- Extract field value. Fail or skip if missing (based on config).
- Parse with
DateFormatter. - Truncate to configured granularity:
hour: yyyy-MM-dd-HHday: yyyy-MM-ddweek: yyyy-'W'wwmonth: yyyy-MM
- Optionally hash bucket string if
hash_bucket=true. - Set
_routing.
Search Side – TemporalRoutingSearchProcessor
- Package:
org.opensearch.search.pipeline - Implements:
SearchRequestProcessor - Logic:
- If
source=query, walk parsed query tree to findtermorrangeon target field. - Determine matching bucket(s). If range spans multiple buckets, collect them all.
- Build routing key(s), hash if configured.
- Set
SearchRequest.routing().
- If
Registration
Ingest
@Override
public Map<String, Processor.Factory> getProcessors(Processor.Parameters parameters) {
processors.put("temporal_routing", new TemporalRoutingProcessor.Factory());
}Search
@Override
public Map<String, SearchRequestProcessor.Factory> getRequestProcessors(Processor.Parameters parameters) {
processors.put("temporal_routing_search", new TemporalRoutingSearchProcessor.Factory());
}Testing
Unit
- All granularities.
- Missing field.
- Invalid date formats.
override_existing=false.- Hash vs non‑hash outputs.
Integration
- Index docs via pipeline, verify
_routing. - Search via pipeline, verify targeted shards.
- Multi‑bucket ranges.
Notes
- This does not replace ILM. ILM is for index‑level lifecycle, this is shard‑level locality inside a single index.
- Can be combined with ACL or hierarchical routing via a composite processor.
- Processing cost is minimal: one parse, one truncate, optional hash.
Related component
No response
Describe alternatives you've considered
No response
Additional context
No response
Co authored by @abhishekpsingh