[RFC]: Temporal Routing Processors #18920

### Is your feature request related to a problem? Please describe

## Motivation

Many OpenSearch workloads are time-based: logs, metrics, audits, transactional records, compliance archives. Today this is handled either with time-based indices managed by a lifecycle policy (ILM, or ISM in OpenSearch) or with manually assigned `_routing` values. ILM handles index lifecycle well, but within a single index there is no built-in way to keep temporal locality at the shard level.

Without temporal routing:
- Hot writes are scattered across all primaries.
- Range queries fan out to every shard.
- Cache hit rates are lower, and query latencies are higher.

The goal is to add a pair of processors that make time‑based shard locality an opt‑in, centrally managed feature. No client changes, no duplicated routing logic.

### Describe the solution you'd like

## Overview

Two processors:

1. **`TemporalRoutingProcessor`** – Ingest pipeline
   - Reads a date/time field from the document.
   - Rounds to a configured granularity (`hour`, `day`, `week`, `month`).
   - Builds a routing key from the rounded value (optionally hashed).
   - Sets `_routing`. An existing value is overwritten only when `override_existing` is `true` (the default).

2. **`TemporalRoutingSearchProcessor`** – Search pipeline
   - Examines the query for a `range` or `term` on the configured date/time field.
   - Determines the bucket(s) matching the query range.
   - Sets `SearchRequest.routing()` before shard resolution.

---

## Configuration

| Field              | Type     | Required | Notes |
|--------------------|----------|----------|-------|
| `timestamp_field`  | string   | yes      | Field containing the date/time value. |
| `granularity`      | string   | yes      | One of: `hour`, `day`, `week`, `month`. |
| `format`           | string   | no       | Date format; defaults to `strict_date_optional_time`. |
| `override_existing`| boolean  | no       | Default `true`. Overwrites existing routing. |
| `hash_bucket`      | boolean  | no       | Default `false`. Hashes bucket string before assigning `_routing`. |

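As a rough illustration of how the options and defaults in the table could be read and validated, here is a minimal sketch. The class `TemporalRoutingConfig` and its `fromMap` method are hypothetical names, not existing OpenSearch APIs; a real processor factory would more likely rely on the helpers in `ConfigurationUtils`. Only the defaults stated in the table are assumed.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical value holder for the options in the table; not an OpenSearch class. */
final class TemporalRoutingConfig {
    private static final Set<String> GRANULARITIES = Set.of("hour", "day", "week", "month");

    final String timestampField;
    final String granularity;
    final String format;
    final boolean overrideExisting;
    final boolean hashBucket;

    private TemporalRoutingConfig(String timestampField, String granularity, String format,
                                  boolean overrideExisting, boolean hashBucket) {
        this.timestampField = timestampField;
        this.granularity = granularity;
        this.format = format;
        this.overrideExisting = overrideExisting;
        this.hashBucket = hashBucket;
    }

    /** Reads the raw processor config and applies the defaults listed in the table. */
    static TemporalRoutingConfig fromMap(Map<String, Object> raw) {
        String field = requireString(raw, "timestamp_field");
        String granularity = requireString(raw, "granularity");
        if (!GRANULARITIES.contains(granularity)) {
            throw new IllegalArgumentException("[granularity] must be one of hour, day, week, month");
        }
        String format = (String) raw.getOrDefault("format", "strict_date_optional_time");
        boolean overrideExisting = (Boolean) raw.getOrDefault("override_existing", Boolean.TRUE);
        boolean hashBucket = (Boolean) raw.getOrDefault("hash_bucket", Boolean.FALSE);
        return new TemporalRoutingConfig(field, granularity, format, overrideExisting, hashBucket);
    }

    /** Required options must be present and non-empty strings. */
    private static String requireString(Map<String, Object> raw, String key) {
        Object value = raw.get(key);
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalArgumentException("[" + key + "] is required");
        }
        return (String) value;
    }
}
```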
---

## Example – Ingest

```json
PUT _ingest/pipeline/temporal-routing
{
  "processors": [
    {
      "temporal_routing": {
        "timestamp_field": "created_at",
        "granularity": "month"
      }
    }
  ]
}
```

A document:

```json
{
  "created_at": "2025-07-29T10:45:00Z"
}
```

→ `_routing = "2025-07"`

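To see why a shared routing value keeps a month together, recall that OpenSearch picks the target shard from a hash of `_routing`. The sketch below is a simplification under stated assumptions: `String.hashCode()` stands in for the Murmur3 hash OpenSearch actually uses, `number_of_routing_shards` and routing partitioning are ignored, and `RoutingLocalitySketch` is a hypothetical name. It only illustrates that every document carrying `_routing = "2025-07"` resolves to the same shard number.

```java
/** Illustrative only; this is not how OpenSearch hashes routing values. */
final class RoutingLocalitySketch {
    /** Simplified shard selection: hash(_routing) mod number of primary shards. */
    static int shardFor(String routing, int numPrimaryShards) {
        // String.hashCode() is a stand-in for the Murmur3 hash used by OpenSearch.
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }
}
```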
---

## Example – Search

```json
PUT _search/pipeline/temporal-routing-search
{
  "processors": [
    {
      "temporal_routing_search": {
        "timestamp_field": "created_at",
        "granularity": "month",
        "source": "query"
      }
    }
  ]
}
```

Query:

```json
{
  "range": {
    "created_at": {
      "gte": "2025-07-01",
      "lt": "2025-07-31"
    }
  }
}
```

→ Routed to the `"2025-07"` bucket only.

---

## Implementation Details

**Ingest Side – `TemporalRoutingProcessor`**
- **Package:** `org.opensearch.ingest.common`
- **Extends:** `AbstractProcessor`
- **Logic:**
  1. Extract field value. Fail or skip if missing (based on config).
  2. Parse with `DateFormatter`.
  3. Truncate to configured granularity:
     - `hour`: `yyyy-MM-dd-HH`
     - `day`: `yyyy-MM-dd`
     - `week`: `YYYY-'W'ww` (week-based year, so that dates around a year boundary land in the correct week bucket)
     - `month`: `yyyy-MM`
  4. Optionally hash bucket string if `hash_bucket=true`.
  5. Set `_routing`.

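The ingest steps can be sketched in plain `java.time`, without any OpenSearch classes. This is a minimal illustration under assumptions not fixed by the RFC: timestamps are ISO-8601 with an offset, buckets are computed in UTC, weeks follow ISO-8601 numbering, and the hash is a truncated SHA-256 (the RFC does not name a hash function). `TemporalBucketSketch` and its methods are hypothetical names; an actual processor would parse with `DateFormatter` as described in step 2.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.Locale;

/** Hypothetical helper; names are illustrative and not part of OpenSearch. */
final class TemporalBucketSketch {
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH", Locale.ROOT);
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ROOT);
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyy-MM", Locale.ROOT);

    /** Steps 2-3: parse an ISO-8601 timestamp with offset and truncate it to a UTC bucket key. */
    static String bucketKey(String timestamp, String granularity) {
        LocalDateTime utc = OffsetDateTime.parse(timestamp)
            .withOffsetSameInstant(ZoneOffset.UTC)
            .toLocalDateTime();
        switch (granularity) {
            case "hour":
                return HOUR.format(utc);
            case "day":
                return DAY.format(utc);
            case "week":
                // ISO week-based year, so 2024-12-30 belongs to 2025-W01.
                return String.format(Locale.ROOT, "%04d-W%02d",
                    utc.toLocalDate().get(IsoFields.WEEK_BASED_YEAR),
                    utc.toLocalDate().get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
            case "month":
                return MONTH.format(utc);
            default:
                throw new IllegalArgumentException("unsupported granularity: " + granularity);
        }
    }

    /** Step 4: optional hashing. Truncated SHA-256 is an assumption of this sketch. */
    static String hash(String bucket) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(bucket.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format(Locale.ROOT, "%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every Java platform
        }
    }

    /** Step 5: an existing _routing is kept only when override_existing is false. */
    static String resolveRouting(String existing, String bucket, boolean overrideExisting) {
        return (existing == null || overrideExisting) ? bucket : existing;
    }
}
```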
**Search Side – `TemporalRoutingSearchProcessor`**
- **Package:** `org.opensearch.search.pipeline`
- **Implements:** `SearchRequestProcessor`
- **Logic:**
  1. If `source=query`, walk the parsed query tree to find a `term` or `range` clause on the target field.
  2. Determine the matching bucket(s). If the range spans multiple buckets, collect them all.
  3. Build routing key(s), hash if configured.
  4. Set `SearchRequest.routing()`.

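For the multi-bucket case, the bucket enumeration might look like the sketch below. It covers only `month` granularity and a range with both a `gte` and an exclusive `lt` bound; open-ended ranges, other granularities, and hashing are left out. `MonthBucketRange` is a hypothetical name. The keys are joined with commas on the assumption that the result is handed to `SearchRequest.routing(String)`, which accepts a comma-separated list of routing values.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for step 2 of the search-side logic (month granularity only). */
final class MonthBucketRange {
    /** Returns the month buckets touched by the half-open range [gte, lt), oldest first. */
    static List<String> buckets(LocalDate gte, LocalDate lt) {
        List<String> keys = new ArrayList<>();
        if (!gte.isBefore(lt)) {
            return keys; // empty range matches no bucket
        }
        YearMonth first = YearMonth.from(gte);
        YearMonth last = YearMonth.from(lt.minusDays(1)); // lt is exclusive
        for (YearMonth month = first; !month.isAfter(last); month = month.plusMonths(1)) {
            keys.add(month.toString()); // YearMonth prints as yyyy-MM
        }
        return keys;
    }

    /** Joins the bucket keys into a single comma-separated routing string. */
    static String routing(List<String> keys) {
        return String.join(",", keys);
    }
}
```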
---

## Registration

**Ingest**
```java
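@Override
public Map<String, Processor.Factory> getProcessors(Processor.Parameters parameters) {
    Map<String, Processor.Factory> processors = new HashMap<>();
    // Existing ingest processors are registered in the same map.
    processors.put("temporal_routing", new TemporalRoutingProcessor.Factory());
    return processors;
}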
```

**Search**
```java
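@Override
public Map<String, Processor.Factory<SearchRequestProcessor>> getRequestProcessors(Parameters parameters) {
    // Parameters here is SearchPipelinePlugin.Parameters, not the ingest Processor.Parameters.
    Map<String, Processor.Factory<SearchRequestProcessor>> processors = new HashMap<>();
    processors.put("temporal_routing_search", new TemporalRoutingSearchProcessor.Factory());
    return processors;
}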
```

---

## Testing

**Unit**
- All granularities.
- Missing field.
- Invalid date formats.
- `override_existing=false`.
- Hash vs non‑hash outputs.

**Integration**
- Index docs via pipeline, verify `_routing`.
- Search via pipeline, verify targeted shards.
- Multi‑bucket ranges.

---

## Notes

- This does not replace ILM. ILM manages index-level lifecycle; this provides shard-level locality inside a single index.
- Can be combined with ACL or hierarchical routing via a composite processor.
- Processing cost is minimal: one parse, one truncation, and an optional hash per document.


### Related component

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

Co-authored by @abhishekpsingh
