
[RFC] Improved structured data extraction with spath #4119

@Swiddis


Problem Statement

Users would like a simple, consistent way to extract nested data from structured documents (JSON/XML) stored in text fields. While OpenSearch PPL provides JSON utility functions, the syntax can be verbose and there's no built-in support for XML extraction.

Most structured data should be indexed directly as OpenSearch fields, but this becomes impractical when documents have highly flexible formats, or when users don't want to pay indexing costs for many nested fields.

Current State

OpenSearch PPL currently supports JSON extraction through functions like json_extract(), but:

  • The function syntax is verbose when extracting just a single field
  • There's no built-in support for XML document parsing
  • Users working with both JSON and XML need to use different approaches for each format

Long-Term Goals

  • Provide a unified, intuitive syntax for extracting data from both JSON and XML documents
  • Improve the usability of analyzing nested structured data
  • Enable consistent handling of structured data regardless of format
  • Keep it extensible so we could implement further structured data improvements down the road

Proposal

Starting with the conventions from SPL's spath command, implement an spath command that provides:

  1. Path-based syntax for both JSON and XML extraction
  2. Automatic format detection between JSON and XML
  3. Output consistent with existing JSON functions
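Automatic detection between the two formats could be as simple as sniffing the first non-whitespace character, along these lines (a Python sketch; `detect_format` is an illustrative name, not part of the proposal):

```python
def detect_format(raw: str) -> str:
    """Guess whether a raw string holds JSON or XML by its first
    non-whitespace character. Returns 'json', 'xml', or 'unknown'."""
    stripped = raw.lstrip()
    # JSON documents open with an object or array delimiter
    if stripped.startswith(("{", "[")):
        return "json"
    # XML documents open with a tag (or declaration) bracket
    if stripped.startswith("<"):
        return "xml"
    return "unknown"
```

A cheap sniff like this avoids attempting a full parse twice; the chosen parser can still reject malformed input afterwards.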

Example usage:

source=logs 
| spath input=message path=user.id 
| where user.id != null

source=events
| spath input=payload path=root.metadata.version output=version

Approach

  1. Implement a core spath command with (for now) required parameters, based on existing JSON functions:
     spath input=<field> [output=<field>] [path=]<datapath>
  2. Add XML support to the existing JSON infrastructure
  3. Implement format detection and appropriate parser selection
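Putting those steps together, the extraction behavior might look roughly like this Python sketch (the function name and the dict-only JSON traversal are illustrative simplifications, not the proposed implementation):

```python
import json
import xml.etree.ElementTree as ET


def spath_extract(raw: str, path: str):
    """Extract a dotted path from a JSON or XML string, returning
    None when the document doesn't parse or the path is missing."""
    parts = path.split(".")
    stripped = raw.lstrip()
    if stripped.startswith(("{", "[")):
        try:
            node = json.loads(stripped)
        except ValueError:
            return None
        for part in parts:
            # Object traversal only; array indexing omitted for brevity
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
        return node
    if stripped.startswith("<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError:
            return None
        # SPL-style paths name the root element as the first step
        if parts and parts[0] == root.tag:
            parts = parts[1:]
        node = root
        for part in parts:
            node = node.find(part)
            if node is None:
                return None
        return node.text
    return None
```

For example, `spath_extract('{"user": {"id": 7}}', 'user.id')` yields the nested value, and the same call shape works for an XML payload with path `root.metadata.version`, mirroring the example queries above.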

Down the road, this could be extended based on usage to support features such as dynamic column loading, more input format types, or parsing the results into specified types. In particular, a built-in cast operation would be a nice enhancement that current JSON extraction doesn't support (we effectively need eval bytes_temp = json_extract(data, 'bytes'), bytes = cast(bytes_temp as int)).

Alternatives

Users can continue using existing JSON functions and implement custom XML parsing logic:

source=logs 
| eval user_id = json_extract(message, 'user.id')
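For XML, where no built-in function exists, the "custom parsing logic" has to live outside PPL entirely, e.g. a client-side helper along these lines (a hypothetical sketch using Python's standard library; the field and tag names are illustrative):

```python
import xml.etree.ElementTree as ET


def extract_user_id(message: str):
    """Client-side workaround: pull user/id out of an XML payload
    that PPL's JSON functions can't handle. Returns None on failure."""
    try:
        root = ET.fromstring(message)
    except ET.ParseError:
        return None
    node = root.find("user/id")
    return node.text if node is not None else None
```

Needing per-format, per-field helpers like this is exactly the friction the unified spath command is meant to remove.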

Labels: PPL (Piped processing language), v3.3.0
