Implement spath partial document traversal#4185

Closed
Swiddis wants to merge 22 commits into opensearch-project:main from Swiddis:feature/spath-grace

Conversation

Collaborator

@Swiddis Swiddis commented Aug 29, 2025

Description

Permits spath to partially traverse document fields, so that it can parse nested structures in which a JSON string is embedded inside an object field.

The core idea is that spath field=data inner.key should be able to parse {"data": {"inner": "{\"key\": 0}"}}

Related Issues

N/A

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis changed the title Add missing license headers from 4120 Implement spath partial document traversal Aug 29, 2025
@Swiddis Swiddis added the enhancement New feature or request label Aug 29, 2025
@Swiddis Swiddis marked this pull request as ready for review August 29, 2025 23:57
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Collaborator

@ykmr1224 ykmr1224 left a comment

I did not understand this part of the description: "spath field=data inner.key should be able to parse {"data": {"inner": "{\"key\": 0}"}}"

Comment on lines +55 to +60
* We want input=outer, path=inner.data to match records like `{ "outer": { "inner": "{\"data\":
* 0}" }}`. To rewrite this as eval, that means we need to detect the longest prefix match in the
* fields (`outer.inner`) and parse `data` out of it. We need to match on segments, so
* `outer.inner` shouldn't match `outer.inner_other`.
*
* @return The field from the RelBuilder with the most overlap, or inField if none exists.
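The longest-prefix rule described in this comment can be sketched as follows. This is a minimal Python illustration, not the PR's Java code: the function name `longest_field_prefix`, its arguments, and the returned `(field, remaining_path)` pair are all hypothetical, and the list of known fields stands in for whatever the RelBuilder exposes.

```python
def longest_field_prefix(fields, in_field, path):
    """Return (field, remaining_path) for the longest segment-wise match.

    Hypothetical helper: `fields` is the list of known field names,
    `in_field` is the spath input, `path` is the dotted spath path.
    Falls back to (in_field, path) when no longer field matches.
    """
    in_segments = in_field.split(".")
    segments = in_segments + path.split(".")
    known = set(fields)
    # Try the longest candidate first, dropping one trailing segment each time.
    # Candidates are compared as whole dotted names, so `outer.inner`
    # can never match a field called `outer.inner_other`.
    for end in range(len(segments), len(in_segments), -1):
        candidate = ".".join(segments[:end])
        if candidate in known:
            return candidate, ".".join(segments[end:])
    return in_field, path


fields = ["outer", "outer.inner", "outer.inner_other"]
print(longest_field_prefix(fields, "outer", "inner.data"))
# ('outer.inner', 'data')
print(longest_field_prefix(["outer", "outer.inner_other"], "outer", "inner.data"))
# ('outer', 'inner.data')
```

Comparing whole joined names rather than using a string `startswith` check is what keeps the match on segment boundaries.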
Collaborator

I am confused. Is the input parameter for specifying where we want to read the JSON from? This description reads as if we are parsing JSON that itself contains the outer attribute.

Collaborator Author

Input is the outermost field from which we start extracting inner values. For a document like { "outer": { "inner": "{\"data\": 0}" }}, input=outer means that spath processes the document { "inner": "{\"data\": 0}" }. Then path=inner.data accesses the value 0, and path=inner accesses the value "{\"data\": 0}".
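The behavior described in this reply can be illustrated with a small Python sketch. It is only a model of the semantics under discussion, not the PR's implementation: `spath_lookup` is a hypothetical name, a row is modeled as a plain dict, and any string met before the path is exhausted is assumed to hold embedded JSON.

```python
import json


def spath_lookup(row, input_field, path):
    """Walk `path` starting at row[input_field] (hypothetical helper).

    Missing keys raise KeyError here; the real command may behave differently.
    """
    current = row[input_field]
    for key in path.split("."):
        # A string encountered mid-path is assumed to be embedded JSON,
        # so it is parsed before the next key is applied.
        if isinstance(current, str):
            current = json.loads(current)
        current = current[key]
    return current


row = {"outer": {"inner": "{\"data\": 0}"}}
print(spath_lookup(row, "outer", "inner.data"))  # 0
print(spath_lookup(row, "outer", "inner"))       # {"data": 0}
```

In this model the caller does not need to know whether the boundary between structured fields and the JSON string falls at `outer` or at `outer.inner`.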

Collaborator

Does that mean outer is a column? In that case, I think we want to separate that from the JSON string.

Collaborator Author

@Swiddis Swiddis Sep 16, 2025

Yeah, I think in the future it makes sense to remove the input field and just navigate directly to the specified path. For this change, the intended behavior is to allow this type of mixing so you don't need to worry about where exactly the boundary is. I might cut a future PR to make input default to an empty string?

Collaborator

I feel this makes things more complicated.
In my opinion, it is simpler and easier to understand if input simply points to the column containing JSON, and path specifies the path within that JSON.

But I am open to the current approach if others feel it is better.
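For comparison, the stricter alternative proposed here might look like the Python sketch below. The name `spath_strict` and the flat column-to-value row model are assumptions made for illustration; nothing in the thread specifies an implementation.

```python
import json


def spath_strict(row, input_column, path):
    """Hypothetical strict variant: `input_column` must hold a JSON string,
    and `path` is resolved only inside the parsed JSON."""
    value = row[input_column]
    if not isinstance(value, str):
        raise TypeError(f"column {input_column!r} does not contain a JSON string")
    current = json.loads(value)
    for key in path.split("."):
        current = current[key]
    return current


# Row modeled as a flat mapping of column name to value.
row = {"outer.inner": "{\"data\": 0}"}
print(spath_strict(row, "outer.inner", "data"))  # 0
```

Under this model the query author must name the exact column that stores the JSON string (`outer.inner` above); a call with `input=outer` and `path=inner.data` would not resolve, which is the boundary the PR author wants users not to have to think about.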

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis requested a review from RyanL1997 as a code owner September 17, 2025 16:39
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
@Swiddis Swiddis added the v3.3.0 label Sep 26, 2025
Swiddis and others added 4 commits September 26, 2025 22:35
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Labels

enhancement New feature or request

2 participants