Skip to content

Conversation

@Swiddis
Copy link
Collaborator

@Swiddis Swiddis commented Aug 24, 2025

Description

For now, implementing spath (#4119) as a lightweight conversion directly into a json_extract eval. We'll add more specific semantics later, but this covers the most common usage for the command as-is.

For more path behavior, see:

Related Issues

Related to #4119

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis added the enhancement New feature or request label Aug 25, 2025
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis force-pushed the feature/json-extraction branch from 3ff1cbd to 7d978af Compare August 25, 2025 21:14
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis requested a review from yuancu as a code owner August 25, 2025 21:20
@Swiddis Swiddis changed the title Simple stub for spath command Starter implementation for spath command Aug 25, 2025
@Swiddis Swiddis added PPL Piped processing language backport 2.19-dev labels Aug 25, 2025
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Copy link
Collaborator Author

@Swiddis Swiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes for reviewers

String outField = null;
String path = null;

for (OpenSearchPPLParser.SpathParameterContext param : ctx.spathParameter()) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an argument appears multiple times, we overwrite it with the new value.

This is consistent with how the other commands behave under the same scenario.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even input parameter can be multiple? I think it may be worth adding this comment in code.

private static String doJsonize(Object candidate) {
if (isScalarObject(candidate)) {
if (candidate == null) {
return "null"; // Matches isScalarObject, but not toString-able.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usability patch. Without this, JSON_EXTRACT raises exceptions on any missing fields (which kills extracting from flexibly-typed inputs, one of the major reasons to rely on string types).

The user-facing error without this on a missing value is java.lang.NullPointerException: Cannot invoke "Object.toString()" because "candidate" is null.

Comment on lines +48 to +54
return AstDSL.eval(
this.child,
AstDSL.let(
AstDSL.field(outField),
AstDSL.function(
"json_extract", AstDSL.field(inField), AstDSL.stringLiteral(this.path))));
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core behavior, the rest is plumbing/parsing boilerplate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It spath is translate to eval, does spath node still required?

Copy link
Collaborator Author

@Swiddis Swiddis Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to be adding more functionality to it later, so I think it's worthwhile to have the full node here. We could probably be more efficient by parsing it directly into an eval at the tree level if it stays as just this step, though.

+----------+---+
| doc | n |
|----------+---|
| {"n": 1} | 1 |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

results is string? "1"

Copy link
Collaborator Author

@Swiddis Swiddis Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, returns a string -- we can't really have it dynamically choose the type until we have the ability to dynamically make schemas at planning time


PPL query::

PPL> source=test_spath | spath input=doc output=first_element list{0} | spath input=doc output=all_elements list{} | spath input=doc output=nested nest_out.nest_in;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation of conflict output?

###
POST {{baseUrl}}/tttt/_bulk
Content-Type: application/x-ndjson

{ "index": { "_id": 1 } }
{ "@timestamp": "2025-08-25 00:00:01", "id": 1, "nested_out": {"nested_in": "success"} }
{ "index": { "_id": 2 } }
{ "@timestamp": "2025-08-25 00:00:02", "id": 2, "nested_out": {"nested_in": "failed"} }

###
POST {{baseUrl}}/_plugins/_ppl/
Content-Type: application/x-ndjson

{
  "query": "source=tttt | eval doc='{\"list\": [1, 2, 3, 4], \"nest_out\": {\"nest_in\": \"a\"}}' | spath input=doc output=nested_out nest_out.nest_in | fields nested_out.nested_in"
}


###
POST {{baseUrl}}/_plugins/_ppl/
Content-Type: application/x-ndjson

{
  "query": "source=tttt | eval doc='{\"list\": [1, 2, 3, 4], \"nest_out\": {\"nest_in\": \"a\"}}' | spath input=doc output=nested_out nest_out.nest_in | fields nested_out"
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with eval, it overwrites the existing field. This seems correct to me

+------------------------------------------------------+---------------+--------------+--------+
| doc | first_element | all_elements | nested |
|------------------------------------------------------+---------------+--------------+--------|
| {"list": [1, 2, 3, 4], "nest_out": {"nest_in": "a"}} | 1 | [1,2,3,4] | a |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result is string? "[1,2,3,4]"

"null", and "[]"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also return null instead of "null", but it seems consistent with the whole "stringify into json" rule that the rest of the json_extract method has going on

Comment on lines +48 to +54
return AstDSL.eval(
this.child,
AstDSL.let(
AstDSL.field(outField),
AstDSL.function(
"json_extract", AstDSL.field(inField), AstDSL.stringLiteral(this.path))));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It spath is translate to eval, does spath node still required?

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis enabled auto-merge (squash) August 27, 2025 21:26
Copy link
Collaborator

@RyanL1997 RyanL1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Swiddis , thanks for taking this on. Generally lgtm, just need to fix the license header for some classes.

@@ -0,0 +1,62 @@
package org.opensearch.sql.ppl.utils;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license header.

@@ -0,0 +1,55 @@
package org.opensearch.sql.ast.tree;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license header

executeQuery("source=test_spath | spath input=doc output=result path=n | fields result");
verifySchema(result, schema("result", "string"));
verifyDataRows(result, rows("1"), rows("2"), rows("3"));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe add another error case of malformed JSON / invalid paths

RyanL1997

This comment was marked as duplicate.

Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix all license header as Ryan commented.

@Swiddis Swiddis merged commit 4dc4ba9 into opensearch-project:main Aug 28, 2025
23 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4120-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4dc4ba99f3bcabeebc41685fea2684fe5c6db9b1
# Push it to GitHub
git push --set-upstream origin backport/backport-4120-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4120-to-2.19-dev.

Swiddis added a commit to Swiddis/sql that referenced this pull request Aug 28, 2025
* Stub spath entry in grammar

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add basic grammar parsing for SPath

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Implement simple rewrite logic (WIP)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Fix all the breakage

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Style fixes + repair build

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add some docs

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Anchor to json_extract

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Allow arbitrary argument ordering

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add SQL translation tests

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Empty commit (CI retry)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

---------

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@dai-chen
Copy link
Collaborator

Please fix all license header as Ryan commented.

@Swiddis Are all license header fixed?

@Swiddis
Copy link
Collaborator Author

Swiddis commented Aug 28, 2025

It was auto-merged

@LantaoJin
Copy link
Member

LantaoJin commented Sep 10, 2025

@Swiddis need to fix the backporting manually. And add the backport-manually label when backporting PR is submitted.

Swiddis added a commit that referenced this pull request Sep 16, 2025
* Starter implementation for `spath` command (#4120)

* Stub spath entry in grammar

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add basic grammar parsing for SPath

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Implement simple rewrite logic (WIP)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Fix all the breakage

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Style fixes + repair build

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add some docs

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Anchor to json_extract

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Allow arbitrary argument ordering

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add SQL translation tests

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Empty commit (CI retry)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

---------

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add missing license headers from 4120

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

---------

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Sep 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants