-
Notifications
You must be signed in to change notification settings - Fork 181
Starter implementation for spath command
#4120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Starter implementation for spath command
#4120
Conversation
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
3ff1cbd to
7d978af
Compare
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
spath command
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Swiddis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes for reviewers
| String outField = null; | ||
| String path = null; | ||
|
|
||
| for (OpenSearchPPLParser.SpathParameterContext param : ctx.spathParameter()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an argument appears multiple times, we overwrite it with the new value.
This is consistent with how the other commands behave under the same scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even input parameter can be multiple? I think it may be worth adding this comment in code.
| private static String doJsonize(Object candidate) { | ||
| if (isScalarObject(candidate)) { | ||
| if (candidate == null) { | ||
| return "null"; // Matches isScalarObject, but not toString-able. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usability patch. Without this, JSON_EXTRACT raises exceptions on any missing fields (which kills extracting from flexibly-typed inputs, one of the major reasons to rely on string types).
The user-facing error without this on a missing value is java.lang.NullPointerException: Cannot invoke "Object.toString()" because "candidate" is null.
| return AstDSL.eval( | ||
| this.child, | ||
| AstDSL.let( | ||
| AstDSL.field(outField), | ||
| AstDSL.function( | ||
| "json_extract", AstDSL.field(inField), AstDSL.stringLiteral(this.path)))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core behavior, the rest is plumbing/parsing boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It spath is translate to eval, does spath node still required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to be adding more functionality to it later, so I think it's worthwhile to have the full node here. We could probably be more efficient by parsing it directly into an eval at the tree level if it stays as just this step, though.
| +----------+---+ | ||
| | doc | n | | ||
| |----------+---| | ||
| | {"n": 1} | 1 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
results is string? "1"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, returns a string -- we can't really have it dynamically choose the type until we have the ability to dynamically make schemas at planning time
|
|
||
| PPL query:: | ||
|
|
||
| PPL> source=test_spath | spath input=doc output=first_element list{0} | spath input=doc output=all_elements list{} | spath input=doc output=nested nest_out.nest_in; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expectation of conflict output?
###
POST {{baseUrl}}/tttt/_bulk
Content-Type: application/x-ndjson
{ "index": { "_id": 1 } }
{ "@timestamp": "2025-08-25 00:00:01", "id": 1, "nested_out": {"nested_in": "success"} }
{ "index": { "_id": 2 } }
{ "@timestamp": "2025-08-25 00:00:02", "id": 2, "nested_out": {"nested_in": "failed"} }
###
POST {{baseUrl}}/_plugins/_ppl/
Content-Type: application/x-ndjson
{
"query": "source=tttt | eval doc='{\"list\": [1, 2, 3, 4], \"nest_out\": {\"nest_in\": \"a\"}}' | spath input=doc output=nested_out nest_out.nest_in | fields nested_out.nested_in"
}
###
POST {{baseUrl}}/_plugins/_ppl/
Content-Type: application/x-ndjson
{
"query": "source=tttt | eval doc='{\"list\": [1, 2, 3, 4], \"nest_out\": {\"nest_in\": \"a\"}}' | spath input=doc output=nested_out nest_out.nest_in | fields nested_out"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as with eval, it overwrites the existing field. This seems correct to me
| +------------------------------------------------------+---------------+--------------+--------+ | ||
| | doc | first_element | all_elements | nested | | ||
| |------------------------------------------------------+---------------+--------------+--------| | ||
| | {"list": [1, 2, 3, 4], "nest_out": {"nest_in": "a"}} | 1 | [1,2,3,4] | a | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result is string? "[1,2,3,4]"
"null", and "[]"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also return null instead of "null", but it seems consistent with the whole "stringify into json" rule that the rest of the json_extract method has going on
| return AstDSL.eval( | ||
| this.child, | ||
| AstDSL.let( | ||
| AstDSL.field(outField), | ||
| AstDSL.function( | ||
| "json_extract", AstDSL.field(inField), AstDSL.stringLiteral(this.path)))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It spath is translate to eval, does spath node still required?
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
RyanL1997
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Swiddis , thanks for taking this on. Generally lgtm, just need to fix the license header for some classes.
| @@ -0,0 +1,62 @@ | |||
| package org.opensearch.sql.ppl.utils; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license header.
| @@ -0,0 +1,55 @@ | |||
| package org.opensearch.sql.ast.tree; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license header
| executeQuery("source=test_spath | spath input=doc output=result path=n | fields result"); | ||
| verifySchema(result, schema("result", "string")); | ||
| verifyDataRows(result, rows("1"), rows("2"), rows("3")); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe add another error case of malformed JSON / invalid paths
dai-chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix all license header as Ryan commented.
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4120-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4dc4ba99f3bcabeebc41685fea2684fe5c6db9b1
# Push it to GitHub
git push --set-upstream origin backport/backport-4120-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
* Stub spath entry in grammar Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add basic grammar parsing for SPath Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Implement simple rewrite logic (WIP) Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Fix all the breakage Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Style fixes + repair build Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add some docs Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Anchor to json_extract Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Allow arbitrary argument ordering Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add SQL translation tests Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Empty commit (CI retry) Signed-off-by: Simeon Widdis <sawiddis@amazon.com> --------- Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Are all license header fixed? |
|
It was auto-merged |
|
@Swiddis need to fix the backporting manually. And add the |
* Starter implementation for `spath` command (#4120) * Stub spath entry in grammar Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add basic grammar parsing for SPath Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Implement simple rewrite logic (WIP) Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Fix all the breakage Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Style fixes + repair build Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add some docs Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Anchor to json_extract Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Allow arbitrary argument ordering Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add SQL translation tests Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Empty commit (CI retry) Signed-off-by: Simeon Widdis <sawiddis@amazon.com> --------- Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add missing license headers from 4120 Signed-off-by: Simeon Widdis <sawiddis@amazon.com> --------- Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Description
For now, implementing
spath(#4119) as a lightweight conversion directly into ajson_extracteval. We'll add more specific semantics later, but this covers the most common usage for the command as-is.For more path behavior, see:
sql/integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJsonBuiltinFunctionIT.java
Line 107 in b220be4
Related Issues
Related to #4119
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.