-
Notifications
You must be signed in to change notification settings - Fork 181
Support spath with dynamic fields #5058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support spath with dynamic fields #5058
Conversation
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds end-to-end dynamic-fields (auto extraction): SPATH/field-resolution updates, Calcite planning/merge changes to emit a runtime Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client as User Query
participant FRV as FieldResolutionVisitor
participant CRV as CalciteRelNodeVisitor
participant JoinUtil as JoinAndLookupUtils
participant Exec as OpenSearchExecutionEngine
participant RP as DynamicFieldsResultProcessor
Client->>FRV: parse & analyze (detect ALL_FIELDS / wildcards)
FRV->>CRV: emit Calcite plan with field requirements and `_MAP` markers
CRV->>JoinUtil: on join -> adjust inputs and request dynamic-map merge
JoinUtil-->>CRV: merged RelNode with `_MAP` (order per OverwriteMode)
CRV->>Exec: execute physical plan -> QueryResponse (may include `_MAP`)
Exec->>RP: expandDynamicFields(QueryResponse)
RP-->>Client: return expanded QueryResponse (concrete STRING columns)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
d1a62c4 to
3defb88
Compare
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
3defb88 to
2e50f6d
Compare
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 795-805: getFieldsAsMap builds values from field refs but passes
them unchanged into makeStringArray(ARRAY(VARCHAR)), risking type mismatch; map
each key's field reference through the existing castToString helper before
collecting into values (i.e., replace context.relBuilder.field(key) with
castToString(context.relBuilder.field(key), context)), then call makeStringArray
on that list. Also handle the empty-keys case: if keys.isEmpty() produce/return
a MAP_FROM_ARRAYS call using empty string literal array and empty string array
(via getStringLiteralArray and makeStringArray on an empty list) or otherwise
guard callers (e.g., adjustFieldsForDynamicFields) so getFieldsAsMap is not
invoked with no keys. Ensure references: getFieldsAsMap, castToString,
makeStringArray, getStringLiteralArray, BuiltinFunctionName.MAP_FROM_ARRAYS, and
adjustFieldsForDynamicFields.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
| for (int i = 0; i < fieldNames.size(); i++) { | ||
| String fieldName = fieldNames.get(i); | ||
| RexNode field = fields.get(i); | ||
| if (fieldName.startsWith(DYNAMIC_FIELDS_MAP)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- change startsWith to equals?
- document _MAP is reserved keyword of PPL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 -> It needs to be startsWith since Calcite automatically rename field to _MAP0 when both input contains _MAP.
2 -> Added doc description in identifiers.md
| } | ||
|
|
||
| // 3. Add _MAP field for dynamic fields when wildcards present | ||
| if (resolutionResult.hasWildcards()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we delay the wildcard handling in DynamicFieldsResultProcessor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed to put everything else (fields not explicitly referred in following commands) into dynamic fields. By aligning the fields in static fields and dynamic fields, we can avoid complicated process to handle both static & dynamic fields in most of the commands. And join can simply append two maps since required fields are always in static fields, and others are in dynamic fields (static/dynamic fields handle different set of fields, and don't require cross check or dedupe between static/dynamic fields)
The logic here:
0. (before here) required static fields are decided by FieldResolutionVisitor
- Make
_MAPwith non-required fields (exclude required fields from the map made from JSON - Add existing non-required fields to the
_MAP - Merge
_MAPwith existing_MAP
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 838-850: getStringLiteralArray must defensively handle an empty
keys collection to avoid creating an ARRAY_VALUE_CONSTRUCTOR with zero operands;
modify getStringLiteralArray to check if keys.isEmpty() and in that case return
an RexNode representing an empty string-array (i.e., an empty-array literal /
appropriately typed empty RexNode using context.rexBuilder and
getStringArrayType(context)) instead of building the stream, and keep the
existing behavior for non-empty keys; this prevents failures when
DynamicFieldsHelper.adjustFieldsForDynamicFields or getFieldsAsMap pass an empty
set (createDynamicMapField already guards but this path does not).
♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
794-804: Cast field references to VARCHAR for type consistency.The
valuesarray contains field references that may be non-string types (integer, boolean, etc.), butmakeStringArrayconstructs anARRAY(VARCHAR). The existinggetItemAsStringhelper (lines 810-816) explicitly casts values to VARCHAR for type consistency. Apply the same approach here usingcastToString.🔧 Suggested fix
List<RexNode> values = - keys.stream().map(key -> context.relBuilder.field(key)).collect(Collectors.toList()); + keys.stream() + .map(key -> castToString(context.relBuilder.field(key), context)) + .collect(Collectors.toList()); RexNode valuesArray = makeStringArray(values, context);
🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java (2)
56-63: Mutating the list returned byexcludeMetaFieldsis fragile.
getRemainingFieldscallsexcludeMetaFieldsand then modifies the result withremoveAll. WhileCollectors.toList()currently returns a mutable list, this is an implementation detail that could change, and the pattern is error-prone. Consider creating an explicit mutable copy.♻️ Suggested improvement
static List<String> getRemainingFields( Collection<String> existingFields, Collection<String> excluded) { - List<String> keys = excludeMetaFields(existingFields); + List<String> keys = new ArrayList<>(excludeMetaFields(existingFields)); keys.removeAll(excluded); Collections.sort(keys); return keys; }
85-90: UseifPresentinstead ofmapfor side-effect-only operations.
Optional.map()is designed for transformations, not side effects. UsingifPresent()is more idiomatic when you only want to perform an action if a value is present.♻️ Suggested improvement
context.relBuilder.push(left); // `as(alias)` is needed since `build()` won't preserve alias - leftAlias.map(alias -> context.relBuilder.as(alias)); + leftAlias.ifPresent(alias -> context.relBuilder.as(alias)); context.relBuilder.push(right); - rightAlias.map(alias -> context.relBuilder.as(alias)); + rightAlias.ifPresent(alias -> context.relBuilder.as(alias));core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
786-824: Consolidate duplicated helper methods withDynamicFieldsHelper.Several methods here are duplicated in
DynamicFieldsHelper:
getRemainingFields(lines 786-792 vs DynamicFieldsHelper:57-63)excludeMetaFields(lines 806-808 vs DynamicFieldsHelper:52-54)isDynamicFieldsExists(lines 822-824 vs DynamicFieldsHelper:66-68)getItemAsString(lines 810-816 vs DynamicFieldsHelper:127-132)castToString(lines 818-820 vs DynamicFieldsHelper:135-137)Consider delegating to
DynamicFieldsHelperor consolidating these methods in one location to maintain DRY principle. Note that line 121 inDynamicFieldsHelperalready callsCalciteRelNodeVisitor.getFieldsAsMap, indicating the coupling already exists.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
1314-1441: mergeDynamicFieldsAsNeeded correctly handles Calcite's duplicate-field renaming using startsWith().The implementation (JoinAndLookupUtils.java:133) uses
fieldName.startsWith(DYNAMIC_FIELDS_MAP)whereDYNAMIC_FIELDS_MAP = "_MAP", so it properly detects both_MAPand Calcite-renamed fields like_MAP0. The merging logic (lines 142-153) correctly concatenates exactly two maps or throws an exception if more exist. Right-side dynamic fields should not be dropped by this logic.However, add test coverage for joins where both inputs contain dynamic fields to verify the end-to-end behavior works as intended. Current test suites (CalcitePPLJoinTest, etc.) do not include scenarios testing
_MAPmerge during joins.
core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionResult.java
Show resolved
Hide resolved
| throw new IllegalArgumentException( | ||
| "AppendCol command cannot be used together with spath command"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This visitor is not generic and only for spath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is specific for spath command since we currently use it only for spath command (I made it specific for user to understand easier). Once we generalize it for schema-on-read, we need to change the message to be generic.
| return getReplacementPairs().stream().map(Pair::getLeft).toList(); | ||
| } | ||
|
|
||
| public boolean isAgainstAllFields() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear when we should call this? Any PPL command doesn't allow *?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When fillnull command is used without field list, it applies to all the fields. This method is used for dynamic fields to raise error since we cannot support it.
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionResult.java (1)
73-123: Add full JavaDoc tags for new public APIs.
Public methods incore/src/main/javashould include@param/@returntags. Please expand the JavaDoc for the newly added public methods (and constructors if needed). As per coding guidelines, please add the missing tags.📝 Example pattern
- /** Checks if result contains any wildcard patterns. */ + /** + * Checks if result contains any wildcard patterns. + * + * `@return` true when any wildcard is present + */ public boolean hasWildcards() { return wildcard != NULL_WILDCARD; }- /** Creates new result excluding specified fields. */ + /** + * Creates new result excluding specified fields. + * + * `@param` fields fields to remove + * `@return` new result without the specified fields + */ public FieldResolutionResult exclude(Collection<String> fields) { ... }
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java`:
- Around line 53-72: getRemainingFields (and similarly getFieldsAsMap) currently
only removes metadata fields, allowing the dynamic map token (isDynamicFieldMap,
e.g. "_MAP") to remain and be treated as a regular key; update
getRemainingFields to call excludeDynamicFields (or otherwise filter out
isDynamicFieldMap) after excludeMetaFields so the dynamic map is removed before
removeAll/excluded and sorting, and apply the same filtering in getFieldsAsMap
(and the other mentioned locations like buildDynamicMapFieldProjection inputs)
to prevent reinserting or double-processing the dynamic map key.
- Around line 25-26: Add unit tests exercising DynamicFieldsHelper's public
static methods: create a test class (e.g., DynamicFieldsHelperTest) that covers
adjustJoinInputsForDynamicFields, buildRegularFieldProjections,
buildDynamicMapFieldProjection and representative map operations; for each
method include positive and edge cases (e.g., inputs with no dynamic fields,
mixed dynamic/regular fields, null/empty map fields) and assert the produced
projections/relocations/expressions match expected Calcite node shapes or
projection lists; use minimal mocked or real Calcite/plan objects consistent
with other core tests to validate join alignment and dynamic map projection
logic, and include cases for nested maps and join input ordering so all public
branches are exercised.
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java (1)
187-218: AvoidindexOfon the filtered list when projecting existing fields.
existingFields.indexOf(existingField)is O(n²) and assumes_MAPis last. Iterating over the full rowType indices is both faster and order-safe.♻️ Proposed refactor
context.relBuilder.push(node); List<RexNode> projectList = new ArrayList<>(); - for (String existingField : existingFields) { - projectList.add(context.rexBuilder.makeInputRef(node, existingFields.indexOf(existingField))); - } + List<String> allFields = node.getRowType().getFieldNames(); + for (int i = 0; i < allFields.size(); i++) { + String fieldName = allFields.get(i); + if (isDynamicFieldMap(fieldName)) { + continue; + } + projectList.add(context.rexBuilder.makeInputRef(node, i)); + }integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathTestBase.java (1)
41-62: Harden JSON escaping for embeddeduserDatastrings.
escape()only handles quotes; backslashes or control characters can still produce invalid JSON payloads. Consider usingJSONObject.quote(...)(or a JSON builder) to ensure correct escaping.♻️ Possible improvement
+import org.json.JSONObject; ... protected String escape(String json) { - return json.replace("\"", "\\\""); + String quoted = JSONObject.quote(json); + return quoted.substring(1, quoted.length() - 1); }
core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java (1)
179-196: Add null check before usinggetInField()in Set.of().Line 191 uses
Set.of(node.getInField())without null checking. Since SPath uses@AllArgsConstructor,inFieldcan be null (as evidenced by the null check atPPLQueryDataAnonymizer:775), which would causeNullPointerExceptionat runtime. Add a guard condition before this line or use a null-safe approach.
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
penghuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep spath command at experiment status before we solved limitations.
dai-chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one concern: I feel it's unclear what future PPL command developers need to understand or implement when adding commands that interact with dynamic fields, since current _MAP implementation seems leaks across multiple layers (field resolution, RelNode visitor, join handling, result post-processing, etc). We can consider improving document or consolidating the logic later if this is valid concern. Thanks!
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-5058-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 633d760a5d0eb5a516232037ef79cb1cdf3af75c
# Push it to GitHub
git push --set-upstream origin backport/backport-5058-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
* Support spath with dynamic fields Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Add explain test and sql conversion test Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Address comments Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix FieldResolutionResultTest Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Address comment Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix explain Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join logic to adopt dynamic fields Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join logic Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix spath.md Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Support fillnull and replace Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Update spath.md Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix test failure Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix doc and error for fillnull Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Accept wildcard only at the end of field list Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Minor fix Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Adopt append command to spath Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join to allow spath in only one input Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix unit test failure Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join inputs logic Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix test failure Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Move helper methods Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Add _MAP description in the docs Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Extract more from CalciteRelNodeVisitor Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Refactor IT and address comments Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix FieldResolutionResult Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix DynamicFieldsHelper Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Refactor DynamicFieldsHelper Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix DynamicFieldsHelper Signed-off-by: Tomoyuki Morita <moritato@amazon.com> --------- Signed-off-by: Tomoyuki Morita <moritato@amazon.com> (cherry picked from commit 633d760) Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Support spath with dynamic fields Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Add explain test and sql conversion test Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Address comments Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix FieldResolutionResultTest Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Address comment Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix explain Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join logic to adopt dynamic fields Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join logic Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix spath.md Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Support fillnull and replace Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Update spath.md Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix test failure Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix doc and error for fillnull Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Accept wildcard only at the end of field list Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Minor fix Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Adopt append command to spath Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join to allow spath in only one input Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix unit test failure Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix join inputs logic Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix test failure Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Move helper methods Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Add _MAP description in the docs Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Extract more from CalciteRelNodeVisitor Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Refactor IT and address comments Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix FieldResolutionResult Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix DynamicFieldsHelper Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Refactor DynamicFieldsHelper Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix DynamicFieldsHelper Signed-off-by: Tomoyuki Morita <moritato@amazon.com> --------- Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Description
_MAP(dynamic fields), and expanded when result is organized.prefix*,*suffix, and raise error in case those are used.fieldscommand specify*in the middle of field list.appendcol,multisearch,lookupRelated Issues
#4984
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.