Skip to content

Conversation

@ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Nov 24, 2025

Description

This PR adds support for the mvmap eval function, which iterates over each element of a multivalue array, applies a given expression, and returns a new array containing the transformed results.

Syntax:
mvmap(array, expression)

During iteration, the field name in the expression is implicitly bound to each element of the input array (e.g., mvmap(arr, arr * 10)).


Implementation Details

Core Approach

mvmap is implemented as an alias for PPL's existing TRANSFORM operator, which applies a lambda function to each array element.

To support implicit binding syntax—mvmap(arr, arr * 10) instead of requiring explicit lambda syntax (mvmap(arr, x -> x * 10))—AST transformation logic was added in AstExpressionBuilder.java.

AST Transformation Steps

  1. Intercept mvmap function calls in visitEvalFunctionCall()
  2. Extract the field name from the first argument (e.g., arr)
  3. Wrap the second argument expression into a LambdaFunction, using the extracted field name as the parameter
  4. Transform:

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ahkcs ahkcs requested a review from dai-chen November 26, 2025 00:05
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14b0fc5 and 796f836.

📒 Files selected for processing (7)
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (5 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java (4 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (2 hunks)
  • docs/user/ppl/functions/collection.md (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLParser.g4 (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
  • docs/user/ppl/functions/collection.md
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
🧠 Learnings (3)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
core/src/main/java/org/opensearch/sql/ast/expression/LambdaFunction.java (1)
  • Getter (20-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
🔇 Additional comments (7)
core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java (3)

300-312: LGTM for captured variable propagation into lambda parameters.

The logic correctly retrieves captured variables from the context and adds corresponding RexLambdaRef entries to the lambda arguments. This aligns with the capture mechanism in CalcitePlanContext.captureVariable() which stores refs with "__captured_" + index keys.


426-427: Captured variables from only the last lambda are retained.

If a function has multiple lambda arguments (e.g., reduce has accumulator and finish lambdas), capturedVars is reassigned for each lambda iteration, so only the last lambda's captured variables are kept. This is acceptable for mvmap/transform (single lambda), but verify this doesn't affect reduce if captured variables are needed there.


438-445: LGTM for appending captured variables to function arguments.

The captured variables are correctly appended to the argument list only for mvmap and transform functions, maintaining backward compatibility with other functions.

core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (2)

65-73: LGTM for new state fields.

The capturedVariables list and inLambdaContext flag are well-documented and appropriately use Lombok annotations for accessors.


91-102: LGTM for the lambda context cloning constructor.

Sharing relBuilder and rexBuilder while isolating lambda-specific state (rexLambdaRefMap, capturedVariables) is the correct approach. This allows lambda expressions to resolve fields from the parent context while maintaining their own variable bindings.

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java (1)

67-92: LGTM for the refactored resolution with lambda-context handling.

The two-phase approach improves clarity:

  1. Lambda variables take precedence (preventing accidental capture of lambda params)
  2. Regular fields are resolved and captured when in lambda context

This correctly enables external field references in mvmap expressions like mvmap(arr, arr * multiplier) where multiplier comes from the outer scope.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)

76-80: Clarify captured variable handling in the args structure.

The comment documents args structure as [array, lambda, captured1, captured2, ..., returnType] supporting multiple captures, but verify that the implementation at lines 101-106 correctly processes all captured variables if multiple are present. If only the first captured variable (args[2]) is intentionally used, update the comment to reflect this single-capture design; otherwise, ensure the loop logic handles all captured arguments.

@ahkcs ahkcs requested review from dai-chen and ykmr1224 December 11, 2025 00:01
@ahkcs
Copy link
Contributor Author

ahkcs commented Dec 15, 2025

Is it possible to refer other fields in the second parameter?

e.g. ... | fields array, a | eval b = mvmap(array, array * a)

If yes, we might want to add test case for that.

Updated to support singe-value field

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)

182-208: Consider minor refactorings for maintainability.

The logic correctly handles variable capture and deduplication. However, a few optional improvements could enhance code quality:

  1. Extract magic string constant: The string "__captured_" is used in three places (lines 187, 200, 205). Consider extracting it to a private static final constant for maintainability.

  2. Cache lambda parameter count: The stream/filter operation at lines 199-200 recalculates the lambda parameter count on each capture. For better performance, consider tracking this value separately or caching it.

  3. Method length: The method is 27 lines, slightly exceeding the 20-line guideline. Consider extracting helper methods (e.g., findExistingCapture, createCapturedLambdaRef) to improve readability.

Example for constant extraction:

+  private static final String CAPTURED_VAR_PREFIX = "__captured_";
+
   public RexLambdaRef captureVariable(RexNode fieldRef, String fieldName) {
     // Check if this variable is already captured
     for (int i = 0; i < capturedVariables.size(); i++) {
       if (capturedVariables.get(i).equals(fieldRef)) {
-        return rexLambdaRefMap.get("__captured_" + i);
+        return rexLambdaRefMap.get(CAPTURED_VAR_PREFIX + i);
       }
     }
     
     int captureIndex = capturedVariables.size();
     capturedVariables.add(fieldRef);
     
     int lambdaParamCount =
         (int)
-            rexLambdaRefMap.keySet().stream().filter(key -> !key.startsWith("__captured_")).count();
+            rexLambdaRefMap.keySet().stream().filter(key -> !key.startsWith(CAPTURED_VAR_PREFIX)).count();
     RexLambdaRef lambdaRef =
         new RexLambdaRef(lambdaParamCount + captureIndex, fieldName, fieldRef.getType());
     
-    rexLambdaRefMap.put("__captured_" + captureIndex, lambdaRef);
+    rexLambdaRefMap.put(CAPTURED_VAR_PREFIX + captureIndex, lambdaRef);
     
     return lambdaRef;
   }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48d3857 and 513c11c.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (5 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (3)

65-73: Well-designed lambda capture fields.

The new fields for lambda support are well documented and appropriately scoped. The JavaDoc clearly explains their purpose in capturing external variables for lambda expressions.


87-102: Excellent design for lambda context isolation.

The private constructor properly shares the relBuilder with the parent while maintaining separate lambda variable mappings. This design correctly enables lambda expressions to reference fields from the current row while having independent variable scopes.


133-140: LGTM - clone method properly updated.

The updated clone method correctly uses the new private constructor to create lambda contexts that share the relBuilder while maintaining independent lambda state.

dai-chen
dai-chen previously approved these changes Dec 15, 2025
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a minor comment. Thanks for the changes!

public QualifiedName visitFunction(Function node, Void context) {
List<UnresolvedExpression> funcArgs = node.getFuncArgs();
if (!funcArgs.isEmpty()) {
return funcArgs.get(0).accept(this, context);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we visit each funcArg and return on first non-null (qualified name found)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the visitFunction method to iterate through all function arguments and return on the first non-null qualified name found

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This will also help answer Tomo's comment #4856 (review) which also ask clarify for the behavior.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	docs/user/ppl/functions/collection.rst
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
#	ppl/src/main/antlr/OpenSearchPPLLexer.g4
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
#	ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java (1)

490-519: Consider documenting the first-argument-only traversal for functions.

The FIELD_NAME_EXTRACTOR visitor correctly extracts field names from expressions. However, the visitFunction method only examines the first argument of nested functions (line 509). While this is intentional and appropriate for mvmap's use case, consider adding a comment explaining this behavior to help future maintainers understand why other arguments are not checked.

Example addition:

 @Override
 public QualifiedName visitFunction(Function node, Void context) {
+  // For mvmap, the array field is expected in the first argument position
+  // e.g., mvindex(arr, 1, 3), mvappend(arr, value), etc.
   List<UnresolvedExpression> funcArgs = node.getFuncArgs();
   if (!funcArgs.isEmpty()) {
     return funcArgs.get(0).accept(this, context);
   }
   return null;
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 513c11c and 874db06.

📒 Files selected for processing (15)
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (5 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java (4 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2 hunks)
  • docs/user/ppl/functions/collection.md (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLParser.g4 (2 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java (2 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (2 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
**/ppl/**/*.java

⚙️ CodeRabbit configuration file

**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases

  • Check AST generation for new syntax
  • Ensure corresponding AST builder classes are updated
  • Validate edge cases and boundary conditions

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
🧬 Code graph analysis (2)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
core/src/main/java/org/opensearch/sql/ast/expression/LambdaFunction.java (1)
  • Getter (20-47)
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java (1)
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)
  • AbstractNodeVisitor (94-464)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (19)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (2)

76-80: Captured variable detection logic is correct.

The detection args.length > 3 correctly identifies when captured variables are present based on the args structure [array, lambda, captured1, ..., returnType].


99-120: LGTM! Limitation is now documented.

The branching logic correctly separates captured-variable mode from index-based mode. The documentation comment (lines 101-106) adequately addresses the prior review concern about the single captured variable limitation.

Consider whether capturedVar being null (when the captured field has a null value) should be handled specially or is acceptable runtime behavior.

docs/user/ppl/functions/collection.md (1)

728-794: Documentation is clear and comprehensive.

The MVMAP section follows the existing documentation patterns and covers:

  • Basic usage with implicit binding
  • Arithmetic transformations
  • External field references (captured variables)
  • Note about nested expressions

The examples are well-structured and demonstrate key use cases.

ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

900-906: Test coverage for mvmap anonymization is appropriate.

The test follows the established pattern of other mv* function tests and validates that field names are masked as identifier and literals as ***.

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java (1)

67-92: Lambda context handling is well-structured.

The refactored resolution flow correctly:

  1. Prioritizes lambda variable resolution
  2. Attempts field resolution with external variable capture when in lambda context
  3. Falls back to correlation fields and coalesce handling

The explicit control flow with early returns is clearer than Optional chaining for this multi-step resolution logic.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (2)

918-926: Mvmap unwrapping logic is appropriate.

The special handling correctly detects mvmap calls with an implicit lambda (second argument is LambdaFunction) and renders the original expression format mvmap(firstArg, lambdaBody) rather than the internal lambda representation.


935-943: Lambda function anonymization follows established patterns.

The new visitLambdaFunction method correctly:

  • Masks lambda parameter names using maskField()
  • Recursively analyzes the lambda body
  • Returns the standard lambda notation format
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java (1)

81-81: LGTM! MVMAP constant correctly added.

The new MVMAP enum constant follows the established pattern for collection functions and is properly positioned in the Collection functions section. It will be automatically included in ALL_NATIVE_FUNCTIONS through enum iteration.

ppl/src/main/antlr/OpenSearchPPLParser.g4 (2)

851-860: Grammar correctly defines mvmap with precedence over evalFunctionCall.

The mvmapFunctionCall alternative is placed before evalFunctionCall, ensuring mvmap syntax is matched specifically rather than falling through to the general eval function handler. This ordering is correct for specialized function syntax.

Based on learnings: AST generation should be verified for this new PPL parser syntax (will check in AstExpressionBuilder.java).


862-864: mvmapFunctionCall production correctly enforces 2-argument structure.

The grammar properly defines mvmap syntax as MVMAP(arg1, arg2) with exactly two functionArg parameters separated by a comma. This matches the intended syntax mvmap(array, expression) described in the PR objectives.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java (3)

25-26: LGTM! Necessary imports for the FIELD_NAME_EXTRACTOR visitor.

The AbstractNodeVisitor and Node imports support the new visitor pattern used for extracting field names from expressions for mvmap's implicit lambda binding.


453-473: LGTM! Implicit lambda binding correctly implemented.

The visitMvmapFunctionCall method properly transforms mvmap syntax into a lambda-based function call:

  • Extracts the field name from the first argument (even from nested expressions like mvindex(arr, 1, 3))
  • Creates a LambdaFunction using that field name as the parameter
  • Validates that the second argument is not already a lambda
  • Provides clear error messages for invalid syntax

This matches the PR objectives for implicit field binding (e.g., mvmap(arr, arr * 10) implicitly creates arr -> arr * 10).

Based on learnings: The dedicated visitMvmapFunctionCall method follows the pattern correctly established after previous review feedback.


481-488: LGTM! Helper method with clear documentation.

The extractFieldName helper correctly delegates to the visitor and includes a helpful JavaDoc example showing how field names are extracted from nested function calls.

core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (6)

11-11: LGTM! Necessary import for ArrayList.

The ArrayList import is required for initializing the capturedVariables list used in lambda variable capture.


65-73: LGTM! Well-documented fields for lambda variable capture.

The new fields capturedVariables and inLambdaContext are properly declared with clear JavaDoc explaining their purpose. The captured variables list stores external field references used in lambda expressions, and the context flag tracks lambda scope.


84-85: LGTM! Proper initialization of capturedVariables.

The capturedVariables list is correctly initialized as an empty ArrayList in the constructor, consistent with other field initializations.


87-102: LGTM! Well-designed constructor for lambda context isolation.

The new private constructor correctly creates a child context that:

  • Shares relBuilder and rexBuilder with the parent (enabling field resolution from current row)
  • Creates isolated rexLambdaRefMap and capturedVariables (preventing variable binding conflicts)
  • Sets inLambdaContext=true to mark lambda scope

This design enables proper variable scoping for nested lambda expressions.


133-140: LGTM! Simplified clone() delegates to private constructor.

The modified clone() method now uses the private constructor, providing a cleaner implementation with well-documented sharing semantics. The JavaDoc clearly explains that lambda contexts can reference fields from the parent context while maintaining their own variable mappings.


172-208: LGTM! captureVariable correctly implements deduplication and index calculation.

The captureVariable method properly handles external variable capture for lambda functions:

  1. Deduplication (lines 183-189): Efficiently reuses existing captures by checking if the RexNode was already captured
  2. Index calculation (lines 198-202): Correctly filters out captured variables when counting lambda parameters, fixing the bug noted in past review comments
  3. Storage (line 205): Uses "__captured_" prefix to distinguish captured variables from lambda parameters

The index calculation fix ensures that multiple captured variables get sequential indices after the lambda parameters:

  • 1 lambda param + 0 captures → captured var at index 1 ✓
  • 1 lambda param + 1 capture → next captured var at index 2 ✓

@dai-chen dai-chen merged commit 11727a4 into opensearch-project:main Dec 16, 2025
39 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 16, 2025
* Support mvmap eval function

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	docs/user/ppl/functions/collection.rst
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
#	ppl/src/main/antlr/OpenSearchPPLLexer.g4
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
#	ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

* update UT

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update error handling and test cases

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update collection.md

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Update to support referencing single-value field

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update to use visitor pattern

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Addressing CodeRabbit

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Update

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit 11727a4)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
yuancu pushed a commit that referenced this pull request Dec 17, 2025
* Support mvmap eval function



# Conflicts:
#	docs/user/ppl/functions/collection.rst
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
#	ppl/src/main/antlr/OpenSearchPPLLexer.g4
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
#	ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

* update UT



* update error handling and test cases



* fixes



* update collection.md



* Update to support referencing single-value field



* update to use visitor pattern



* Addressing CodeRabbit



* Update



---------


(cherry picked from commit 11727a4)

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot mentioned this pull request Jan 22, 2026
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants