Support merging object-type fields when fetching the schema from the index by xinyual · Pull Request #3653 · opensearch-project/sql

xinyual · 2025-05-22T07:11:54Z

Description

This PR adds support for merging object-type fields when fetching the schema from multiple indices. For example, we have:

PUT demo1
{
  "mappings": {
    "properties": {
      "machine": {
        "properties": {
          "os1": {
            "type": "text"
          },
          "ram1": {
            "type": "long"
          }
        }
      }
    }
  }
}

And

PUT demo2
{
  "mappings": {
    "properties": {
      "machine": {
        "properties": {
          "os2": {
            "type": "text"
          },
          "ram2": {
            "type": "long"
          }
        }
      }
    }
  }
}

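With this change, the query source=demo1, demo2 | fields machine.os1, machine.os2 is supported: the machine object fields from both indices are merged into a single schema.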
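A minimal, self-contained sketch of the idea (not the PR's implementation), assuming each mapping is simplified to nested maps in which a leaf field is its type name and an object field is a nested Map. Object fields present in both indices are merged key by key; for any other conflict the later index wins. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingMergeSketch {

  // Returns a new map containing the fields of both inputs. When both sides
  // define the same key as an object (a nested Map), their children are merged
  // recursively; otherwise the value from `later` replaces the earlier one.
  // Neither input map is modified.
  @SuppressWarnings("unchecked")
  static Map<String, Object> deepMerge(Map<String, Object> earlier, Map<String, Object> later) {
    Map<String, Object> result = new LinkedHashMap<>(earlier);
    for (Map.Entry<String, Object> entry : later.entrySet()) {
      Object existing = result.get(entry.getKey());
      Object incoming = entry.getValue();
      if (existing instanceof Map && incoming instanceof Map) {
        result.put(
            entry.getKey(),
            deepMerge((Map<String, Object>) existing, (Map<String, Object>) incoming));
      } else {
        result.put(entry.getKey(), incoming);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Simplified demo1/demo2 mappings: a leaf is its type name, an object is a nested map.
    Map<String, Object> demo1 = Map.of("machine", Map.of("os1", "text", "ram1", "long"));
    Map<String, Object> demo2 = Map.of("machine", Map.of("os2", "text", "ram2", "long"));
    Map<String, Object> merged = deepMerge(demo1, demo2);
    // machine now holds os1, ram1, os2 and ram2 (iteration order is not guaranteed)
    System.out.println(merged);
  }
}
```

Returning a new map instead of mutating the accumulator avoids accidentally editing one index's schema while merging another into it.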

I also ran benchmarks with different numbers of indices and nesting depths. The table reports the average time of the merge operation:

            indices=120   indices=1000
depth=15    0.103 ms      0.951 ms
depth=5     0.041 ms      0.329 ms

You can also run the benchmark yourself with MergeArrayAndObjectMapBenchmark, using different parameters.

Related Issues

Resolves #3625

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: xinyual <xinyual@amazon.com>

penghuo · 2025-05-22T15:01:40Z

integ-test/src/test/java/org/opensearch/sql/calcite/standalone/CalcitePPLBasicIT.java

+    loadIndex(Index.MERGE_TEST_1);
+    loadIndex(Index.MERGE_TEST_2);


Consider using yamlRestIT instead; it does not require preparing mapping and data files: https://github.com/opensearch-project/sql/tree/main/integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues

...c/main/java/org/opensearch/sql/opensearch/request/system/OpenSearchDescribeIndexRequest.java

penghuo · 2025-05-22T15:17:08Z

...c/main/java/org/opensearch/sql/opensearch/request/system/OpenSearchDescribeIndexRequest.java

+  private Boolean checkWhetherToMerge(OpenSearchDataType first, OpenSearchDataType second) {
+    if (first.getExprCoreType() == second.getExprCoreType()
+        && (first.getExprCoreType() == ExprCoreType.STRUCT
+            || first.getExprCoreType() == ExprCoreType.ARRAY)) {
+      return true;
+    }
+    return false;
+  }
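The condition in checkWhetherToMerge reduces to one boolean expression: deep-merge only when both sides have the same core type and that type is STRUCT or ARRAY. The sketch below restates it with a primitive boolean return type (avoiding the boxed Boolean). CoreType is a hypothetical stand-in for ExprCoreType with only a few constants.

```java
public class MergeCheckSketch {

  // Hypothetical stand-in for ExprCoreType; only the constants needed here.
  enum CoreType { STRING, LONG, STRUCT, ARRAY }

  // Same condition as checkWhetherToMerge, written as a single expression.
  static boolean shouldDeepMerge(CoreType first, CoreType second) {
    return first == second && (first == CoreType.STRUCT || first == CoreType.ARRAY);
  }

  public static void main(String[] args) {
    System.out.println(shouldDeepMerge(CoreType.STRUCT, CoreType.STRUCT)); // true
    System.out.println(shouldDeepMerge(CoreType.STRUCT, CoreType.ARRAY));  // false
    System.out.println(shouldDeepMerge(CoreType.LONG, CoreType.LONG));     // false
  }
}
```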


Add a MergeRule abstraction.

For basic data types, the rule before this PR was Latest; after this PR, it is noChange.

For advanced (object/array) types, the rule is DeepMerge.

I added a mergeRule utility, but I'm not sure whether it meets your requirements. For basic data types, I still keep the latest data type according to the order of the indices.
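A minimal sketch of the two rules for basic types discussed here, assuming each index's schema is reduced to a map from field name to type name (a simplification of OpenSearchDataType). Latest behaves like Map.put, so the last index processed wins; noChange behaves like Map.putIfAbsent, so the first index wins. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrimitiveRuleSketch {

  // "Latest": for a field defined in several indices, the type from the last
  // index processed overwrites earlier ones (Map.put semantics).
  static Map<String, String> latest(List<Map<String, String>> perIndexTypes) {
    Map<String, String> result = new LinkedHashMap<>();
    perIndexTypes.forEach(result::putAll);
    return result;
  }

  // "noChange": the first type seen for a field is kept (Map.putIfAbsent semantics).
  static Map<String, String> noChange(List<Map<String, String>> perIndexTypes) {
    Map<String, String> result = new LinkedHashMap<>();
    perIndexTypes.forEach(types -> types.forEach(result::putIfAbsent));
    return result;
  }

  public static void main(String[] args) {
    // Field "age" is long in the first index and text in the second.
    List<Map<String, String>> indices = List.of(Map.of("age", "long"), Map.of("age", "text"));
    System.out.println(latest(indices));   // {age=text}
    System.out.println(noChange(indices)); // {age=long}
  }
}
```

Under either rule the result for basic types depends on the order in which indices are processed.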

Your code looks good.

We may support other type rules later, for instance a WideningMergeRule, so my original idea was that each rule defines its own match and mergeInto methods:

import java.util.List;
import java.util.Map;
import java.util.Optional;

public static void merge(Map<String, OpenSearchDataType> target, Map<String, OpenSearchDataType> source) {
  for (Map.Entry<String, OpenSearchDataType> entry : source.entrySet()) {
    String key = entry.getKey();
    OpenSearchDataType sourceValue = entry.getValue();
    OpenSearchDataType targetValue = target.get(key);
    RuleSelectorChain.selectRule(sourceValue, targetValue).mergeInto(key, sourceValue, target);
  }
}

public class RuleSelectorChain {
  private static final List<MergeRuleSelector> RULE_SELECTORS =
      List.of(new DeepMergeSelector(), new LatestSelector());

  public static MergeRule selectRule(OpenSearchDataType source, OpenSearchDataType target) {
    if (target == null) {
      // Key not seen yet: take the source value as-is.
      return new LatestWinsRule();
    }
    return RULE_SELECTORS.stream()
        .map(selector -> selector.select(source, target))
        .filter(Optional::isPresent)
        .map(Optional::get)
        .findFirst()
        .orElse(new LatestWinsRule()); // default rule (must be a MergeRule, not a selector)
  }
}

public class DeepMergeSelector implements MergeRuleSelector {
  @Override
  public Optional<MergeRule> select(OpenSearchDataType source, OpenSearchDataType target) {
    // Deep-merge only when both sides share the same STRUCT or ARRAY type.
    boolean mergeable =
        source.getExprCoreType() == target.getExprCoreType()
            && (source.getExprCoreType() == ExprCoreType.STRUCT
                || source.getExprCoreType() == ExprCoreType.ARRAY);
    return mergeable ? Optional.of(new DeepMergeRule()) : Optional.empty();
  }
}

public class DeepMergeRule implements MergeRule {
  @Override
  public void mergeInto(String key, OpenSearchDataType source, Map<String, OpenSearchDataType> target) {
    OpenSearchDataType existing = target.get(key);
    // Recurse into the child properties of the object/array field.
    merge(existing.getProperties(), source.getProperties());
    target.put(key, existing);
  }
}

Cool. I will refactor the code as you suggested.

I have refactored the code and added an interface with two implementations. Please take a look.
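A self-contained sketch of the "each rule defines its match and mergeInto methods" idea from this thread. Field, DeepMergeRule, LatestRule, and the "object"/"array" type names are hypothetical stand-ins for OpenSearchDataType and ExprCoreType; this is not the interface that was merged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeRuleSketch {

  // Hypothetical stand-in for OpenSearchDataType: a type name plus child
  // properties (children are only populated for "object" and "array" fields).
  static final class Field {
    final String type;
    final Map<String, Field> properties = new LinkedHashMap<>();

    Field(String type) {
      this.type = type;
    }

    // Deep copy, so merging never mutates the per-index input schemas.
    Field copy() {
      Field c = new Field(type);
      properties.forEach((k, v) -> c.properties.put(k, v.copy()));
      return c;
    }
  }

  // Each rule decides whether it applies and how to merge.
  interface MergeRule {
    boolean isMatch(Field source, Field target);

    void mergeInto(String key, Field source, Map<String, Field> target);
  }

  // Same object/array type on both sides: merge the children recursively.
  static final class DeepMergeRule implements MergeRule {
    @Override
    public boolean isMatch(Field source, Field target) {
      return target != null
          && source.type.equals(target.type)
          && (source.type.equals("object") || source.type.equals("array"));
    }

    @Override
    public void mergeInto(String key, Field source, Map<String, Field> target) {
      merge(target.get(key).properties, source.properties);
    }
  }

  // Fallback: the field from the later index replaces any earlier one.
  static final class LatestRule implements MergeRule {
    @Override
    public boolean isMatch(Field source, Field target) {
      return true;
    }

    @Override
    public void mergeInto(String key, Field source, Map<String, Field> target) {
      target.put(key, source.copy());
    }
  }

  // Rules are tried in order; LatestRule matches everything, so it goes last.
  static final List<MergeRule> RULES = List.of(new DeepMergeRule(), new LatestRule());

  static void merge(Map<String, Field> target, Map<String, Field> source) {
    for (Map.Entry<String, Field> entry : source.entrySet()) {
      Field existing = target.get(entry.getKey());
      MergeRule rule =
          RULES.stream()
              .filter(r -> r.isMatch(entry.getValue(), existing))
              .findFirst()
              .orElseThrow();
      rule.mergeInto(entry.getKey(), entry.getValue(), target);
    }
  }

  public static void main(String[] args) {
    Field machine1 = new Field("object");
    machine1.properties.put("os1", new Field("text"));
    Field machine2 = new Field("object");
    machine2.properties.put("os2", new Field("text"));

    Map<String, Field> merged = new LinkedHashMap<>();
    merge(merged, Map.of("machine", machine1));
    merge(merged, Map.of("machine", machine2));
    System.out.println(merged.get("machine").properties.keySet()); // [os1, os2]
  }
}
```

Because LatestRule matches everything, adding a rule such as the WideningMergeRule mentioned above would only mean inserting it into RULES ahead of LatestRule.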

integ-test/src/test/java/org/opensearch/sql/ppl/FieldsCommandIT.java

penghuo · 2025-05-22T15:22:38Z

...c/main/java/org/opensearch/sql/opensearch/request/system/OpenSearchDescribeIndexRequest.java

+
+      if (target.containsKey(key) && checkWhetherToMerge(value, target.get(key))) {
+        OpenSearchDataType merged = target.get(key);
+        mergeObjectAndArrayInsideMap(merged.getProperties(), value.getProperties());


Add a performance test for deep nesting, e.g., 10+ levels and 100/1000 indices. You can leverage the existing benchmarks in the repo.

Based on the test results:
a. consider a depth limit setting
b. document the merging limitations
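One way a depth limit could work is to measure each index's object nesting before merging and reject mappings that nest too deeply. This sketch uses the same simplified mapping model (a leaf is its type name, an object is a nested Map); the limit value, class, and method names are hypothetical and do not correspond to an existing OpenSearch setting.

```java
import java.util.Map;

public class MergeDepthGuardSketch {

  // Hypothetical limit for illustration; not an existing OpenSearch setting.
  static final int DEFAULT_MAX_DEPTH = 10;

  // Nesting depth of a simplified mapping, where a leaf is a type name (String)
  // and an object field is a nested Map. A map of only leaves has depth 1.
  @SuppressWarnings("unchecked")
  static int depth(Map<String, Object> properties) {
    int max = 1;
    for (Object value : properties.values()) {
      if (value instanceof Map) {
        max = Math.max(max, 1 + depth((Map<String, Object>) value));
      }
    }
    return max;
  }

  // Rejects a mapping before merging if it nests deeper than the limit.
  static void checkDepth(String index, Map<String, Object> properties, int maxDepth) {
    int d = depth(properties);
    if (d > maxDepth) {
      throw new IllegalArgumentException(
          "Index " + index + " has object depth " + d + ", exceeding limit " + maxDepth);
    }
  }

  public static void main(String[] args) {
    // Build a mapping nested 15 levels deep, like the depth=15 benchmark case.
    Map<String, Object> mapping = Map.of("leaf", "long");
    for (int i = 1; i < 15; i++) {
      mapping = Map.of("level" + i, mapping);
    }
    System.out.println(depth(mapping)); // 15
    try {
      checkDepth("demo", mapping, DEFAULT_MAX_DEPTH);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Checking before the merge leaves the merge logic unchanged; the cost is one extra traversal per index schema.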

I've added a benchmark for it. I tried depth 15 with 120 indices. The result is:
Benchmark   Mode   Cnt      Score      Error   Units
testMerge   thrpt   25   9619.794 ± 393.331   ops/s
What is the expected minimum throughput (ops/s) for this merge operation?
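For comparison with the per-merge times in the description: assuming each testMerge invocation performs one full merge over all 120 candidate maps (as the loop in the benchmark suggests), the throughput converts to an average time per merge of roughly:

```latex
t = \frac{1}{9619.794\ \mathrm{ops/s}} \approx 1.04 \times 10^{-4}\ \mathrm{s} \approx 0.104\ \mathrm{ms},
\qquad
\frac{1}{(9619.794 \pm 393.331)\ \mathrm{ops/s}} \in [0.0999,\ 0.108]\ \mathrm{ms}
```

This is in line with the 0.103 ms figure reported for depth=15 and 120 indices.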

Could you publish the test results in the PR description?
How long does it take to merge 120 indices at depth 15? What about 1000 indices?

Sure. I've added the results to the description. Let me know if you want more data.

Ran a load test with 1000 indices: the results show that when the number of concurrent requests increases to 64, latency increases to 12s.
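As a rough reading of that number, assuming a closed-loop test in which each of the 64 clients sends its next request as soon as the previous one returns, and that 12 s is the average steady-state latency, Little's law gives the sustained throughput:

```latex
\lambda = \frac{L}{W} \approx \frac{64\ \text{requests}}{12\ \text{s}} \approx 5.3\ \text{requests/s}
```

If the roughly 1 ms single-threaded merge cost for 1000 indices holds under load, the merge alone would not account for a 12 s latency, which is why profiling where the time goes is the next step.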

Next steps:

Could you double-check the load test results and update the PR description?

Profile OpenSearch; if the major latency contributor is the getIndexMapping API, open an issue in the core repo.

Update the "PPL Inconsistent Field Types across indices" section with the test results.

...c/main/java/org/opensearch/sql/opensearch/request/system/OpenSearchDescribeIndexRequest.java

Signed-off-by: xinyual <xinyual@amazon.com>

...h/java/org/opensearch/sql/expression/operator/predicate/PatternsWindowFunctionBenchmark.java

penghuo · 2025-05-27T18:18:48Z

...c/main/java/org/opensearch/sql/opensearch/request/system/OpenSearchDescribeIndexRequest.java


integ-test/build.gradle

penghuo · 2025-05-27T22:11:49Z

...c/main/java/org/opensearch/sql/opensearch/request/system/OpenSearchDescribeIndexRequest.java


Signed-off-by: xinyual <xinyual@amazon.com>

penghuo · 2025-05-28T20:39:58Z

...h/java/org/opensearch/sql/expression/operator/predicate/MergeArrayAndObjectMapBenchmark.java

+  public void testMerge() {
+    Map<String, OpenSearchDataType> finalResult = new HashMap<>();
+    for (Map<String, OpenSearchDataType> map : candidateMaps) {
+      OpenSearchDescribeIndexRequest.mergeObjectAndArrayInsideMap(finalResult, map);


mergeObjectAndArrayInsideMap does not exist.

penghuo

@xinyual Please create an issue to track the pressure test and support for a merge limit.

LantaoJin · 2025-06-07T02:58:14Z

@xinyual could you resolve the conflicts?

Signed-off-by: xinyual <xinyual@amazon.com>

…index (opensearch-project#3653)
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit ed507d7)

…index (#3653)
* merge object/array
  Signed-off-by: xinyual <xinyual@amazon.com>
* simplified code
  Signed-off-by: xinyual <xinyual@amazon.com>
* apply spotless
  Signed-off-by: xinyual <xinyual@amazon.com>
* fix IT by adding fields
  Signed-off-by: xinyual <xinyual@amazon.com>
* revert to hashmap
  Signed-off-by: xinyual <xinyual@amazon.com>
* filter one indices case
  Signed-off-by: xinyual <xinyual@amazon.com>
* add ut and merge rules
  Signed-off-by: xinyual <xinyual@amazon.com>
* add benchmark test
  Signed-off-by: xinyual <xinyual@amazon.com>
* revert change
  Signed-off-by: xinyual <xinyual@amazon.com>
* refactor merge rules
  Signed-off-by: xinyual <xinyual@amazon.com>
* fix IT
  Signed-off-by: xinyual <xinyual@amazon.com>
---------
Signed-off-by: xinyual <xinyual@amazon.com>

anasalkouz · 2025-06-19T20:18:22Z

Is this backported to 3.0 or 3.1?

xinyual added 3 commits May 22, 2025 13:42

merge object/array

1d06008

Signed-off-by: xinyual <xinyual@amazon.com>

simplified code

63f8cb9

Signed-off-by: xinyual <xinyual@amazon.com>

apply spotless

ffd11a5

Signed-off-by: xinyual <xinyual@amazon.com>

xinyual marked this pull request as ready for review May 22, 2025 07:16

xinyual requested review from GumpacG, LantaoJin, MaxKsyunz, Swiddis, YANG-DB, Yury-Fridlyand, acarbonetto, anirudha, dai-chen, derek-ho, forestmvey, joshuali925, kavithacm, mengweieric, noCharger, penghuo, ps48, qianheng-aws, seankao-az and ykmr1224 as code owners May 22, 2025 07:16

penghuo requested changes May 22, 2025


xinyual added 5 commits May 23, 2025 12:46

fix IT by adding fields

a81a510

Signed-off-by: xinyual <xinyual@amazon.com>

revert to hashmap

a617b00

Signed-off-by: xinyual <xinyual@amazon.com>

filter one indices case

81754d2

Signed-off-by: xinyual <xinyual@amazon.com>

add ut and merge rules

70b01c4

Signed-off-by: xinyual <xinyual@amazon.com>

add benchmark test

95e576d

Signed-off-by: xinyual <xinyual@amazon.com>

penghuo requested changes May 27, 2025


xinyual added 2 commits May 28, 2025 11:34

revert change

9e65b49

Signed-off-by: xinyual <xinyual@amazon.com>

refactor merge rules

87b28b2

Signed-off-by: xinyual <xinyual@amazon.com>

penghuo reviewed May 28, 2025


penghuo previously approved these changes Jun 4, 2025


LantaoJin previously approved these changes Jun 6, 2025


merge to main

2cc1b8a

Signed-off-by: xinyual <xinyual@amazon.com>

xinyual dismissed stale reviews from LantaoJin and penghuo via 2cc1b8a June 9, 2025 02:50

fix IT

7a35a11

Signed-off-by: xinyual <xinyual@amazon.com>

LantaoJin approved these changes Jun 9, 2025


penghuo approved these changes Jun 9, 2025


penghuo merged commit ed507d7 into opensearch-project:main Jun 9, 2025
22 checks passed

xinyual mentioned this pull request Jun 10, 2025

[FEATURE] Memory limitation when merging thousands of complex indices #3750

Closed

ahkcs mentioned this pull request Jun 10, 2025

[Backport 2.19-dev] Support merging object-type fields when fetching the schema from the index #3758

Closed

LantaoJin mentioned this pull request Jun 16, 2025

[FEATURE] Support merging object-type fields when fetching the schema from the index #3625

Closed

xinyual mentioned this pull request Jun 17, 2025

Merge index schema meta opensearch-project/skills#596

Merged

5 tasks

Swiddis added the enhancement New feature or request label Jun 26, 2025

tkykenmt mentioned this pull request Jan 10, 2026

[feature] SQL/PPL General Enhancements tkykenmt/opensearch-feature-explorer#847

Closed

3 tasks


5 participants
