Support merging object-type fields when fetching the schema from the index #3653
penghuo merged 12 commits into opensearch-project:main from
Conversation
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
loadIndex(Index.MERGE_TEST_1);
loadIndex(Index.MERGE_TEST_2);
Consider yamlRestIT; it does not require preparing mapping and data files: https://github.com/opensearch-project/sql/tree/main/integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues.
private Boolean checkWhetherToMerge(OpenSearchDataType first, OpenSearchDataType second) {
  if (first.getExprCoreType() == second.getExprCoreType()
      && (first.getExprCoreType() == ExprCoreType.STRUCT
          || first.getExprCoreType() == ExprCoreType.ARRAY)) {
    return true;
  }
  return false;
}
Add a MergeRule abstraction.
- For basic data types, the rule before this PR is Latest; after this PR it is noChange.
- For advanced types, the rule is DeepMerge.
I added a MergeRule utility, but I'm not sure whether it meets your requirement. For basic data types, I still keep the latest data type according to the order of the indices.
Your code looks good.
We may support other type rules later, for instance a WideningMergeRule, so my earlier thought was that each Rule defines its own match and mergeInto methods:
public static void merge(Map<String, OpenSearchDataType> target, Map<String, OpenSearchDataType> source) {
  for (Map.Entry<String, OpenSearchDataType> entry : source.entrySet()) {
    String key = entry.getKey();
    OpenSearchDataType sourceValue = entry.getValue();
    OpenSearchDataType targetValue = target.get(key);
    RuleSelectorChain.selectRule(sourceValue, targetValue).mergeInto(key, sourceValue, target);
  }
}

public class RuleSelectorChain {
  private static final List<MergeRuleSelector> RULE_SELECTORS = List.of(
      new DeepMergeSelector(),
      new LatestSelector());

  public static MergeRule selectRule(OpenSearchDataType source, OpenSearchDataType target) {
    if (target == null) {
      // Field only exists in the source index, so the source value simply wins.
      return new LatestWinsRule();
    }
    return RULE_SELECTORS.stream()
        .map(selector -> selector.select(source, target))
        .filter(Optional::isPresent)
        .map(Optional::get)
        .findFirst()
        .orElse(new LatestWinsRule()); // this is the default
  }
}

public class DeepMergeSelector implements MergeRuleSelector {
  @Override
  public Optional<MergeRule> select(OpenSearchDataType source, OpenSearchDataType target) {
    // return Optional.of(new DeepMergeRule()) when both sides are STRUCT/ARRAY of the same core type
    return Optional.empty();
  }
}

public class DeepMergeRule implements MergeRule {
  @Override
  public void mergeInto(String key, OpenSearchDataType source, Map<String, OpenSearchDataType> target) {
    OpenSearchDataType existing = target.get(key);
    merge(existing.getProperties(), source.getProperties());
    target.put(key, existing);
  }
}
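For readability, the sketch above implicitly relies on MergeRule and MergeRuleSelector interfaces; a minimal version (illustrative only, not the PR's final API, each type in its own file) could look like:

import java.util.Map;
import java.util.Optional;

// Illustrative interfaces assumed by the sketch above; names and signatures are not from the PR itself.
public interface MergeRule {
  // Merge the source type for `key` into the target map, according to this rule.
  void mergeInto(String key, OpenSearchDataType source, Map<String, OpenSearchDataType> target);
}

public interface MergeRuleSelector {
  // Return a rule if this selector matches the (source, target) pair, otherwise Optional.empty().
  Optional<MergeRule> select(OpenSearchDataType source, OpenSearchDataType target);
}

// A trivial rule where the most recently seen index's type wins.
public class LatestWinsRule implements MergeRule {
  @Override
  public void mergeInto(String key, OpenSearchDataType source, Map<String, OpenSearchDataType> target) {
    target.put(key, source);
  }
}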
Cool. Will refactor code like your suggestion.
I have already refactored the code and added an interface with two implementations. Please check it.
if (target.containsKey(key) && checkWhetherToMerge(value, target.get(key))) {
  OpenSearchDataType merged = target.get(key);
  mergeObjectAndArrayInsideMap(merged.getProperties(), value.getProperties());
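For context, a minimal sketch of the recursive merge this excerpt belongs to, assuming getProperties() exposes a mutable map of nested fields; this illustrates the approach rather than the PR's exact implementation:

import java.util.Map;

// Illustrative sketch of a recursive deep merge over mapping properties.
static void mergeObjectAndArrayInsideMap(
    Map<String, OpenSearchDataType> target, Map<String, OpenSearchDataType> source) {
  for (Map.Entry<String, OpenSearchDataType> entry : source.entrySet()) {
    String key = entry.getKey();
    OpenSearchDataType value = entry.getValue();
    if (target.containsKey(key) && checkWhetherToMerge(value, target.get(key))) {
      // Both indices define this field as STRUCT/ARRAY of the same core type: recurse into the nested properties.
      OpenSearchDataType merged = target.get(key);
      mergeObjectAndArrayInsideMap(merged.getProperties(), value.getProperties());
    } else {
      // Otherwise keep the existing behavior: the latest index in the list wins for this field.
      target.put(key, value);
    }
  }
}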
- Add a performance test for deep nesting, e.g., 10+ levels and 100/1000 indices; you can leverage the benchmarks in the repo.
- Based on the test results,
  a. consider a depth-limit setting (see the sketch below)
  b. document the merging limitations
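A depth limit could be enforced as a guard on that recursion; a minimal sketch, with a hypothetical limit value and method name:

import java.util.Map;

// Hypothetical depth-limited variant of the recursive merge; the limit value is illustrative only.
private static final int MAX_MERGE_DEPTH = 20;

static void mergeWithDepthLimit(
    Map<String, OpenSearchDataType> target, Map<String, OpenSearchDataType> source, int depth) {
  for (Map.Entry<String, OpenSearchDataType> entry : source.entrySet()) {
    String key = entry.getKey();
    OpenSearchDataType value = entry.getValue();
    if (depth < MAX_MERGE_DEPTH
        && target.containsKey(key)
        && checkWhetherToMerge(value, target.get(key))) {
      // Recurse only while under the depth limit.
      mergeWithDepthLimit(target.get(key).getProperties(), value.getProperties(), depth + 1);
    } else {
      // At or beyond the limit (or for non-mergeable types), the latest index wins.
      target.put(key, value);
    }
  }
}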
I already added a benchmark for it. I tried a depth of 15 with 120 indices. The result is:
Benchmark   Mode  Cnt     Score     Error  Units
testMerge  thrpt   25  9619.794 ± 393.331  ops/s
What is the expected minimum ops/s for this merging operation?
Could you publish the test results in the PR description?
How long does it take to merge a depth of 15 with 120 indices? What about 1000 indices?
Sure. I have already added the results to the description. Let me know if you want more data.
Ran a load test with 1000 indices; the results show that when concurrent requests increase to 64, the latency increases to 12s.
Next steps:
- Could you double-confirm the load test results and update the PR description?
- Profile OpenSearch; if the major latency contributor is the getIndexMapping API, open an issue in the core repo.
- Update the PPL "Inconsistent Field Types across indices" section with the test results.
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
public void testMerge() {
  Map<String, OpenSearchDataType> finalResult = new HashMap<>();
  for (Map<String, OpenSearchDataType> map : candidateMaps) {
    OpenSearchDescribeIndexRequest.mergeObjectAndArrayInsideMap(finalResult, map);
mergeObjectAndArrayInsideMap does not exist.
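If the refactored code exposes the static merge(target, source) entry point from the MergeRule sketch above, the loop could call that instead; the owning class name below is a placeholder, not the PR's actual API:

// Hypothetical: assumes the refactor exposes a static merge(target, source) as sketched earlier;
// OpenSearchDataTypeMergeUtils is an illustrative name only.
public void testMerge() {
  Map<String, OpenSearchDataType> finalResult = new HashMap<>();
  for (Map<String, OpenSearchDataType> map : candidateMaps) {
    OpenSearchDataTypeMergeUtils.merge(finalResult, map);
  }
}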
@xinyual could you resolve conflicts?
Signed-off-by: xinyual <xinyual@amazon.com>
…index (opensearch-project#3653)
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit ed507d7)
…index (#3653)
* merge object/array
* simplified code
* apply spotless
* fix IT by adding fields
* revert to hashmap
* filter one indices case
* add ut and merge rules
* add benchmark test
* revert change
* refactor merge rules
* fix IT
---------
Signed-off-by: xinyual <xinyual@amazon.com>
Is this backported to 3.0 or 3.1?
Description
This PR supports merging object-type fields when fetching the schema from several indices. For example, given two indices, demo1 and demo2, whose mappings define different sub-fields under the same object-type field, we now support:
source=demo1, demo2 | fields machine.os1, machine.os2
Also, some benchmark tests were done with different numbers of indices and nesting depths, reporting the average time of the merging operation.
You can also run the benchmark using MergeArrayAndObjectMapBenchmark with different arguments.
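For reference, a JMH benchmark of this shape could be wired up roughly as follows. This is a self-contained sketch that merges plain nested maps instead of OpenSearchDataType, with illustrative parameter values; it is not the actual MergeArrayAndObjectMapBenchmark source.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Illustrative JMH sketch only: shows how index count and nesting depth can be parameterized.
@State(Scope.Benchmark)
public class MergeBenchmarkSketch {
  @Param({"120", "1000"}) // number of index mappings to merge (illustrative values)
  public int indexCount;

  @Param({"15"}) // nesting depth of the object field (illustrative value)
  public int depth;

  private List<Map<String, Object>> candidateMaps;

  @Setup
  public void setup() {
    candidateMaps = new ArrayList<>();
    for (int i = 0; i < indexCount; i++) {
      Map<String, Object> mapping = new HashMap<>();
      Map<String, Object> current = mapping;
      for (int d = 0; d < depth; d++) {
        Map<String, Object> child = new HashMap<>();
        current.put("level" + d, child);
        current = child;
      }
      current.put("field" + i, "keyword"); // each index contributes one distinct leaf field
      candidateMaps.add(mapping);
    }
  }

  @Benchmark
  public Map<String, Object> testMerge() {
    Map<String, Object> result = new HashMap<>();
    for (Map<String, Object> map : candidateMaps) {
      deepMerge(result, map);
    }
    return result;
  }

  @SuppressWarnings("unchecked")
  private static void deepMerge(Map<String, Object> target, Map<String, Object> source) {
    for (Map.Entry<String, Object> entry : source.entrySet()) {
      Object existing = target.get(entry.getKey());
      if (existing instanceof Map && entry.getValue() instanceof Map) {
        // Both sides are objects: recurse, mirroring the object/array deep merge in the PR.
        deepMerge((Map<String, Object>) existing, (Map<String, Object>) entry.getValue());
      } else {
        target.put(entry.getKey(), entry.getValue());
      }
    }
  }
}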
Related Issues
Resolves #3625
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.