Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding search processor for score normalization and combination #227

Merged

Conversation

martin-gaievski
Copy link
Member

@martin-gaievski martin-gaievski commented Jul 19, 2023

Description

Added processor that does score normalization and combination for Hybrid query results. Processor is registered via search pipeline API as search phase result processor.

In this PR we're adding support for techniques "min-max" for normalization and "arithmetic mean" for combination. Support for weights and other techniques will be added in next PRs to keep size of a single PR minimal.

Below is example of definition for registering new processor as part of search pipeline:

{
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {
                    "technique": "min_max"
                },
                "combination": {
                    "technique": "arithmetic_mean"
                }
            }
        }
    ]
}

Issues Resolved

#228, part of solution for #126

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski force-pushed the feature/normalization-injector-processor branch 3 times, most recently from 35463f8 to 9b0a28a Compare July 19, 2023 17:16
@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Merging #227 (e746e12) into feature/normalization (c4d0a0c) will increase coverage by 0.59%.
The diff coverage is 87.69%.

@@                     Coverage Diff                     @@
##             feature/normalization     #227      +/-   ##
===========================================================
+ Coverage                    84.95%   85.55%   +0.59%     
- Complexity                     216      296      +80     
===========================================================
  Files                           15       24       +9     
  Lines                          678      872     +194     
  Branches                       109      134      +25     
===========================================================
+ Hits                           576      746     +170     
- Misses                          60       67       +7     
- Partials                        42       59      +17     
Files Changed Coverage Δ
...ation/ArithmeticMeanScoreCombinationTechnique.java 80.00% <80.00%> (ø)
...earch/processor/normalization/ScoreNormalizer.java 80.00% <80.00%> (ø)
...neuralsearch/processor/NormalizationProcessor.java 83.33% <83.33%> (ø)
...rmalization/MinMaxScoreNormalizationTechnique.java 84.61% <84.61%> (ø)
...arch/processor/NormalizationProcessorWorkflow.java 85.00% <85.00%> (ø)
...g/opensearch/neuralsearch/plugin/NeuralSearch.java 81.25% <87.50%> (+3.47%) ⬆️
...ralsearch/processor/combination/ScoreCombiner.java 91.83% <91.83%> (ø)
...processor/combination/ScoreCombinationFactory.java 100.00% <100.00%> (ø)
...ocessor/factory/NormalizationProcessorFactory.java 100.00% <100.00%> (ø)
...essor/normalization/ScoreNormalizationFactory.java 100.00% <100.00%> (ø)

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the feature/normalization-injector-processor branch from 6580a8c to 3ebe720 Compare July 19, 2023 21:54
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the feature/normalization-injector-processor branch from 319d4bb to 1a196ce Compare July 21, 2023 21:42
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the feature/normalization-injector-processor branch from 08fc6d5 to 150ff33 Compare July 22, 2023 00:42
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski requested a review from heemin32 July 24, 2023 23:35
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the feature/normalization-injector-processor branch from 61e5b44 to 74b23b2 Compare July 24, 2023 23:49
@martin-gaievski martin-gaievski requested a review from heemin32 July 25, 2023 17:37
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski force-pushed the feature/normalization-injector-processor branch from f2ba35a to 29b397f Compare July 25, 2023 19:05
@navneet1v
Copy link
Collaborator

@martin-gaievski all the comments which you have resolved please mark them as resolved.

Comment on lines +104 to +107
int shardId = compoundQueryTopDocs.scoreDocs[0].shardIndex;
for (int j = 0; j < maxHits && j < sortedScores.size(); j++) {
int docId = sortedScores.get(j);
finalScoreDocs[j] = new ScoreDoc(docId, combinedNormalizedScoresByDocId.get(docId), shardId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not updating the shard id? why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this step we're working with a single shard results. It's the same shard id, so we need just to copy it to new results

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful if you can add the comments in the code, around which operation is at shard level and what is not.

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski
Copy link
Member Author

@martin-gaievski all the comments which you have resolved please mark them as resolved.

sure, I addressed all comments, @navneet1v please check PR again

@martin-gaievski martin-gaievski merged commit 2224f1f into feature/normalization Jul 26, 2023
martin-gaievski added a commit that referenced this pull request Aug 3, 2023
* Adding search processor for score normalization and combination

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants