
[Enhancement] Implement pruning for neural sparse search #988

Open · wants to merge 16 commits into main
Conversation

zhichao-aws
Member

Description

Implement pruning for sparse vectors, to save disk space and accelerate search with a small loss in search relevance. #946

  • Implement pruning in the sparse_encoding ingestion processor. Users can configure the pruning strategy when creating the processor, and the processor will prune the sparse vectors before writing them to the index.
  • Implement pruning in neural_sparse two-phase search. Users can configure the pruning strategy when searching with a neural_sparse query, and the query builder will prune the query before searching the index.
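The ingestion-side pruning can be pictured with a small sketch. This is not the PR's implementation (that lives in PruneUtils); it is a minimal, hypothetical reading of the alpha_mass strategy, where the highest-weight tokens are kept until their cumulative weight reaches a given fraction of the total. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AlphaMassPruneSketch {
    // Keep the highest-weight tokens until their cumulative weight
    // reaches `ratio` of the vector's total weight (hypothetical
    // reading of the alpha_mass strategy; names are illustrative).
    static Map<String, Float> prune(Map<String, Float> vector, float ratio) {
        float total = 0f;
        for (float w : vector.values()) {
            total += w;
        }
        List<Map.Entry<String, Float>> sorted = new ArrayList<>(vector.entrySet());
        sorted.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));

        Map<String, Float> kept = new HashMap<>();
        float mass = 0f;
        for (Map.Entry<String, Float> e : sorted) {
            if (mass >= ratio * total) {
                break;
            }
            kept.put(e.getKey(), e.getValue());
            mass += e.getValue();
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> v = Map.of("hello", 3.0f, "world", 2.0f, "the", 0.5f, "a", 0.1f);
        // keeps "hello" and "world": their mass (5.0) covers 80% of the total (5.6)
        System.out.println(prune(v, 0.8f).keySet());
    }
}
```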

Related Issues

#946

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@zhichao-aws
Member Author

This PR is ready for review now

Collaborator

@heemin32 heemin32 left a comment


Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

@zhichao-aws
Member Author

zhichao-aws commented Nov 21, 2024

Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Based on our benchmark results in #946, applying pruning in two-phase search outperforms applying it in the neural_sparse query body, in both precision and latency. Therefore, enhancing the existing two-phase search pipeline makes more sense.
To maintain compatibility with existing APIs, the overall API will look like:

# ingestion pipeline
PUT /_ingest/pipeline/sparse-pipeline
{
    "description": "Calling sparse model to generate expanded tokens",
    "processors": [
        {
            "sparse_encoding": {
                "model_id": "fousVokBjnSupmOha8aN",
                "pruning_type": "alpha_mass",
                "pruning_ratio": 0.8,
                "field_map": {
                    "body": "body_sparse"
                }
            }
        }
    ]
}

# two phase pipeline
PUT /_search/pipeline/neural_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Creates a two-phase processor for neural sparse search.",
        "pruning_type": "alpha_mass",
        "pruning_ratio": 0.8
      }
    }
  ]
}

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

The existing two-phase processor uses the max_ratio prune criterion, and now we are adding support for other criteria as well.
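For contrast with the alpha_mass example above in the API discussion, the max_ratio criterion mentioned here can be sketched as keeping only the tokens whose weight is at least a given fraction of the vector's maximum weight. This is a hypothetical reading for illustration, not the PR's code; names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxRatioPruneSketch {
    // Keep tokens whose weight is at least `ratio` times the largest
    // weight in the vector (hypothetical reading of max_ratio).
    static Map<String, Float> prune(Map<String, Float> vector, float ratio) {
        float max = 0f;
        for (float w : vector.values()) {
            max = Math.max(max, w);
        }
        Map<String, Float> kept = new HashMap<>();
        for (Map.Entry<String, Float> e : vector.entrySet()) {
            if (e.getValue() >= ratio * max) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> v = Map.of("hello", 3.0f, "world", 2.0f, "the", 0.5f);
        // threshold is 0.4 * 3.0 = 1.2, so "the" (0.5) is pruned
        System.out.println(prune(v, 0.4f).keySet());
    }
}
```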

@zhichao-aws zhichao-aws changed the title [Feature] Implement pruning for neural sparse search [Enhancement] Implement pruning for neural sparse search Nov 22, 2024

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 96.85535% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.27%. Comparing base (3c7f275) to head (7486ee8).

Files with missing lines Patch % Lines
...opensearch/neuralsearch/util/prune/PruneUtils.java 96.80% 2 Missing and 1 partial ⚠️
...earch/processor/NeuralSparseTwoPhaseProcessor.java 94.11% 0 Missing and 1 partial ⚠️
...h/neuralsearch/query/NeuralSparseQueryBuilder.java 83.33% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #988      +/-   ##
============================================
+ Coverage     80.47%   81.27%   +0.79%     
- Complexity     1000     1054      +54     
============================================
  Files            78       80       +2     
  Lines          3411     3535     +124     
  Branches        578      611      +33     
============================================
+ Hits           2745     2873     +128     
+ Misses          425      423       -2     
+ Partials        241      239       -2     


@zhichao-aws zhichao-aws requested a review from heemin32 November 22, 2024 07:18
Collaborator

@heemin32 heemin32 left a comment


LGTM. Thanks!

Member

@martin-gaievski martin-gaievski left a comment


Apart from a minor comment, why is this PR trying to merge into main?
If this changes the API used to define the processor, it should be checked with application security; for that we need to merge into a feature branch in the main repo, and only after that is cleared, from the feature branch to main.

);
} else {
// if we don't have prune type, then prune ratio field must not have value
if (config.containsKey(PruneUtils.PRUNE_RATIO_FIELD)) {
Member

We can merge this if with the previous else and have one single else if block
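The suggested restructuring can be sketched on a simplified version of the validation. The config key, message, and method names here are illustrative, not the PR's exact code.

```java
import java.util.Map;

public class PruneConfigCheckSketch {
    static final String PRUNE_RATIO_FIELD = "pruning_ratio";

    // Merged form: the nested `if` inside the `else` branch becomes a
    // single `else if`, as suggested in the review comment.
    static void validate(Map<String, Object> config, boolean hasPruneType) {
        if (hasPruneType) {
            // ... validate the ratio against the chosen prune type
        } else if (config.containsKey(PRUNE_RATIO_FIELD)) {
            // no prune type, so a prune ratio must not be provided
            throw new IllegalArgumentException("pruning_ratio requires a pruning_type");
        }
    }

    public static void main(String[] args) {
        validate(Map.of(), false); // fine: neither field present
        try {
            validate(Map.of(PRUNE_RATIO_FIELD, 0.8f), false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```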


Member Author

We can merge this if with the previous else and have one single else if block

ack

Member Author

This else means PruneType is NONE, right? It seems it can be moved to https://github.com/opensearch-project/neural-search/pull/988/files#diff-8453ea75f8259ba96c246d483b2de9e21601fb9c3d033e8902756f5d101f2238R262 when validating the input ratio.

We want to validate that the PRUNE_RATIO field is not provided; any value here would be illegal.

}
}

switch (pruneType) {
Member

Can you think of modifying this into a map of <prune_type> -> <functional_interface>, so that instead of a switch structure we use map.get()?

Member Author

Technically we can, but what's the advantage of doing this?

From a readability perspective, the switch-based method is more straightforward and easier to follow.

From a performance perspective, a switch on an enum is compiled down to a lookup-table operation and executes in O(1). I executed both methods 100k times, and the switch-based approach takes less time than the map-based approach (0.18 ms vs 0.63 ms).

Member Author

@zhichao-aws zhichao-aws Dec 10, 2024


test code:

/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */
package org.opensearch.neuralsearch.util.prune;

import org.opensearch.test.OpenSearchTestCase;

import java.util.HashMap;
import java.util.Map;

public class PrunePerfTests extends OpenSearchTestCase {
    private static final int ITERATIONS = 100_000;

    interface PruneHandler {
        void handle(PruneType type);
    }

    private static final Map<PruneType, PruneHandler> handlerMap = new HashMap<>();

    static {
        handlerMap.put(PruneType.NONE, type -> handleNone());
        handlerMap.put(PruneType.TOP_K, type -> handleTopK());
        handlerMap.put(PruneType.ALPHA_MASS, type -> handleAlphaMass());
        handlerMap.put(PruneType.MAX_RATIO, type -> handleMaxRatio());
        handlerMap.put(PruneType.ABS_VALUE, type -> handleAbsValue());
    }

    public void testPerf() {
        warmup();
        long switchStart = System.nanoTime();
        testSwitch();
        long switchEnd = System.nanoTime();

        long mapStart = System.nanoTime();
        testMap();
        long mapEnd = System.nanoTime();

        System.out.printf("Switch method took: %.2f ms%n", (switchEnd - switchStart) / 1_000_000.0);
        System.out.printf("Map method took: %.2f ms%n", (mapEnd - mapStart) / 1_000_000.0);
    }

    private static void warmup() {
        for (int i = 0; i < 1000; i++) {
            testSwitch();
            testMap();
        }
    }

    private static void testSwitch() {
        PruneType[] types = PruneType.values();
        for (int i = 0; i < ITERATIONS; i++) {
            PruneType type = types[i % types.length];
            switch (type) {
                case NONE:
                    handleNone();
                    break;
                case TOP_K:
                    handleTopK();
                    break;
                case ALPHA_MASS:
                    handleAlphaMass();
                    break;
                case MAX_RATIO:
                    handleMaxRatio();
                    break;
                case ABS_VALUE:
                    handleAbsValue();
                    break;
            }
        }
    }

    private static void testMap() {
        PruneType[] types = PruneType.values();
        for (int i = 0; i < ITERATIONS; i++) {
            PruneType type = types[i % types.length];
            handlerMap.get(type).handle(type);
        }
    }

    private static void handleNone() {

    }

    private static void handleTopK() {

    }

    private static void handleAlphaMass() {

    }

    private static void handleMaxRatio() {

    }

    private static void handleAbsValue() {

    }
}


switch (pruneType) {
case TOP_K:
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
Member

Suggested change
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
return pruneRatio > 0 && pruneRatio == Math.rint(pruneRatio);

This is more reliable for floating-point numbers; otherwise there is a chance of a false positive.

Collaborator

It doesn't seem correct to replace floor with rint. By definition, rint returns the even number when two values are equally close to the input. I tested with input 3.5: the floor result is 3 but the rint result is 4.

Member Author

Could you please give an example of false positive?
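As a side note on this exchange: for the equality check itself, the two calls agree on every double input, since x == Math.floor(x) and x == Math.rint(x) both hold exactly when x is an exact integer value; they differ only in which integer they return for non-integers. A quick sketch (an illustrative probe, not code from the PR):

```java
public class FloorVsRintCheck {
    public static void main(String[] args) {
        double[] inputs = {3.0, 3.5, 2.5, 4.0000001};
        for (double x : inputs) {
            boolean byFloor = x == Math.floor(x);
            boolean byRint = x == Math.rint(x);
            // both predicates agree: true only for exact integer values
            System.out.printf("%s: floor=%s rint=%s%n", x, byFloor, byRint);
        }
    }
}
```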

}
}

switch (pruneType) {
Member

Same as above: can we use a map instead of a switch?

@zhichao-aws
Member Author

@martin-gaievski Thanks for the comments. We didn't create a feature branch because no other contributors are working on this, and we regard the PR branch as the feature branch.

I'm on PTO this week; I will follow up on the app sec issue and address the comments next week.



* @param pruneType The type of prune strategy
* @throws IllegalArgumentException if prune type is null
*/
public static String getValidPruneRatioDescription(PruneType pruneType) {
Collaborator

[nit] this can be refactored to a static map.

Member Author

Please refer to the discussion with Martin above.

Signed-off-by: zhichao-aws <zhichaog@amazon.com>