
[Enhancement] Implement pruning for neural sparse search #988

Open · wants to merge 16 commits into main
Conversation

zhichao-aws
Member

Description

Implement pruning for sparse vectors, to save disk space and accelerate search with a small loss in search relevance. #946

  • Implement pruning in the sparse_encoding ingestion processor. Users can configure the pruning strategy when creating the processor, and the processor will prune the sparse vectors before writing them to the index.
  • Implement pruning in neural_sparse two-phase search. Users can configure the pruning strategy when searching with a neural_sparse query, and the query builder will prune the query before searching the index.
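The ingestion-side pruning can be pictured with a small sketch. This is not the PR's implementation (that lives in PruneUtils); it is a minimal, hypothetical reading of the alpha_mass strategy, where the highest-weight tokens are kept until their cumulative weight reaches a given fraction of the total. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AlphaMassPruneSketch {
    // Keep the highest-weight tokens until their cumulative weight
    // reaches `ratio` of the vector's total weight (hypothetical
    // reading of the alpha_mass strategy; names are illustrative).
    static Map<String, Float> prune(Map<String, Float> vector, float ratio) {
        float total = 0f;
        for (float w : vector.values()) {
            total += w;
        }
        List<Map.Entry<String, Float>> sorted = new ArrayList<>(vector.entrySet());
        sorted.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));

        Map<String, Float> kept = new HashMap<>();
        float mass = 0f;
        for (Map.Entry<String, Float> e : sorted) {
            if (mass >= ratio * total) {
                break;
            }
            kept.put(e.getKey(), e.getValue());
            mass += e.getValue();
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> v = Map.of("hello", 3.0f, "world", 2.0f, "the", 0.5f, "a", 0.1f);
        // keeps "hello" and "world": their mass (5.0) covers 80% of the total (5.6)
        System.out.println(prune(v, 0.8f).keySet());
    }
}
```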

Related Issues

#946

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@zhichao-aws
Member Author

This PR is ready for review now

Collaborator

@heemin32 heemin32 left a comment


Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

@zhichao-aws
Member Author

zhichao-aws commented Nov 21, 2024

Could you provide an overview of how the overall API will look? I initially thought this change would only affect the query side, but it seems it will also modify the parameters for neural_sparse_two_phase_processor.

Based on our benchmark results in #946, applying pruning in two-phase search outperforms applying it in the neural_sparse query body, in both precision and latency. Therefore, enhancing the existing two-phase search pipeline makes more sense.
To maintain compatibility with existing APIs, the overall API will look like:

# ingestion pipeline
PUT /_ingest/pipeline/sparse-pipeline
{
    "description": "Calling sparse model to generate expanded tokens",
    "processors": [
        {
            "sparse_encoding": {
                "model_id": "fousVokBjnSupmOha8aN",
                "pruning_type": "alpha_mass",
                "pruning_ratio": 0.8,
                "field_map": {
                    "body": "body_sparse"
                }
            }
        }
    ]
}

# two phase pipeline
PUT /_search/pipeline/neural_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Creates a two-phase processor for neural sparse search.",
        "pruning_type": "alpha_mass",
        "pruning_ratio": 0.8
      }
    }
  ]
}

Additionally, the current implementation appears to be focused on two-phase processing with different strategies for splitting vectors, rather than a combination of pruning and two-phase processing?

The existing two-phase processor uses the max_ratio prune criterion, and now we are adding support for other criteria as well.
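For contrast with the alpha_mass example above in the API discussion, the max_ratio criterion mentioned here can be sketched as keeping only the tokens whose weight is at least a given fraction of the vector's maximum weight. This is a hypothetical reading for illustration, not the PR's code; names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxRatioPruneSketch {
    // Keep tokens whose weight is at least `ratio` times the largest
    // weight in the vector (hypothetical reading of max_ratio).
    static Map<String, Float> prune(Map<String, Float> vector, float ratio) {
        float max = 0f;
        for (float w : vector.values()) {
            max = Math.max(max, w);
        }
        Map<String, Float> kept = new HashMap<>();
        for (Map.Entry<String, Float> e : vector.entrySet()) {
            if (e.getValue() >= ratio * max) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> v = Map.of("hello", 3.0f, "world", 2.0f, "the", 0.5f);
        // threshold is 0.4 * 3.0 = 1.2, so "the" (0.5) is pruned
        System.out.println(prune(v, 0.4f).keySet());
    }
}
```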

@zhichao-aws zhichao-aws changed the title [Feature] Implement pruning for neural sparse search [Enhancement] Implement pruning for neural sparse search Nov 22, 2024

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 96.85535% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.27%. Comparing base (3c7f275) to head (7486ee8).

Files with missing lines Patch % Lines
...opensearch/neuralsearch/util/prune/PruneUtils.java 96.80% 2 Missing and 1 partial ⚠️
...earch/processor/NeuralSparseTwoPhaseProcessor.java 94.11% 0 Missing and 1 partial ⚠️
...h/neuralsearch/query/NeuralSparseQueryBuilder.java 83.33% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #988      +/-   ##
============================================
+ Coverage     80.47%   81.27%   +0.79%     
- Complexity     1000     1054      +54     
============================================
  Files            78       80       +2     
  Lines          3411     3535     +124     
  Branches        578      611      +33     
============================================
+ Hits           2745     2873     +128     
+ Misses          425      423       -2     
+ Partials        241      239       -2     


@zhichao-aws zhichao-aws requested a review from heemin32 November 22, 2024 07:18
Collaborator

@heemin32 heemin32 left a comment


LGTM. Thanks!

Member

@martin-gaievski martin-gaievski left a comment


Apart from a minor comment, why is this PR trying to merge into main?
If this changes the API used to define the processor, it should be checked with application security; for that we need to merge into a feature branch in the main repo, and only after that is cleared, from the feature branch to main.

);
} else {
// if we don't have prune type, then prune ratio field must not have value
if (config.containsKey(PruneUtils.PRUNE_RATIO_FIELD)) {
Member

We can merge this if with the previous else and have one single else if block
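The suggested restructuring can be sketched on a simplified version of the validation. The config key, message, and method names here are illustrative, not the PR's exact code.

```java
import java.util.Map;

public class PruneConfigCheckSketch {
    static final String PRUNE_RATIO_FIELD = "pruning_ratio";

    // Merged form: the nested `if` inside the `else` branch becomes a
    // single `else if`, as suggested in the review comment.
    static void validate(Map<String, Object> config, boolean hasPruneType) {
        if (hasPruneType) {
            // ... validate the ratio against the chosen prune type
        } else if (config.containsKey(PRUNE_RATIO_FIELD)) {
            // no prune type, so a prune ratio must not be provided
            throw new IllegalArgumentException("pruning_ratio requires a pruning_type");
        }
    }

    public static void main(String[] args) {
        validate(Map.of(), false); // fine: neither field present
        try {
            validate(Map.of(PRUNE_RATIO_FIELD, 0.8f), false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```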


Member Author

We can merge this if with the previous else and have one single else if block

ack

Member Author

This else means PruneType is NONE, right? It seems it can be moved to https://github.com/opensearch-project/neural-search/pull/988/files#diff-8453ea75f8259ba96c246d483b2de9e21601fb9c3d033e8902756f5d101f2238R262 when validating the input ratio.

We want to validate that the PRUNE_RATIO field is not provided; any value here would be illegal.

}
}

switch (pruneType) {
Member

Can you think of modifying this into a map of <prune_type> -> <functional_interface>, so that instead of a switch structure we use map.get()?

Member Author

Technically we can, but what's the advantage of doing this?

From a readability perspective, the switch-based method is more straightforward and easier to follow.

From a performance perspective, a switch on an enum is compiled down to a lookup-table operation and executes in O(1). I executed both methods 100k times, and the switch-based approach takes less time than the map-based approach (0.18 ms vs 0.63 ms).

Member Author

@zhichao-aws zhichao-aws Dec 10, 2024


test code:

/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */
package org.opensearch.neuralsearch.util.prune;

import org.opensearch.test.OpenSearchTestCase;

import java.util.HashMap;
import java.util.Map;

public class PrunePerfTests extends OpenSearchTestCase {
    private static final int ITERATIONS = 100_000;

    interface PruneHandler {
        void handle(PruneType type);
    }

    private static final Map<PruneType, PruneHandler> handlerMap = new HashMap<>();

    static {
        handlerMap.put(PruneType.NONE, type -> handleNone());
        handlerMap.put(PruneType.TOP_K, type -> handleTopK());
        handlerMap.put(PruneType.ALPHA_MASS, type -> handleAlphaMass());
        handlerMap.put(PruneType.MAX_RATIO, type -> handleMaxRatio());
        handlerMap.put(PruneType.ABS_VALUE, type -> handleAbsValue());
    }

    public void testPerf() {
        warmup();
        long switchStart = System.nanoTime();
        testSwitch();
        long switchEnd = System.nanoTime();

        long mapStart = System.nanoTime();
        testMap();
        long mapEnd = System.nanoTime();

        System.out.printf("Switch method took: %.2f ms%n", (switchEnd - switchStart) / 1_000_000.0);
        System.out.printf("Map method took: %.2f ms%n", (mapEnd - mapStart) / 1_000_000.0);
    }

    private static void warmup() {
        for (int i = 0; i < 1000; i++) {
            testSwitch();
            testMap();
        }
    }

    private static void testSwitch() {
        PruneType[] types = PruneType.values();
        for (int i = 0; i < ITERATIONS; i++) {
            PruneType type = types[i % types.length];
            switch (type) {
                case NONE:
                    handleNone();
                    break;
                case TOP_K:
                    handleTopK();
                    break;
                case ALPHA_MASS:
                    handleAlphaMass();
                    break;
                case MAX_RATIO:
                    handleMaxRatio();
                    break;
                case ABS_VALUE:
                    handleAbsValue();
                    break;
            }
        }
    }

    private static void testMap() {
        PruneType[] types = PruneType.values();
        for (int i = 0; i < ITERATIONS; i++) {
            PruneType type = types[i % types.length];
            handlerMap.get(type).handle(type);
        }
    }

    private static void handleNone() {

    }

    private static void handleTopK() {

    }

    private static void handleAlphaMass() {

    }

    private static void handleMaxRatio() {

    }

    private static void handleAbsValue() {

    }
}


switch (pruneType) {
case TOP_K:
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
Member

Suggested change
return pruneRatio > 0 && pruneRatio == Math.floor(pruneRatio);
return pruneRatio > 0 && pruneRatio == Math.rint(pruneRatio);

This is more reliable for floating-point numbers; otherwise there is a chance of a false positive.

Collaborator

It doesn't seem correct to replace floor with rint. By definition, rint returns the even number when two values are equally close to the input. I tested with input 3.5: the floor result is 3 but the rint result is 4.

Member Author

Could you please give an example of false positive?
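As a side note on this exchange: for the equality check itself, the two calls agree on every double input, since x == Math.floor(x) and x == Math.rint(x) both hold exactly when x is an exact integer value; they differ only in which integer they return for non-integers. A quick sketch (an illustrative probe, not code from the PR):

```java
public class FloorVsRintCheck {
    public static void main(String[] args) {
        double[] inputs = {3.0, 3.5, 2.5, 4.0000001};
        for (double x : inputs) {
            boolean byFloor = x == Math.floor(x);
            boolean byRint = x == Math.rint(x);
            // both predicates agree: true only for exact integer values
            System.out.printf("%s: floor=%s rint=%s%n", x, byFloor, byRint);
        }
    }
}
```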

}
}

switch (pruneType) {
Member

Same as above: can we use a map instead of a switch?

@zhichao-aws
Member Author

@martin-gaievski Thanks for the comments. We didn't create a feature branch because no other contributors are working on this, and we regard the PR branch as the feature branch.

I'm on PTO this week; I will follow up on the app sec issue and address the comments next week.



* @param pruneType The type of prune strategy
* @throws IllegalArgumentException if prune type is null
*/
public static String getValidPruneRatioDescription(PruneType pruneType) {
Collaborator

[nit] this can be refactored to a static map.

Member Author

Please refer to the discussion with Martin above.

Signed-off-by: zhichao-aws <zhichaog@amazon.com>