
[BUG] IndexOutOfBoundsException in Hybrid search for some queries only #497

Closed
tiagoshin opened this issue Nov 21, 2023 · 37 comments
Labels: bug (Something isn't working), v2.15.0

@tiagoshin commented Nov 21, 2023

What is the bug?

I'm using hybrid search in OpenSearch version 2.11, and I'm getting the following error for some queries:

{
    "error": {
        "root_cause": [
            {
                "type": "index_out_of_bounds_exception",
                "reason": null
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "index-name",
                "node": "2aLNPmEjQ8OYCuCFyEyI-Q",
                "reason": {
                    "type": "index_out_of_bounds_exception",
                    "reason": null
                }
            }
        ],
        "caused_by": {
            "type": "index_out_of_bounds_exception",
            "reason": null,
            "caused_by": {
                "type": "index_out_of_bounds_exception",
                "reason": null
            }
        }
    },
    "status": 500
}

I get these logs:

2023-11-21 16:53:16 org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
2023-11-21 16:53:16     at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:706) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:379) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:745) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:503) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:301) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.transport.TransportService$6.handleException(TransportService.java:903) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1526) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1640) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1614) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:80) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:72) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2023-11-21 16:53:16     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2023-11-21 16:53:16     at java.lang.Thread.run(Thread.java:833) [?:?]
2023-11-21 16:53:16 Caused by: org.opensearch.OpenSearchException$3
2023-11-21 16:53:16     at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:708) ~[opensearch-core-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:377) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     ... 23 more
2023-11-21 16:53:16 Caused by: java.lang.IndexOutOfBoundsException
2023-11-21 16:53:16     at java.nio.Buffer.checkIndex(Buffer.java:743) ~[?:?]
2023-11-21 16:53:16     at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:339) ~[?:?]
2023-11-21 16:53:16     at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:115) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:443) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.TermScorer.score(TermScorer.java:75) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:273) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.opensearch.neuralsearch.query.HybridQueryScorer.score(HybridQueryScorer.java:64) ~[?:?]
2023-11-21 16:53:16     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.opensearch.common.lucene.MinimumScoreCollector.collect(MinimumScoreCollector.java:78) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:274) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:322) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16     at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:354) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:441) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:425) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:65) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:66) ~[?:?]
2023-11-21 16:53:16     at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:280) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:153) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16     ... 8 more

How can one reproduce the bug?

Honestly, it's very hard to reproduce the bug. Since I'm using my company's data, I cannot share it publicly. However, we can work on this privately.

What is the expected behavior?

The expectation is that hybrid search runs without this error.

What is your host/environment?

macOS Ventura 13.3.1, running OpenSearch via Docker Compose.

Do you have any additional context?

When I search the exact same index with semantic search or lexical search, it works properly; the error only happens for hybrid search.
I observe a pattern: queries with more than one word are more likely to hit this error than single-word queries. Failing queries look like "horror movies", "teen mom", "news radio".
However, when I changed the combination technique, some queries started working and other queries started failing.
I also observed that when I changed the index data, some queries started working and other queries started failing.
For the same data and settings, though, results are deterministic.

@tiagoshin added the bug (Something isn't working) and untriaged labels Nov 21, 2023
@navneet1v (Collaborator)

@tiagoshin can you share the query which you are using?

@navneet1v (Collaborator)

I can see the exception is coming from this: https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java#L64

@tiagoshin please share your query skeleton so that we can better debug the issue here.

@tiagoshin (Author)

Thank you @navneet1v. I shared the query skeleton with David Fowler from AWS customer support; did you receive it?

@navneet1v (Collaborator)

@tiagoshin Looking at the logs that were shared, I can see that HybridQueryPhaseSearcher, which is responsible for running the query, is not invoked. This leads me to believe that either the hybrid query clause was not the top-level clause, or there are nested fields in the index that lead to the hybrid query clause being wrapped in other query clauses (this is OpenSearch's default behavior).

We are already working on a fix for nested query clauses as part of this GitHub issue: #466.
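For reference, the hybrid clause only counts as top level when it sits directly under "query"; a minimal sketch (index, field, and model names are illustrative):

POST my-index/_search
{
  "query": {
    "hybrid": {
      "queries": [
        { "match": { "text": { "query": "horror movies" } } },
        { "neural": { "text_embedding": { "query_text": "horror movies", "model_id": "<model-id>", "k": 5 } } }
      ]
    }
  }
}

If the hybrid clause instead ends up wrapped inside another clause, HybridQueryPhaseSearcher falls back to the default query phase searcher.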

@tiagoshin (Author)

Hi @navneet1v, isn't HybridQueryPhaseSearcher invoked in the following line?

2023-11-21 16:53:16 at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:66) ~[?:?]

@navneet1v (Collaborator)

@tiagoshin if you look at the code: https://github.com/opensearch-project/neural-search/blob/2.11/src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java#L66

Line 66 is only hit when the top-level query is not a hybrid query.

@tiagoshin (Author)

That makes sense, thank you @navneet1v!

@martin-gaievski (Member)

We have pushed a code change that should fix this issue; please check the details in this issue comment: #466 (comment)

@vamshin moved this from Backlog to 2.12.0 in Vector Search RoadMap Dec 14, 2023
@vamshin added the v2.12.0 (Issues targeting release v2.12.0) label Dec 14, 2023
@Lemmmy commented Dec 17, 2023

I'm getting a similar but different exception on OS 2.11.1 (6b1986e964d440be9137eba1413015c31c5a7752):

Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
        at org.apache.lucene.search.DisiPriorityQueue.add(DisiPriorityQueue.java:100) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
        at org.opensearch.neuralsearch.query.HybridQueryScorer.initializeSubScorersPQ(HybridQueryScorer.java:146) ~[?:?]
        at org.opensearch.neuralsearch.query.HybridQueryScorer.<init>(HybridQueryScorer.java:47) ~[?:?]
        at org.opensearch.neuralsearch.query.HybridQueryWeight.scorer(HybridQueryWeight.java:91) ~[?:?]
        at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
        at org.opensearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:374) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:319) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281) ~[opensearch-2.11.1.jar:2.11.1]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
        at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:104) ~[?:?]
        at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:64) ~[?:?]
        at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:280) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:153) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.11.1.jar:2.11.1]
        ... 8 more

Full exception: aioobe.txt

Unfortunately I'm not familiar enough with the subject matter to know if this is the same exception or if it has been patched. I get this error more reproducibly on my single-node cluster with only 8800 documents and the following search pipeline and query:

{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.6,
              0.3,
              0.1
            ]
          }
        }
      }
    }
  ]
}

Query:

{
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "_source": {
    "exclude": [
      "text_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match_phrase": {
            "text": {
              "query": "foo"
            }
          }
        },
        {
          "match": {
            "text": {
              "query": "foo"
            }
          }
        },
        {
          "neural": {
            "text_embedding": {
              "query_text": "foo",
              "model_id": "--------",
              "k": 5
            }
          }
        }
      ]
    }
  }
}

I have narrowed the issue down to occurring when one or more of the sub-queries return effectively 0 results after normalization. That is, the scores are so low after normalization that they are completely discarded. If I remove two of the sub-queries and disable the search pipeline, the query works. Or if I make a more specific query where the sub-queries return a similar number of results, the query also works.

I'm happy to provide more information if needed, or make a new issue if it's not the same one as this/#466. I'm running in Docker, so not quite sure how to test the RC build from that thread.

Edit: also tried on 2.12.0, still happening. Is this new issue material?

@navneet1v (Collaborator)

Edit: also tried on 2.12.0, still happening. Is this new issue material?

@Lemmmy so what you are saying is that you tried the tar provided in this comment: #466 (comment) and it is still not working?

cc: @martin-gaievski

@navneet1v (Collaborator)

I'm running in Docker, so not quite sure how to test the RC build from that thread.

@Lemmmy the OpenSearch CI publishes builds every day to the OpenSearch staging Docker repo: https://hub.docker.com/r/opensearchstaging/opensearch/tags

You can run docker pull opensearchstaging/opensearch:2.12.0 to pull the 2.12.0 version of OpenSearch and check whether the issue still exists.
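Since you are running Docker Compose, pointing your service at the staging image should be enough to test; a minimal single-node sketch (service name, password, and port mapping are illustrative):

services:
  opensearch:
    image: opensearchstaging/opensearch:2.12.0  # staging build instead of the release image
    environment:
      - discovery.type=single-node
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your-admin-password>
    ports:
      - "9200:9200"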

@navneet1v (Collaborator) commented Dec 19, 2023

@Lemmmy I did some more deep-dive and I am able to reproduce the issue. I also tested with different queries where one query clause doesn't yield any result; that use case works perfectly.

But I was able to figure out the root cause of the exception you are getting. Here are the steps to reproduce:

Setup

PUT example-index
{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "my_vector": {
        "type": "knn_vector",
        "dimension": 1,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene"
        }
      },
      "integer": {
        "type": "integer"
      }
    }
  }
}


PUT example-index/_bulk?refresh
{"index":{"_id":"1"}}
{"text": "neural","my_vector": [5], "integer": 1 }
{"index":{"_id":"2"}}
{"text": "neural neural","my_vector": [4], "integer": 2 }
{"index":{"_id":"3"}}
{"text": "neural neural neural","my_vector": [3], "integer": 3 }
{"index":{"_id":"4"}}
{"text": "neural neural neural neural", "integer": 4 }
{"index":{"_id":"5"}}
{"my_vector": [0], "integer": 5 }


PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        }
      }
    }
  ]
}


# Search Query
POST example-index/_search?search_pipeline=nlp-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "term": {
            "text": "neural"
          }
        },
        {
          "term": {
            "text": "neural"
          }
        },
        {
          "knn": {
            "my_vector": {
              "vector": [
                3
              ],
              "k": 3
            }
          }
        }
      ]
    }
  },
  "size": 3
}

Output of Search

{
	"error": {
		"root_cause": [
			{
				"type": "array_index_out_of_bounds_exception",
				"reason": "Index 2 out of bounds for length 2"
			}
		],
		"type": "search_phase_execution_exception",
		"reason": "all shards failed",
		"phase": "query",
		"grouped": true,
		"failed_shards": [
			{
				"shard": 0,
				"index": "example-index",
				"node": "roL2TjVsTdex976hXKl9jg",
				"reason": {
					"type": "array_index_out_of_bounds_exception",
					"reason": "Index 2 out of bounds for length 2"
				}
			}
		],
		"caused_by": {
			"type": "array_index_out_of_bounds_exception",
			"reason": "Index 2 out of bounds for length 2",
			"caused_by": {
				"type": "array_index_out_of_bounds_exception",
				"reason": "Index 2 out of bounds for length 2"
			}
		}
	},
	"status": 500
}

Stacktrace

Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
	at org.apache.lucene.search.DisiPriorityQueue.add(DisiPriorityQueue.java:100) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.neuralsearch.query.HybridQueryScorer.initializeSubScorersPQ(HybridQueryScorer.java:146) ~[?:?]
	at org.opensearch.neuralsearch.query.HybridQueryScorer.<init>(HybridQueryScorer.java:47) ~[?:?]
	at org.opensearch.neuralsearch.query.HybridQueryWeight.scorer(HybridQueryWeight.java:91) ~[?:?]
	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:374) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:319) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:104) ~[?:?]
	at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:64) ~[?:?]
	at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:280) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:153) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
	... 8 more

Root Cause

So, what is happening here: if we look at the queries provided in the hybrid clause, I deliberately made my two text search queries exactly the same.

        {
          "term": {
            "text": "neural"
          }
        }

In HybridQueryScorer we create a map from each Query to its index (the key being the query object) and use that map to create the priority queue and to assign the scorers created for each query. Because both text queries are the same, the map gets created with size 2 instead of size 3 (as we have 3 queries), which leads to the exception.
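To illustrate the collision, here is a standalone sketch (not the plugin's actual code; it only relies on Lucene's TermQuery equality semantics):

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class DuplicateQueryKeyDemo {
    public static void main(String[] args) {
        // Two identical term queries plus one distinct query, mirroring the repro above.
        Query first = new TermQuery(new Term("text", "neural"));
        Query second = new TermQuery(new Term("text", "neural"));
        Query third = new TermQuery(new Term("text", "search"));

        // TermQuery implements equals()/hashCode() based on field and term,
        // so the two identical queries collapse onto a single map key.
        Map<Query, Integer> queryToIndex = new HashMap<>();
        queryToIndex.put(first, 0);
        queryToIndex.put(second, 1); // overwrites the entry made for "first"
        queryToIndex.put(third, 2);

        // Prints 2, not 3: a priority queue sized from this map is one
        // slot too small for the three scorers, hence the AIOOBE.
        System.out.println(queryToIndex.size());
    }
}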

Now, in production I don't expect users to provide two exactly identical queries, but this is still a bug.

Please let me know if removing the duplicate queries solves your issue.

Proposed Solution

We should throw an exception with a proper message telling the user that the defined queries contain duplicates. @Lemmmy please let me know your thoughts on this.

cc: @martin-gaievski

@navneet1v (Collaborator)

@tiagoshin I did some deep-dive here: #497 (comment). Can you check on your side whether this was the issue for you as well? If not, can you provide the query skeleton so that I can make sure that all bugs reported in this issue are resolved. I understand that your query contained nested fields, which we have already fixed for 2.12. If there is any other issue you are facing, please do comment so that it can be fixed in 2.12.

@Lemmmy commented Dec 19, 2023

Thanks for the quick investigation! To clarify, am I supposed to avoid combining match and neural with the same query? As in, this isn't okay? (from the docs):

"query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "Hi world"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "Hi world",
              "model_id": "aVeif4oB5Vm0Tdw8zYO2",
              "k": 5
            }
          }
        }
      ]
    }
  }

Or is it just because of my use of both match_phrase and match?

When changing this line:

DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(queryToIndex.size());

To:

-DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(queryToIndex.size());
+DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(subScorers.size());

The query I provided in #497 (comment) no longer errors and the results look roughly as I'd expect.

@navneet1v (Collaborator)

Thanks for the quick investigation! To clarify, am I supposed to avoid combining match and neural with the same query? [...] Or is it just because of my use of both match_phrase and match?

This is okay.

But in your case, these two clauses of your query:

        {
          "match_phrase": {
            "text": {
              "query": "foo"
            }
          }
        },
        {
          "match": {
            "text": {
              "query": "foo"
            }
          }
        }

are actually boiling down to the same underlying query (for a single term like "foo" they produce the same Lucene query), and hence the issue was happening.

@Lemmmy commented Dec 19, 2023

Ah, that makes a lot more sense, I will fix that then. Thanks for all your help.

@navneet1v (Collaborator) commented Dec 19, 2023

Ah, that makes a lot more sense, I will fix that then. Thanks for all your help.

Sure. I am planning to add a check for duplicate queries and throw an exception from here: https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java#L297

like this:

if (queries.size() != new HashSet<>(queries).size()) {
    throw new OpenSearchException("There are duplicates in the query.");
}

This will ensure that such queries are not run at all. We cannot simply make the change

-DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(queryToIndex.size());
+DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(subScorers.size());

because it has some other side effects in the code.

@tiagoshin (Author) commented Dec 19, 2023

Hi @navneet1v, thank you very much for your attention.
I'm testing the 2.12.0 RC build and now I'm getting different errors.
For all queries, when I perform hybrid search, I get:

    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "totalHitsThreshold must be less than max integer value"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "test",
                "node": "si1uOQWhRMWsWbFC6kaKjg",
                "reason": {
                    "type": "illegal_argument_exception",
                    "reason": "totalHitsThreshold must be less than max integer value"
                }
            }
        ],
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "totalHitsThreshold must be less than max integer value",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "totalHitsThreshold must be less than max integer value"
            }
        }
    },
    "status": 400
}

So I increased track_total_hits to 50,000 and it worked for some queries. For other queries, I got the following error:

    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "The phase has failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "illegal_state_exception",
            "reason": "Score normalization processor cannot produce final query result"
        }
    },
    "status": 500
}

Here are the logs:

2023-12-19 18:25:36 opensearch_semantic1  | org.opensearch.action.search.SearchPhaseExecutionException: The phase has failed
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:718) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:622) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResultConsumed(AbstractSearchAsyncAction.java:607) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$onShardResult$9(AbstractSearchAsyncAction.java:590) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.QueryPhaseResultConsumer$PendingMerges.consume(QueryPhaseResultConsumer.java:373) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:132) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:590) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:161) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:292) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:59) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:44) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:99) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:70) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:746) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.transport.TransportService$9.handleResponse(TransportService.java:1693) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1475) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1558) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1538) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:72) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:62) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:45) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:911) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
2023-12-19 18:25:36 opensearch_semantic1  |     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
2023-12-19 18:25:36 opensearch_semantic1  |     at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
2023-12-19 18:25:36 opensearch_semantic1  | Caused by: org.opensearch.search.pipeline.SearchPipelineProcessingException: java.lang.IllegalStateException: Score normalization processor cannot produce final query result
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.search.pipeline.Pipeline.runSearchPhaseResultsTransformer(Pipeline.java:295) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.search.pipeline.PipelinedRequest.transformSearchPhaseResults(PipelinedRequest.java:47) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:755) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:620) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     ... 31 more
2023-12-19 18:25:36 opensearch_semantic1  | Caused by: java.lang.IllegalStateException: Score normalization processor cannot produce final query result
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.getSearchHits(NormalizationProcessorWorkflow.java:177) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.updateOriginalFetchResults(NormalizationProcessorWorkflow.java:142) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.execute(NormalizationProcessorWorkflow.java:73) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.neuralsearch.processor.NormalizationProcessor.process(NormalizationProcessor.java:62) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.search.pipeline.SearchPhaseResultsProcessor.process(SearchPhaseResultsProcessor.java:48) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.search.pipeline.Pipeline.runSearchPhaseResultsTransformer(Pipeline.java:276) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.search.pipeline.PipelinedRequest.transformSearchPhaseResults(PipelinedRequest.java:47) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:755) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:620) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1  |     ... 31 more

@navneet1v (Collaborator)

@tiagoshin can you share the query skeleton with me so that I can reproduce the issue? BTW, are you setting track_total_hits in the query?
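For reference, track_total_hits sits at the top level of the search request body; a minimal sketch (index and pipeline names are placeholders, and match_all stands in for the real query):

POST index-name/_search?search_pipeline=nlp-search-pipeline
{
  "track_total_hits": 50000,
  "query": {
    "match_all": {}
  }
}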

@tiagoshin (Author)

@navneet1v I shared the query and artifacts with David Fowler. Could you please get them from him?

@tiagoshin (Author)

@navneet1v I got the same IndexOutOfBoundsException that I reported before, on version 2.12.0, after increasing the ef_construction parameter to 1024. Before that, the exact same query with the same data and model was working. Once I increased ef_construction, I got the following error:

{
    "error": {
        "root_cause": [
            {
                "type": "index_out_of_bounds_exception",
                "reason": null
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "pluto-test2",
                "node": "j4rUlY77ToenCAVXWKUnxA",
                "reason": {
                    "type": "index_out_of_bounds_exception",
                    "reason": null
                }
            }
        ],
        "caused_by": {
            "type": "index_out_of_bounds_exception",
            "reason": null,
            "caused_by": {
                "type": "index_out_of_bounds_exception",
                "reason": null
            }
        }
    },
    "status": 500
}

On the logs I see:

2023-12-21 14:39:34 org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
2023-12-21 14:39:34     at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:718) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:379) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:757) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:511) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:301) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1699) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1485) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1599) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1573) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:81) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:73) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:911) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
2023-12-21 14:39:34     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
2023-12-21 14:39:34     at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
2023-12-21 14:39:34 Caused by: org.opensearch.OpenSearchException$3
2023-12-21 14:39:34     at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:708) ~[opensearch-core-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:377) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     ... 23 more
2023-12-21 14:39:34 Caused by: java.lang.IndexOutOfBoundsException
2023-12-21 14:39:34     at java.base/java.nio.Buffer.checkIndex(Buffer.java:687) ~[?:?]
2023-12-21 14:39:34     at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:269) ~[?:?]
2023-12-21 14:39:34     at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:115) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:443) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.TermScorer.score(TermScorer.java:86) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:266) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:117) ~[?:?]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.search.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:65) ~[?:?]
2023-12-21 14:39:34     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:277) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:326) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:549) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:219) ~[?:?]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:72) ~[?:?]
2023-12-21 14:39:34     at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:547) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:611) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:580) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     ... 8 more

However, if I decrease ef_construction, the queries that were already hitting the error reported here keep failing with the same error. So decreasing ef_construction doesn't fix the existing failures, but increasing it can trigger this error.
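For reference, ef_construction is an HNSW index-build parameter set per field under method.parameters in the index mapping; a minimal sketch based on the repro mapping earlier in this thread (the m value is illustrative):

PUT example-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 1,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 1024,
            "m": 16
          }
        }
      }
    }
  }
}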

@navneet1v (Collaborator)

@navneet1v I got the same issue that I reported before about the IndexOutOfBoundsException on version 2.12.0 when increasing the ef_construction parameter to 1024. Before that, the exact same query with the same data and model was working for a particular query. Once I increased the ef_construction parameter, I got the following error:

2023-12-21 14:39:34     at java.base/java.nio.Buffer.checkIndex(Buffer.java:687) ~[?:?]
2023-12-21 14:39:34     at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:269) ~[?:?]
2023-12-21 14:39:34     at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:115) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:443) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.TermScorer.score(TermScorer.java:86) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:266) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:117) ~[?:?]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.search.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:65) ~[?:?]
2023-12-21 14:39:34     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:277) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:326) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:549) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:219) ~[?:?]
2023-12-21 14:39:34     at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:72) ~[?:?]
2023-12-21 14:39:34     at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:547) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:611) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:580) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34     ... 8 more

However, if I decrease ef_construction, the queries that were already failing with the error reported here keep failing with the same error. So decreasing ef_construction doesn't solve those other failures, but increasing it may cause this error.

The IndexOutOfBoundsException fix is not in 2.12; 2.12 contains only the fix for nested queries. If you look at my RCA here: #497 (comment), it shows that if you have 2 sub-queries that are the same, the issue will happen. So check your array of hybrid queries for duplicates. If there are any, remove them; that can be a short-term fix on your side. Meanwhile we'll decide how to handle duplicate queries.
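To illustrate, a hybrid query with duplicate sub-queries would look like the sketch below (index name, field names, query text, and model id are made up); removing one of the two identical match clauses is the workaround:

POST /index-name/_search
{
  "query": {
    "hybrid": {
      "queries": [
        { "match": { "title": "wireless headphones" } },
        { "match": { "title": "wireless headphones" } },
        {
          "neural": {
            "title_embedding": {
              "query_text": "wireless headphones",
              "model_id": "<model-id>",
              "k": 100
            }
          }
        }
      ]
    }
  }
}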

@navneet1v
Collaborator

@navneet1v you're right, this issue in particular is caused by system memory constraints. I increased the JVM heap size and it worked, thank you! However, it's worth noting that the other issue keeps happening.
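For reference, on a self-managed cluster the JVM heap is raised in config/jvm.options (or via the OPENSEARCH_JAVA_OPTS environment variable for Docker); the 8 GB values below are just an example, not a recommendation:

-Xms8g
-Xmx8g

For Docker, the equivalent would be OPENSEARCH_JAVA_OPTS="-Xms8g -Xmx8g".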

Thanks for the response. I am working on that issue, doing some more validations before I post a root cause and the fix.

@navneet1v
Collaborator

navneet1v commented Dec 22, 2023

So I was able to get to the root cause of the issue mentioned here (#497 (comment)):

Hi @navneet1v, thank you very much for your attention.
I'm testing the 2.12.0 RC build, and now I'm getting different errors.
For all queries, when I perform hybrid search, I get:

"error": {
    "root_cause": [
        {
            "type": "illegal_argument_exception",
            "reason": "totalHitsThreshold must be less than max integer value"
        }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
        {
            "shard": 0,
            "index": "test",
            "node": "si1uOQWhRMWsWbFC6kaKjg",
            "reason": {
                "type": "illegal_argument_exception",
                "reason": "totalHitsThreshold must be less than max integer value"
            }
        }
    ],
    "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "totalHitsThreshold must be less than max integer value",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "totalHitsThreshold must be less than max integer value"
        }
    }
},
"status": 400

}

So the first issue, where we are seeing "totalHitsThreshold must be less than max integer value", is coming from this line:

if (totalHitsThreshold == Integer.MAX_VALUE) {
    throw new IllegalArgumentException(String.format(Locale.ROOT, "totalHitsThreshold must be less than max integer value"));
}

This case happens when track_total_hits: true is set in the search request rather than an integer value. I think we can remove the check. With track_total_hits: true, the value of total hits becomes Integer.MAX_VALUE, and hence the check fails. I checked that track_total_hits: true works with other query clauses. I will go ahead and fix this.
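For reference, the failing variant is the boolean form of the flag in the search request body; the sketch below uses an illustrative index and a match_all query:

POST /test/_search
{
  "track_total_hits": true,
  "query": {
    "match_all": {}
  }
}

With the integer form, e.g. track_total_hits: 50000, the threshold stays below Integer.MAX_VALUE, so this particular check passes.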

For the second issue, where track_total_hits was 50000, @tiagoshin can you provide me this info:

  1. How many shards were you using?
  2. How many data nodes are you using?
  3. How many total documents were there in the index?

@tiagoshin
Author

@navneet1v

  1. I'm using two shards:
index                   shard prirep state    docs   store ip         node
.plugins-ml-model-group 0     p      STARTED     1  12.5kb 172.18.0.3 node-1
.plugins-ml-model-group 0     r      STARTED     1   5.5kb 172.18.0.2 node-2
.plugins-ml-config      0     p      STARTED     1   3.9kb 172.18.0.3 node-1
.plugins-ml-config      0     r      STARTED     1   3.9kb 172.18.0.2 node-2
.plugins-ml-model       0     p      STARTED    11 115.8mb 172.18.0.3 node-1
.plugins-ml-model       0     r      STARTED    11 115.9mb 172.18.0.2 node-2
.plugins-ml-task        0     p      STARTED     2  44.4kb 172.18.0.3 node-1
.plugins-ml-task        0     r      STARTED     2  36.8kb 172.18.0.2 node-2
test                    0     p      STARTED 75997   190mb 172.18.0.3 node-1
test                    0     r      STARTED 75997 191.9mb 172.18.0.2 node-2
  2. I'm using 2 data nodes
  3. I have 75997 documents:
health status index                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .plugins-ml-model-group 8radM4MFTvSD0ml76PrneA   1   1          1            0       18kb         12.5kb
green  open   .plugins-ml-config      4KiNinC1QTCmEwmOwK-omw   1   1          1            0      7.8kb          3.9kb
green  open   .plugins-ml-model       RYKMfd3KTj2OfiK4madWyw   1   1         11            0    231.8mb        115.8mb
green  open   .plugins-ml-task        E_rRWs4vSuulZ6U6n2FY9g   1   1          2            0     81.2kb         44.4kb
green  open   test                    GSRltbzPQVeJ5h7MoxYSdg   1   1      75997        21976      382mb          190mb

@navneet1v
Collaborator

@navneet1v

  1. I'm using two shards:


index                   shard prirep state    docs   store ip         node
.plugins-ml-model-group 0     p      STARTED     1  12.5kb 172.18.0.3 node-1
.plugins-ml-model-group 0     r      STARTED     1   5.5kb 172.18.0.2 node-2
.plugins-ml-config      0     p      STARTED     1   3.9kb 172.18.0.3 node-1
.plugins-ml-config      0     r      STARTED     1   3.9kb 172.18.0.2 node-2
.plugins-ml-model       0     p      STARTED    11 115.8mb 172.18.0.3 node-1
.plugins-ml-model       0     r      STARTED    11 115.9mb 172.18.0.2 node-2
.plugins-ml-task        0     p      STARTED     2  44.4kb 172.18.0.3 node-1
.plugins-ml-task        0     r      STARTED     2  36.8kb 172.18.0.2 node-2
test                    0     p      STARTED 75997   190mb 172.18.0.3 node-1
test                    0     r      STARTED 75997 191.9mb 172.18.0.2 node-2
  2. I'm using 2 data nodes
  3. I have 75997 documents:
health status index                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .plugins-ml-model-group 8radM4MFTvSD0ml76PrneA   1   1          1            0       18kb         12.5kb
green  open   .plugins-ml-config      4KiNinC1QTCmEwmOwK-omw   1   1          1            0      7.8kb          3.9kb
green  open   .plugins-ml-model       RYKMfd3KTj2OfiK4madWyw   1   1         11            0    231.8mb        115.8mb
green  open   .plugins-ml-task        E_rRWs4vSuulZ6U6n2FY9g   1   1          2            0     81.2kb         44.4kb
green  open   test                    GSRltbzPQVeJ5h7MoxYSdg   1   1      75997        21976      382mb          190mb

Actually, you are using 1 shard; the other shard is a replica of the first. But thanks for this information. The code path that results in the issue you're seeing with track_total_hits: 50000 can only be hit if there is 1 primary shard.

Just to resolve the issue for now, can you try with more than 1 primary shard and see if you still face the issue with track_total_hits: 50000? I am hoping you won't.
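For anyone trying this: the primary shard count is fixed at index creation, so testing it means creating a new index and reindexing into it. A minimal sketch (index name and replica count are illustrative):

PUT /test-2shards
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}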

@tiagoshin
Author

@navneet1v It worked when increasing shards to 2, thank you very much!
What's your advice if I increase the number of replicas as well?

@navneet1v
Collaborator

@navneet1v It worked when increasing shards to 2, thank you very much! What's your advice if I increase the number of replicas as well?

Replicas will have no impact; you can set them to whatever you want.

Just to say it one more time: I am still going to do a deep dive to fix the issue with 1 shard too. But for now, happy to know you are unblocked.

martin-gaievski added a commit that referenced this issue Dec 29, 2023
* Allow multiple identical sub-queries in hybrid query, removed validation for total hits

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
martin-gaievski added a commit to martin-gaievski/neural-search that referenced this issue Jan 2, 2024
…-project#524)

* Allow multiple identical sub-queries in hybrid query, removed validation for total hits

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 585fbbe)
martin-gaievski added a commit to martin-gaievski/neural-search that referenced this issue Jan 2, 2024
…-project#524)

* Allow multiple identical sub-queries in hybrid query, removed validation for total hits

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 585fbbe)
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
martin-gaievski added a commit that referenced this issue Jan 2, 2024
* Allow multiple identical sub-queries in hybrid query, removed validation for total hits


(cherry picked from commit 585fbbe)

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@tiagoshin
Author

Hi @navneet1v!
David Fowler provided me with a patch to test the version with the fix on the AWS OpenSearch cluster.
During tests, I verified that I keep having issues even when I use 2 shards and track_total_hits: 50000.
I identified 3 issues:

  • For some queries, I'm getting an index_out_of_bounds_exception error on both shards;
  • For some other queries, when I get successful responses, I identified that:
    • The results are not reproducible; they change when I hit the endpoint again. For comparison, query A or query B run in isolation on the same index gives me reproducible results, which makes me believe the issue is related to hybrid search;
    • The results aren't a combination of the query A and query B results according to the hybrid search definitions. For instance, some results that don't appear in the results for query A or query B appear as the first results of the hybrid search;

I shared the artifacts for reproducing the issue with David Fowler

@martin-gaievski
Member

Hi @tiagoshin
For the queries where you're getting index_out_of_bounds_exception, is there a server log available, similar to what you provided in the initial issue description? If yes, can you please share it? This exception is pretty generic, and it's hard to tell just from its name where it's coming from.

For issue 2, I think what's happening is that some results have exactly the same score after normalization, and when combined, some of them may be pushed out of the final result list. As this depends on the execution order of the individual sub-queries, the result of such a re-arrangement will look different every time.

For issue 3, what can happen is that a doc receives a higher combined score if it appears in the results of, say, 2 sub-queries rather than in only one sub-query, even if it ranks high in that single result list. For example, let's say we have sub-queries A and B, and each returns the following results: A = [doc1: 0.7, doc2: 0.6, doc3: 0.5] and B = [doc4: 0.7, doc5: 0.6, doc3: 0.5]. If we use arithmetic mean for combination, the final result will look like: [doc3: 0.5, doc1: 0.35, doc4: 0.35].
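Spelling out that arithmetic (a doc missing from a sub-query's results contributes a score of 0):

doc3: (0.5 + 0.5) / 2 = 0.50
doc1: (0.7 + 0.0) / 2 = 0.35
doc4: (0.0 + 0.7) / 2 = 0.35
doc2: (0.6 + 0.0) / 2 = 0.30
doc5: (0.0 + 0.6) / 2 = 0.30

So doc3 ends up on top even though it was last in both individual result lists.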

@tiagoshin
Author

@martin-gaievski Here are the logs:
[2024-01-24T22:47:43,188][WARN ][r.suppressed ] [2171808a1c467ff0235c64b24d841aac] path: __PATH__ params: {index=index}
Failed to execute phase [query], all shards failed; shardFailures
{[Vq9bCKylQyiRxmS78Azwtg][index][0]: RemoteTransportException[[c8a686a3b4ea2af6011c66cfb2dffbcd][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: NotSerializableExceptionWrapper[index_out_of_bounds_exception: null]; }
{[7HN8cHU6TPq_jl3l1cwWTA][index][1]: RemoteTransportException[[8d30fa4f7db7a4545505fc4ebc06002c][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: NotSerializableExceptionWrapper[index_out_of_bounds_exception: 634]; }
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:716)
    at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:381)
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:755)
    at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:513)
    at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:303)
    at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104)
    at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75)
    at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:753)
    at org.opensearch.transport.TransportService$6.handleException(TransportService.java:903)
    at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1526)
    at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:438)
    at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:412)
    at org.opensearch.transport.InboundHandler.handleException(InboundHandler.java:436)
    at org.opensearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:428)
    at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:166)
    at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:123)
    at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:770)
    at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175)
    at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150)
    at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115)
    at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1471)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1334)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1383)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at __PATH__(Thread.java:833)
Caused by: NotSerializableExceptionWrapper[index_out_of_bounds_exception: null]
    at java.nio.Buffer.checkIndex(Buffer.java:743)
    at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:332)
    at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:115)
    at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564)
    at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:443)
    at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47)
    at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60)
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:75)
    at org.apache.lucene.search.WANDScorer.score(WANDScorer.java:527)
    at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65)
    at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193)
    at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61)
    at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65)
    at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193)
    at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:273)
    at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:137)
    at org.opensearch.neuralsearch.search.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:65)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:274)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
    at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
    at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:322)
    at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551)
    at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:219)
    at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:72)
    at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
    at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
    at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533)
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597)
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566)
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(Thread.java:833)

@tiagoshin
Author

tiagoshin commented Jan 26, 2024

@martin-gaievski
I understand your reasoning regarding topics 2 and 3, but I can assure you that's not the case.
Let me give some examples:
Setup:
I'm using the l2 normalizer and the arithmetic_mean combiner. Weights are 0.5 and 0.5. Results for query A and query B are always the same, so I'll take some results from the hybrid query and tell you where they ranked in queries A and B.
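For reference, that setup corresponds to a search pipeline along these lines (the pipeline name is illustrative; the normalization and combination settings match what I described above):

PUT /_search/pipeline/hybrid-pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "l2"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [0.5, 0.5]
          }
        }
      }
    }
  ]
}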

1st hybrid search run
Looking for document X

  • Query A: Score: 53.25505. Position: 13th with the same score as 12th
  • Query B: Score: ?. Position: None. Doesn't appear in the results
  • Hybrid search query: Score: 0.28119355. Position: 1st

2nd hybrid search run
Looking for document Y

  • Query A: Score: 41.551655. Position: 19th
  • Query B: Score: ?. Position: None. Doesn't appear in the results
  • Hybrid search query: Score: 0.16391976. Position: 3rd

3rd hybrid search run
Looking for document Z

  • Query A: Score: ?. Position: None. Doesn't appear in the results
  • Query B: Score: ?. Position: None. Doesn't appear in the results
  • Hybrid search query: Score: 0.22679898. Position: 1st

In conclusion:

  • The hybrid results differ significantly between runs;
  • The calculations are incorrect in most cases. It's not hard to find many documents that should be ranked higher than the result that appears in 1st place in the hybrid search.

I'll send you all the queries and results privately, so you can check yourself.

@ryanbogan ryanbogan added v2.13.0 and removed v2.12.0 Issues targeting release v2.12.0 labels Feb 22, 2024
@jmazanec15 jmazanec15 moved this from 2.12.0 to 2.15.0 in Vector Search RoadMap Apr 30, 2024
@vamshin vamshin moved this from 2.15.0 to Backlog in Vector Search RoadMap Jul 25, 2024
marcus-bcl added a commit to ministryofjustice/probation-offender-search that referenced this issue Jul 29, 2024
Workaround for a hybrid query bug in OpenSearch - opensearch-project/neural-search#497
@navneet1v
Collaborator

@martin-gaievski can we close this issue, as the bug is resolved?

@martin-gaievski
Member

Yes, code-wise we took care of the problem in #524.

Labels
bug Something isn't working v2.15.0
Projects
Status: Done

8 participants