This repository has been archived by the owner on Aug 2, 2022. It is now read-only.


chenqi0805
Contributor

@chenqi0805 chenqi0805 commented Jul 7, 2020

Issue #, if available:
#114 #171

Description of changes:
This PR adds support for the new space type negdotprod. The main difficulty is that nmslib supports optimized indices only for l2 and cosinesimil: https://github.com/nmslib/nmslib/blob/32e6e69a574b678da0d37bece8dbe6b1b250b660/python_bindings/README.md#saving-indexes-and-data

For all other space types, the Object vector data has to be saved to and loaded from a .dat file explicitly. This PR handles those new .dat files. Changes include:

  1. JNI code
  2. Write footer for .dat file
  3. Compound file format
  4. Change the scoring function in KNNWeight to 1 - sigmoid(raw KNN similarity score), so that it covers the whole real axis.
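
To illustrate item 4, here is a minimal sketch of the 1 - sigmoid transformation. The class and method names are hypothetical and are not the plugin's actual KNNWeight code; the sketch also assumes the raw value is a distance where smaller means closer, as with negdotprod.

```java
// Hypothetical sketch; names are illustrative, not the plugin's API.
final class SigmoidScoreSketch {

    /** Logistic sigmoid: maps any real number into the interval (0, 1). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * Converts a raw distance (assumed: smaller means closer, may be negative)
     * into a non-negative score where closer neighbors score higher.
     */
    static double score(double rawDistance) {
        return 1.0 - sigmoid(rawDistance);
    }

    public static void main(String[] args) {
        // A negative distance, zero, and a positive distance all yield valid scores.
        System.out.println(score(-4.0));
        System.out.println(score(0.0));
        System.out.println(score(4.0));
    }
}
```

Because the sigmoid is monotonically increasing, 1 - sigmoid preserves the ranking of neighbors by distance while keeping every score non-negative.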

Note:

This PR should be reviewed after #160, since it is based on the latest nmslib version.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@chenqi0805 chenqi0805 requested review from vamshin and jmazanec15 July 7, 2020 23:03
@chenqi0805
Contributor Author

@jmazanec15 @vamshin The current state of the PR treats indices for all supported space types as non-optimized indices. This works for new data, but it has a significant backward compatibility issue with old documents saved under optimized indices for l2 and cosinesimil. For example, the following search request fails with the error shown after it:

POST /yourindex/_search
{
    "size" : 10,
    "query": {
        "knn": {
            "my_vector1": {
                "vector": [1, 25],
                "k": 2
            }
        }
    }
}
{
  "error" : {
    "root_cause" : [
      {
        "type" : "runtime_exception",
        "reason" : "java.util.concurrent.ExecutionException: java.lang.Exception: Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "yourindex",
        "node" : "UWTvby-lTSWRsQcPoeFv-g",
        "reason" : {
          "type" : "runtime_exception",
          "reason" : "java.util.concurrent.ExecutionException: java.lang.Exception: Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading",
          "caused_by" : {
            "type" : "execution_exception",
            "reason" : "java.lang.Exception: Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading",
            "caused_by" : {
              "type" : "exception",
              "reason" : "Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading"
            }
          }
        }
      }
    ],
    "caused_by" : {
      "type" : "runtime_exception",
      "reason" : "java.util.concurrent.ExecutionException: java.lang.Exception: Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading",
      "caused_by" : {
        "type" : "runtime_exception",
        "reason" : "java.util.concurrent.ExecutionException: java.lang.Exception: Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading",
        "caused_by" : {
          "type" : "execution_exception",
          "reason" : "java.lang.Exception: Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading",
          "caused_by" : {
            "type" : "exception",
            "reason" : "Check failed: Cannot open file '/Users/qchea/Downloads/elasticsearch-7.8.0/data/nodes/0/indices/YLGjlb_7TA-wzqJFeJ6Urg/0/index/_0_1736_my_vector1.hnswc.dat' for reading"
          }
        }
      }
    }
  },
  "status" : 500
}

To resolve this, I think one option is to handle space types that support optimized indices separately from those that do not. Before going further, I would like to hear your comments and suggestions.
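
As a rough illustration of that option, here is a minimal sketch of the split. `SpaceTypeSketch` and its methods are hypothetical names, not code from this PR; the only facts taken from the discussion are that l2 and cosinesimil support the optimized index and that other space types need a companion .dat file.

```java
import java.util.Set;

// Hypothetical sketch; names are illustrative, not the plugin's API.
final class SpaceTypeSketch {

    // Per the PR description, nmslib's optimized index covers only these spaces.
    static final Set<String> OPTIMIZED_SPACES = Set.of("l2", "cosinesimil");

    /** True when the space type can be saved and loaded as an optimized index. */
    static boolean usesOptimizedIndex(String spaceType) {
        return OPTIMIZED_SPACES.contains(spaceType);
    }

    /** True when the vector data must also be written to and read from a .dat file. */
    static boolean needsDataFile(String spaceType) {
        return !usesOptimizedIndex(spaceType);
    }

    public static void main(String[] args) {
        for (String space : new String[] {"l2", "cosinesimil", "negdotprod"}) {
            System.out.println(space + " needs .dat file: " + needsDataFile(space));
        }
    }
}
```

With a check like this, segments written for l2 or cosinesimil before the change would never be asked for a .dat file that does not exist.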

@chenqi0805
Contributor Author

chenqi0805 commented Jul 11, 2020

Finished separating the optimized (l2, cosinesimil) and non-optimized (negdotprod, etc.) space types for the KNN index.

@chenqi0805 chenqi0805 changed the title [Backward Compatibility]FEAT: support dot product similarity measure FEAT: support dot product similarity measure(Backward Compatible) Jul 12, 2020
…asure

* master:
  FIX: Pass -march=x86-64 to build JNI library (opendistro-for-elasticsearch#164)
  FIX: Added resetState for uTs so state does not spill over (opendistro-for-elasticsearch#159)
@jmazanec15 jmazanec15 added the Features New functionality added label Jul 16, 2020
@jmazanec15 jmazanec15 changed the title FEAT: support dot product similarity measure(Backward Compatible) Support dot product similarity measure(Backward Compatible) Jul 16, 2020
@chenqi0805
Contributor Author

A failed test case scenario:

public void testQueryIndexWithNonOptimizeSpace() throws IOException {
    Settings settings = Settings.builder()
            .put(getKNNDefaultIndexSettings())
            .put("index.knn.space_type", "negdotprod")
            .put("index.knn.algo_param.ef_search", 300)
            .build();
    createKnnIndex("hisindex", settings, createKnnIndexMapping("my_vector1", 2));

    // add doc
    Float[] vector = {1.0f, 1.0f};
    addKnnDoc("hisindex", "1", "my_vector1", vector);

    // search
    float[] queryVector = {1.5f, 2.5f}; // vector to be queried
    int k = 1; // nearest 1 neighbor
    KNNQueryBuilder knnQueryBuilder = new KNNQueryBuilder("my_vector1", queryVector, k);
    searchKNNIndex("hisindex", knnQueryBuilder, k);
}

Root cause: Lucene does not allow negative scores in its score collector:

=== Standard error of node `node{::integTest-0}` ===
»   ↓ last 40 non error or warning messages from /Users/qchea/IdeaProjects/k-NN/build/testclusters/integTest-0/logs/es.stderr.log ↓
» fatal error in thread [elasticsearch[integTest-0][search][T#1]], exiting
»  java.lang.AssertionError
»       at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:79)
»       at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:242)
»       at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:229)
»       at org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:56)
»       at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
»       at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:212)
»       at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:185)
»       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
»       at org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:344)
»       at org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:299)
»       at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:151)
»       at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:361)
»       at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:434)
»       at org.elasticsearch.search.SearchService.access$200(SearchService.java:135)
»       at org.elasticsearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:395)
»       at org.elasticsearch.search.SearchService.lambda$runAsync$0(SearchService.java:411)
»       at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
»       at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695)
»       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
»       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
»       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
»       at java.base/java.lang.Thread.run(Thread.java:832)

We need to change the following scoring formula for negdotprod, because it yields a negative score whenever the raw distance is below -1:

result -> 1/(1 + result.getScore())
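
To make the failure concrete, here is a minimal sketch that applies this formula to the vectors from the test above. The class and method names are hypothetical, and the sketch assumes the raw negdotprod value is simply the negated dot product.

```java
// Hypothetical sketch; names are illustrative, not the plugin's API.
final class NegDotProdScoreSketch {

    /** Negated dot product of two vectors of equal length (assumed raw distance). */
    static float negDotProd(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return -sum;
    }

    /** The existing formula from the discussion: 1 / (1 + distance). */
    static float inverseDistanceScore(float distance) {
        return 1.0f / (1.0f + distance);
    }

    public static void main(String[] args) {
        float[] doc = {1.0f, 1.0f};
        float[] query = {1.5f, 2.5f};
        float distance = negDotProd(doc, query); // -(1.5 + 2.5) = -4.0
        System.out.println("distance = " + distance);
        // The denominator is 1 + (-4) = -3, so the score is negative.
        System.out.println("score = " + inverseDistanceScore(distance));
    }
}
```

For l2 and cosinesimil the distance is never negative, so the formula stays positive there; the problem only appears once a space type can return distances below -1.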

@chenqi0805 chenqi0805 changed the title Support dot product similarity measure(Backward Compatible) [WIP]Support dot product similarity measure(Backward Compatible) Jul 18, 2020
@chenqi0805 chenqi0805 changed the title [WIP]Support dot product similarity measure(Backward Compatible) Support dot product similarity measure and change scoring function (Backward Compatible) Jul 19, 2020
@chenqi0805
Contributor Author

Modified the scoring function according to the discussion in #171.

@vamshin
Member

vamshin commented Jul 22, 2020

Hi @chenqi0805,

Thanks for the awesome work. We will get back to this PR soon.

Member

@jmazanec15 jmazanec15 left a comment

Hey @chenqi0805, did you get a chance to investigate why NMSLIB doesn't support an optimized index for negative dot product?

@chenqi0805
Contributor Author

> Hey @chenqi0805 , did you get a chance to investigate why they don't support optimized index for negative dot product in NMSLIB?

@jmazanec15 Not yet. I will first check the nmslib issues and then post a question in the nmslib lobby.

@jmazanec15
Member

Cool, saw your comment, @chenqi0805. We could look into contributing if possible.

@vamshin vamshin changed the title Support dot product similarity measure and change scoring function (Backward Compatible) [On hold]Support dot product similarity measure and change scoring function (Backward Compatible) Jan 6, 2021
@vamshin vamshin changed the title [On hold]Support dot product similarity measure and change scoring function (Backward Compatible) [On Hold]Support dot product similarity measure and change scoring function (Backward Compatible) Jan 6, 2021
@vamshin vamshin changed the title [On Hold]Support dot product similarity measure and change scoring function (Backward Compatible) [On Hold] Support dot product similarity measure and change scoring function (Backward Compatible) Jan 6, 2021
@vamshin
Member

vamshin commented Jan 7, 2021

nmslib now supports an optimized version of negative dot product, starting with version 2.0.8. At this point we will close this PR and open a new PR that reads the optimized version of negative dot product. Thanks for the hard work and contribution, @chenqi0805.

@vamshin vamshin closed this Jan 7, 2021
@vamshin vamshin changed the title [On Hold] Support dot product similarity measure and change scoring function (Backward Compatible) Support dot product similarity measure and change scoring function (Backward Compatible) Jan 7, 2021
@jgq2008303393

> We now have nmslib supporting optimized version of negative dot product starting version 2.0.8. At this point we would close this PR and come up with new PR which reads optimized version of negative dot product. Thanks for the hard work and contribution @chenqi0805.

Is there any progress on the new PR?
