Error: binaryEmbeddingReader can't be null #6

Closed
ltung-cit opened this issue Apr 26, 2018 · 13 comments

Comments

@ltung-cit

I'm running Elasticsearch in a Docker container with the binary-vector-scoring plugin installed, but I get an intermittent error when searching with the following query:

{
  "function_score": {
    "boost": 1,
    "score_mode": "avg",
    "boost_mode": "multiply",
    "min_score": 0,
    "script_score": {
      "script": {
        "source": "binary_vector_score",
        "lang": "knn",
        "params": {
          "cosine": true,
          "field": "image_embedding",
          "vector": "MY_VECTOR_HERE"
        }
      }
    }
  }
}

The search runs fine for a while (roughly the first dozen requests) and then starts returning the following error:

Caused by: java.lang.IllegalStateException: binaryEmbeddingReader can't be null
elasticsearch    | 	at com.liorkn.elasticsearch.script.VectorScoreScript.setBinaryEmbeddingReader(VectorScoreScript.java:67) ~[?:?]
elasticsearch    | 	at com.liorkn.elasticsearch.service.VectorScoringScriptEngineService$1.getLeafSearchScript(VectorScoringScriptEngineService.java:65) ~[?:?]
elasticsearch    | 	at org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.getLeafScoreFunction(ScriptScoreFunction.java:79) ~[elasticsearch-5.6.0.jar:5.6.0]
elasticsearch    | 	at org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$CustomBoostFactorWeight.functionScorer(FunctionScoreQuery.java:140) ~[elasticsearch-5.6.0.jar:5.6.0]
...

Reindexing all documents is the only way to make the search work again. Has anybody faced the same problem?

@lior-k
Owner

lior-k commented Apr 26, 2018

This error happens when the field ("image_embedding" in your case) does not exist in all the documents you are searching on.

@ghost

ghost commented Apr 27, 2018

Same error.
I used the field "embedding_vector", and it exists in the document I'm searching on.

@ltung-cit
Author

Hi @lior-k
The field (image_embedding) also exists in my document.

I have an index with 10 shards, and I noticed that when the search does return hits, the response includes a "_shards" property:

{
  "successful": 3,
  "failed": 7,
  "skipped": 0,
  "total": 10,
  "failures": [
    {
      "node": "ghr7DWYOSWa4tlvZ4kpsFQ",
      "index": "deckito",
      "reason": {
        "reason": "binaryEmbeddingReader can't be null",
        "type": "illegal_state_exception"
      },
      "shard": 0
    }
  ]
}

When setting shards to a low number (below 3), the error occurs more often.
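
Partial failures like the one above are easy to miss because the search can still return hits. A minimal sketch, assuming the response is the plain dict returned by the Python client (the name `summarize_shard_failures` is hypothetical, not part of the plugin or the client), that surfaces them:

```python
def summarize_shard_failures(response):
    """Return (failed, total, reasons) from a search response's `_shards` section."""
    shards = response.get("_shards", {})
    # Collect the distinct failure messages reported by the shards.
    reasons = sorted({
        failure.get("reason", {}).get("reason", "unknown")
        for failure in shards.get("failures", [])
    })
    return shards.get("failed", 0), shards.get("total", 0), reasons


# Example shaped like the response above (only one failure entry was shown there).
response = {
    "_shards": {
        "successful": 3,
        "failed": 7,
        "skipped": 0,
        "total": 10,
        "failures": [
            {
                "node": "ghr7DWYOSWa4tlvZ4kpsFQ",
                "index": "deckito",
                "reason": {
                    "reason": "binaryEmbeddingReader can't be null",
                    "type": "illegal_state_exception",
                },
                "shard": 0,
            }
        ],
    }
}

failed, total, reasons = summarize_shard_failures(response)
if failed:
    print(f"{failed}/{total} shards failed: {reasons}")
```

Checking this after every search makes it obvious when results come from only a subset of the shards.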

@nabas

nabas commented Apr 27, 2018

I also have the same problem: the document has the field, but the error still occurs.

@lior-k
Owner

lior-k commented Apr 27, 2018 via email

@ltung-cit
Author

Hi @lior-k

This is my mapping:

{
    "settings": {
        "number_of_shards": 10
    },
    "mappings": {
        "slide": {
            "properties": {
                "deck_id": {
                    "type": "keyword",
                    "index": true
                },                
                "number": {
                    "type": "integer",
                    "index": true
                },
                "image_embedding": {
                    "type": "binary",
                    "doc_values": true
                },
                "text": {
                    "type": "text",
                    "index": true
                }
            }
        },
        "searchResult": {
            "properties": {
                "deck_id": {
                    "type": "keyword",
                    "index": true
                },
                "search_timestamp": {
                    "type": "date",
                    "index": true
                }
            }
        }
    }
}

My query:

{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "boost": 1,
            "score_mode": "avg",
            "boost_mode": "multiply",
            "min_score": 0,
            "script_score": {
              "script": {
                "source": "binary_vector_score",
                "lang": "knn",
                "params": {
                  "cosine": true,
                  "field": "image_embedding",
                  "vector": "MY_VECTOR"
                }
              }
            }
          }
        }
      ]
    }
  }
}

MY_VECTOR is something like [0.20438875, 0.087035105, 0.41949105, ...]

I'm using the Python client to search only documents of type slide, all of which have the field "image_embedding":

result = self.client.search(index='deckito', doc_type='slide', from_=0, size=3, body=query, version=True, _source_include=['deck_id', 'number', 'image_embedding'])

@lior-k
Owner

lior-k commented Apr 27, 2018

Please run the following query to check that all the documents have a value in this field.
It should return 0 documents:

GET <es-url>/<index>/_search
{
    "query": {
        "bool" : {
            "must" : {
                "script" : {
                    "script" : {
                        "inline": "doc.image_embedding == null || doc.image_embedding.value == null || doc.image_embedding.value == ''",
                        "lang": "painless"
                     }
                }
            }
        }
    }
}
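
The same check can be run from the Python client. This is only a sketch: `build_missing_field_query` and `count_missing` are hypothetical helper names, `client` is assumed to be an `elasticsearch.Elasticsearch` instance, and `hits.total` is assumed to be a plain integer, as in the Elasticsearch 5.6 responses shown in this thread. The script text is copied from the query above with the field name substituted.

```python
def build_missing_field_query(field):
    """Build the body of the check query above for an arbitrary field name."""
    script = (
        "doc.{f} == null || doc.{f}.value == null || doc.{f}.value == ''"
    ).format(f=field)
    return {
        "query": {
            "bool": {
                "must": {
                    "script": {
                        "script": {"inline": script, "lang": "painless"}
                    }
                }
            }
        }
    }


def count_missing(client, index, field):
    """Return how many documents the check query matches (0 means all populated)."""
    # size=0: only the total is needed, not the hits themselves.
    response = client.search(
        index=index, body=build_missing_field_query(field), size=0
    )
    return response["hits"]["total"]
```

The dotted `doc.<field>` form only works for field names that are valid identifiers; a name containing a hyphen would presumably need the `doc['<field>']` form instead.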

@MannBITS

Hi @lior-k

I am also getting the same error:

{
  "took" : 33,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 3,
        "index" : "indexvectors",
        "node" : "Q5VeFkIvQh6KLS6PQsUg2w",
        "reason" : {
          "type" : "illegal_state_exception",
          "reason" : "binaryEmbeddingReader can't be null"
        }
      }
    ]
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

My data looks like:

{
  "indexvectors" : {
    "aliases" : { },
    "mappings" : {
      "vectordocs" : {
        "properties" : {
          "embedding-vector" : {
            "type" : "binary",
            "doc_values" : true
          },
          "id" : {
            "type" : "text"
          },
          "vector" : {
            "type" : "text"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1524853637835",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "76m277CESNiYnovi6n6Q8A",
        "version" : {
          "created" : "5060099"
        },
        "provided_name" : "indexvectors"
      }
    }
  }
}

I have added just one record and used that record's vector field in the query to get the nearest neighbour with k=1. The query should have returned the record present in the index, but instead I got the error above. Could you help me out here?

@ltung-cit
Author

Hi @lior-k

I ran the query you posted in 3 different ways and it returned the following results (note I have 2 document types: slide and searchResult and the property image_embedding is only declared for type slide):

  • <es-url>/<index>/_search -> 0 documents, which is weird because none of the documents of type searchResult have the field image_embedding.

  • <es-url>/<index>/slide/_search -> 0 documents, makes sense because all documents of type slide have the field image_embedding populated.

  • <es-url>/<index>/searchResult/_search -> 0 documents, which is weird because none of the documents of type searchResult have the field image_embedding.

@MannBITS

MannBITS commented May 1, 2018

I was able to get the issue resolved by following lior-k's suggestion and making sure that 0 docs are returned for the query mentioned. I am able to get the KNN docs now using the plugin. Thanks @lior-k :-)

@ghost

ghost commented May 7, 2018

I fixed my templates and reindexed, and now it works.
Before the fix, the field name in my templates differed from the one in my documents; they must be the same.
I had also defined the "embedding_vector" field as "text", but it should be "binary".
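
Both mistakes described here (a field name that differs between template and documents, and a "text" type instead of "binary") can be caught before indexing. A minimal sketch, assuming the mapping has the shape posted earlier in this thread; `check_vector_field` is a hypothetical helper, not part of the plugin, and the `doc_values` check mirrors the mappings in this thread rather than a documented plugin requirement:

```python
def check_vector_field(mapping, doc_type, field):
    """Return a list of problems with `field` in a type's mapping (empty if none)."""
    properties = (
        mapping.get("mappings", {}).get(doc_type, {}).get("properties", {})
    )
    if field not in properties:
        return ["field '%s' is not declared for type '%s'" % (field, doc_type)]
    problems = []
    declared = properties[field]
    if declared.get("type") != "binary":
        problems.append("type is '%s', expected 'binary'" % declared.get("type"))
    # Assumption: doc_values must be enabled, as in the mappings posted above.
    if declared.get("doc_values") is not True:
        problems.append("doc_values is not enabled")
    return problems


# Trimmed version of the "slide" mapping posted earlier in the thread.
mapping = {
    "mappings": {
        "slide": {
            "properties": {
                "image_embedding": {"type": "binary", "doc_values": True},
                "text": {"type": "text", "index": True},
            }
        }
    }
}

print(check_vector_field(mapping, "slide", "image_embedding"))
print(check_vector_field(mapping, "slide", "text"))
```

Passing the field name used in the query, rather than the one assumed to be in the template, is what exposes a name mismatch.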

@lior-k
Owner

lior-k commented May 7, 2018

Good to hear, closing the issue.

lior-k closed this as completed May 7, 2018
lior-k added a commit that referenced this issue May 7, 2018
added a comment regarding issue #6
@tgreiser

tgreiser commented Jan 28, 2019

Also struggling with this problem. The plugin works in production, but when I use elasticdump to copy the data to a local server I start getting "binaryEmbeddingReader can't be null".

elasticdump --input=./account_mapping.json --output=http://localhost:9200/account --type=mapping
elasticdump --input=./account.json --output=http://localhost:9200/account --type=data

In this state my vector searches fail entirely. If I inspect the mapping, my field is mapped correctly. If I run the painless query above, I find 0 records. If I reindex my documents, things start working on most of the shards.

POST http://localhost:9200/_reindex
{
  "source": {
    "index": "account"
  },
  "dest": {
    "index": "tmp"
  }
}

Then I do a second _reindex to rename tmp back to account. My queries now start working; however, I still see exceptions firing in the ES server, and the _shards section of my query response shows 3 successful and 2 failed shards:

"_shards": {
        "total": 5,
        "successful": 3,
        "skipped": 0,
        "failed": 2,
        "failures": [
            {
                "shard": 0,
                "index": "account",
                "node": "HlfEVuX_TbO8u6GXu47REQ",
                "reason": {
                    "type": "illegal_state_exception",
                    "reason": "binaryEmbeddingReader can't be null"
                }
            }
        ]
    },

Update:
After about 15 minutes and a few reboots, the two buggy shards started working and I am now getting 5/5 successful. So if anyone else has the same problem: import, reindex, and then wait a while for the shards to rebuild.
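
The "wait a while" step can be automated by polling until no shard reports a failure. A sketch under stated assumptions: `wait_for_healthy_shards` is a hypothetical helper, `run_search` is any zero-argument callable returning a response dict (for example a wrapped `client.search` call), and the attempt count and delay are arbitrary illustrative defaults.

```python
import time


def wait_for_healthy_shards(run_search, attempts=10, delay_seconds=30.0,
                            sleep=time.sleep):
    """Re-run a search until `_shards.failed` is 0; return the response or None."""
    for attempt in range(attempts):
        response = run_search()
        if response.get("_shards", {}).get("failed", 0) == 0:
            return response
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

Returning None instead of raising leaves it to the caller to decide whether a still-failing shard is fatal.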
