Forbid empty doc values on vector functions #43944

Merged
12 changes: 10 additions & 2 deletions docs/reference/query-dsl/script-score-query.asciidoc
@@ -195,8 +195,16 @@ between a given query vector and document vectors.
// NOTCONSOLE

NOTE: If a document doesn't have a value for a vector field on which
-a vector function is executed, 0 is returned as a result
-for this document.
+a vector function is executed, an error will be thrown.
+
+You can check if a document has a value for the field `my_vector` by
+`doc['my_vector'].size() == 0`. Your overall script can look like this:
+
+[source,js]
+--------------------------------------------------
+"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
+--------------------------------------------------
+// NOTCONSOLE

NOTE: If a document's dense vector field has a number of dimensions
different from the query's vector, an error will be thrown.
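
For readers who want to see the documented behaviour outside of Painless, here is a minimal Java sketch of the two notes above: the similarity function fails on a missing value or a dimension mismatch, and a `size()`-style guard returns 0 instead. The class name `GuardedCosineSketch`, the `float[]`/`null` representation of a document vector, and the choice of `IllegalArgumentException` for the missing-value case are assumptions made for illustration; they are not taken from the Elasticsearch code in this PR.

```java
import java.util.List;

// Hypothetical stand-alone model of the documented script behaviour.
class GuardedCosineSketch {

    // Throws when the document vector is missing (null) or its length
    // differs from the query vector; otherwise returns the cosine similarity.
    static double cosineSimilarity(List<Double> queryVector, float[] docVector) {
        if (docVector == null) {
            // Assumption: the exception type is illustrative only.
            throw new IllegalArgumentException("document has no value for the vector field");
        }
        if (queryVector.size() != docVector.length) {
            throw new IllegalArgumentException("query vector has [" + queryVector.size()
                + "] dimensions, document vector has [" + docVector.length + "]");
        }
        double dot = 0.0, queryNorm = 0.0, docNorm = 0.0;
        for (int i = 0; i < docVector.length; i++) {
            double q = queryVector.get(i);
            dot += q * docVector[i];
            queryNorm += q * q;
            docNorm += docVector[i] * docVector[i];
        }
        return dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
    }

    // Mirrors the script: doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(...)
    static double guardedScore(List<Double> queryVector, float[] docVector) {
        int size = (docVector == null) ? 0 : 1;
        return size == 0 ? 0 : cosineSimilarity(queryVector, docVector);
    }

    public static void main(String[] args) {
        List<Double> query = List.of(10.0, 10.0, 10.0);
        System.out.println(guardedScore(query, new float[] {1f, 1f, 1f})); // document with a vector
        System.out.println(guardedScore(query, null));                     // document without a vector
    }
}
```

The guard only covers the missing-value case; a vector with the wrong number of dimensions still raises an error, as the second note says.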
@@ -131,7 +131,7 @@ setup:
- match: { error.root_cause.0.type: "script_exception" }

---
-"Distance functions for documents missing vector field should return 0":
+"Documents missing a vector field":
- do:
index:
index: test-index
@@ -149,7 +149,9 @@ setup:
- do:
indices.refresh: {}

+# expect an error when documents miss a vector field
- do:
+catch: bad_request
headers:
Content-Type: application/json
search:
@@ -162,6 +164,22 @@ setup:
source: "cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
params:
query_vector: [10.0, 10.0, 10.0]
+- match: { error.root_cause.0.type: "script_exception" }
+
+# guard against missing values by checking size()
+- do:
+headers:
+Content-Type: application/json
+search:
+rest_total_hits_as_int: true
+body:
+query:
+script_score:
+query: {match_all: {} }
+script:
+source: "doc['my_dense_vector'].size() == 0 ? 0 : cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+params:
+query_vector: [10.0, 10.0, 10.0]

- match: {hits.total: 2}
- match: {hits.hits.0._id: "1"}
@@ -87,7 +87,7 @@ setup:
- match: {hits.hits.2._id: "3"}

---
-"Distance functions for documents missing vector field should return 0":
+"Documents missing a vector field":
- do:
index:
index: test-index
@@ -105,7 +105,9 @@ setup:
- do:
indices.refresh: {}

+# expect an error when documents miss a vector field
- do:
+catch: bad_request
headers:
Content-Type: application/json
search:
@@ -118,6 +120,22 @@ setup:
source: "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector'])"
params:
query_vector: {"1": 10.0}
+- match: { error.root_cause.0.type: "script_exception" }
+
+# guard against missing values by checking size()
+- do:
+headers:
+Content-Type: application/json
+search:
+rest_total_hits_as_int: true
+body:
+query:
+script_score:
+query: {match_all: {} }
+script:
+source: "doc['my_sparse_vector'].size() == 0 ? 0 : cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector'])"
+params:
+query_vector: {"1": 10.0}

- match: {hits.total: 2}
- match: {hits.hits.0._id: "1"}
@@ -28,7 +28,6 @@ public class ScoreScriptUtils {
*/
public static double dotProduct(List<Number> queryVector, VectorScriptDocValues.DenseVectorScriptDocValues dvs){
BytesRef value = dvs.getEncodedValue();
-if (value == null) return 0;
float[] docVector = VectorEncoderDecoder.decodeDenseVector(value);
if (queryVector.size() != docVector.length) {
throw new IllegalArgumentException("Can't calculate dotProduct! The number of dimensions of the query vector [" +
@@ -63,7 +62,6 @@ public CosineSimilarity(List<Number> queryVector) {

public double cosineSimilarity(VectorScriptDocValues.DenseVectorScriptDocValues dvs) {
BytesRef value = dvs.getEncodedValue();
-if (value == null) return 0;
float[] docVector = VectorEncoderDecoder.decodeDenseVector(value);
if (queryVector.size() != docVector.length) {
throw new IllegalArgumentException("Can't calculate cosineSimilarity! The number of dimensions of the query vector [" +
@@ -129,7 +127,6 @@ public DotProductSparse(Map<String, Number> queryVector) {

public double dotProductSparse(VectorScriptDocValues.SparseVectorScriptDocValues dvs) {
BytesRef value = dvs.getEncodedValue();
-if (value == null) return 0;
int[] docDims = VectorEncoderDecoder.decodeSparseVectorDims(value);
float[] docValues = VectorEncoderDecoder.decodeSparseVector(value);
return intDotProductSparse(queryValues, queryDims, docValues, docDims);
@@ -174,7 +171,6 @@ public CosineSimilaritySparse(Map<String, Number> queryVector) {

public double cosineSimilaritySparse(VectorScriptDocValues.SparseVectorScriptDocValues dvs) {
BytesRef value = dvs.getEncodedValue();
-if (value == null) return 0;
int[] docDims = VectorEncoderDecoder.decodeSparseVectorDims(value);
float[] docValues = VectorEncoderDecoder.decodeSparseVector(value);

@@ -46,7 +46,11 @@ public BytesRef get(int index) {

@Override
public int size() {
-throw new UnsupportedOperationException("vector fields may only be used via vector functions in scripts");

Review comment (Contributor): Small comment: now that you can call size on a vector field, perhaps we could update the error message above to say "accessing a vector field's value through 'get' or 'value' is not supported".

+if (value == null) {
+return 0;
+} else {
+return 1;
+}
}

// not final, as it needs to be extended by Mockito for tests
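
To make the new `size()` contract concrete, here is a small self-contained Java sketch of a doc-values wrapper that reports 0 for a missing value and 1 otherwise, while direct element access stays unsupported. The class `VectorDocValuesSketch`, its `byte[]` field, and its constructor are hypothetical stand-ins rather than the real `VectorScriptDocValues`, and the `get` message uses the wording suggested in the review comment above, which this diff does not show being adopted.

```java
// Hypothetical stand-in for a vector doc-values class; not the Elasticsearch type.
class VectorDocValuesSketch {

    // Encoded vector bytes for the current document, or null when the field is missing.
    private final byte[] encodedValue;

    VectorDocValuesSketch(byte[] encodedValue) {
        this.encodedValue = encodedValue;
    }

    // Same logic as the added hunk: 0 when there is no value, 1 otherwise.
    int size() {
        if (encodedValue == null) {
            return 0;
        } else {
            return 1;
        }
    }

    // Direct access is rejected; the message text is the reviewer's suggestion (assumption).
    byte[] get(int index) {
        throw new UnsupportedOperationException(
            "accessing a vector field's value through 'get' or 'value' is not supported");
    }

    public static void main(String[] args) {
        System.out.println(new VectorDocValuesSketch(null).size());
        System.out.println(new VectorDocValuesSketch(new byte[] {1, 2, 3}).size());
    }
}
```

A vector field holds at most one vector per document in this sketch, which is why `size()` never needs to return more than 1.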