
Commit 16747f8

Add l1norm and l2norm distances for vectors (#44116)
* Add l1norm and l2norm distances for vectors

  Add L1norm - Manhattan distance
  Add L2norm - Euclidean distance

  relates to #37947

* Address Christoph's feedback

  - organize vector functions as a separate doc
  - increase precision in tests calculations
  - add a separate test when sparse doc dims are bigger and less than query vector dims

* Made examples more realistic
1 parent 7149c2b commit 16747f8
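The commit message names the two new functions only by the distance each one computes: L1 norm (Manhattan distance) and L2 norm (Euclidean distance). The following minimal Python sketch illustrates that arithmetic. It is not the Painless implementation, and `l1norm`, `l2norm`, `l1norm_sparse`, `l2norm_sparse` and `to_score` are hypothetical Python stand-ins. For sparse vectors it assumes that a dimension missing from one vector counts as 0. The dense versions raise on a dimension mismatch, following the dense-vector NOTE in the removed section.

```python
import math
from typing import Dict, Sequence


def _check_dims(query: Sequence[float], doc: Sequence[float]) -> None:
    # Mirrors the documented rule that a dense vector whose number of
    # dimensions differs from the query vector's is an error.
    if len(query) != len(doc):
        raise ValueError(f"dimension mismatch: {len(query)} vs {len(doc)}")


def l1norm(query: Sequence[float], doc: Sequence[float]) -> float:
    """Manhattan distance: sum of absolute per-dimension differences."""
    _check_dims(query, doc)
    return sum(abs(q - d) for q, d in zip(query, doc))


def l2norm(query: Sequence[float], doc: Sequence[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    _check_dims(query, doc)
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


def l1norm_sparse(query: Dict[str, float], doc: Dict[str, float]) -> float:
    # Assumption: a dimension present in only one vector is 0 in the other.
    dims = query.keys() | doc.keys()
    return sum(abs(query.get(k, 0.0) - doc.get(k, 0.0)) for k in dims)


def l2norm_sparse(query: Dict[str, float], doc: Dict[str, float]) -> float:
    # Same missing-dimension assumption as l1norm_sparse.
    dims = query.keys() | doc.keys()
    return math.sqrt(sum((query.get(k, 0.0) - doc.get(k, 0.0)) ** 2 for k in dims))


def to_score(distance: float) -> float:
    # Hypothetical helper: a smaller distance yields a larger score in (0, 1].
    return 1.0 / (1.0 + distance)
```

With the query `[4, 3, -1]` and a document vector `[1, 3, 3]`, `l1norm` returns 7 and `l2norm` returns 5.0. Both are distances, so a smaller value means a closer match. `to_score` shows one possible (assumed) way to turn a distance into a non-negative score that grows with similarity, since `script_score` values cannot be negative.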


8 files changed (+895 / -172 lines)


docs/reference/query-dsl/script-score-query.asciidoc

Lines changed: 4 additions & 138 deletions
@@ -11,7 +11,6 @@ a function to be used to compute a new score for each document returned
 by the query. For more information on scripting see
 <<modules-scripting, scripting documentation>>.
 
-
 Here is an example of using `script_score` to assign each matched document
 a score equal to the number of likes divided by 10:
 
@@ -32,7 +31,6 @@ GET /_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[setup:twitter]
 
 NOTE: The values returned from `script_score` cannot be negative. In general,
 Lucene requires the scores produced by queries to be non-negative in order to
@@ -76,140 +74,6 @@ to be the most efficient by using the internal mechanisms.
 --------------------------------------------------
 // NOTCONSOLE
 
-[role="xpack"]
-[testenv="basic"]
-[[vector-functions]]
-===== Functions for vector fields
-
-experimental[]
-
-These functions are used for
-for <<dense-vector,`dense_vector`>> and
-<<sparse-vector,`sparse_vector`>> fields.
-
-NOTE: During vector functions' calculation, all matched documents are
-linearly scanned. Thus, expect the query time grow linearly
-with the number of matched documents. For this reason, we recommend
-to limit the number of matched documents with a `query` parameter.
-
-For dense_vector fields, `cosineSimilarity` calculates the measure of
-cosine similarity between a given query vector and document vectors.
-
-[source,js]
---------------------------------------------------
-{
-  "query": {
-    "script_score": {
-      "query": {
-        "match_all": {}
-      },
-      "script": {
-        "source": "cosineSimilarity(params.query_vector, doc['my_dense_vector']) + 1.0", <1>
-        "params": {
-          "query_vector": [4, 3.4, -0.2] <2>
-        }
-      }
-    }
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-<1> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
-<2> To take advantage of the script optimizations, provide a query vector as a script parameter.
-
-Similarly, for sparse_vector fields, `cosineSimilaritySparse` calculates cosine similarity
-between a given query vector and document vectors.
-
-[source,js]
---------------------------------------------------
-{
-  "query": {
-    "script_score": {
-      "query": {
-        "match_all": {}
-      },
-      "script": {
-        "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
-        "params": {
-          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
-        }
-      }
-    }
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-For dense_vector fields, `dotProduct` calculates the measure of
-dot product between a given query vector and document vectors.
-
-[source,js]
---------------------------------------------------
-{
-  "query": {
-    "script_score": {
-      "query": {
-        "match_all": {}
-      },
-      "script": {
-        "source": """
-          double value = dotProduct(params.query_vector, doc['my_vector']);
-          return sigmoid(1, Math.E, -value); <1>
-        """,
-        "params": {
-          "query_vector": [4, 3.4, -0.2]
-        }
-      }
-    }
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-<1> Using the standard sigmoid function prevents scores from being negative.
-
-Similarly, for sparse_vector fields, `dotProductSparse` calculates dot product
-between a given query vector and document vectors.
-
-[source,js]
---------------------------------------------------
-{
-  "query": {
-    "script_score": {
-      "query": {
-        "match_all": {}
-      },
-      "script": {
-        "source": """
-          double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
-          return sigmoid(1, Math.E, -value);
-        """,
-        "params": {
-          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
-        }
-      }
-    }
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-NOTE: If a document doesn't have a value for a vector field on which
-a vector function is executed, an error will be thrown.
-
-You can check if a document has a value for the field `my_vector` by
-`doc['my_vector'].size() == 0`. Your overall script can look like this:
-
-[source,js]
---------------------------------------------------
-"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
---------------------------------------------------
-// NOTCONSOLE
-
-NOTE: If a document's dense vector field has a number of dimensions
-different from the query's vector, an error will be thrown.
-
-
 [[random-score-function]]
 ===== Random score function
 `random_score` function generates scores that are uniformly distributed
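The removed section relies on two scoring tricks to keep `script_score` results non-negative: adding `1.0` to `cosineSimilarity`, and wrapping `dotProduct` in `sigmoid(1, Math.E, -value)`. The Python sketch below reproduces that arithmetic for dense vectors. It assumes that Painless's `sigmoid(value, k, a)` computes value^a / (k^a + value^a), in which case `sigmoid(1, Math.E, -x)` reduces to the logistic function 1 / (1 + e^-x). All Python names are hypothetical, and an empty list stands in for a document with no vector value.

```python
import math
from typing import Sequence


def dot_product(query: Sequence[float], doc: Sequence[float]) -> float:
    if len(query) != len(doc):
        # The removed docs state that a dimension mismatch throws an error.
        raise ValueError(f"dimension mismatch: {len(query)} vs {len(doc)}")
    return sum(q * d for q, d in zip(query, doc))


def cosine_similarity(query: Sequence[float], doc: Sequence[float]) -> float:
    # Result lies in [-1, 1]; assumes neither vector is all zeros.
    norms = math.sqrt(dot_product(query, query)) * math.sqrt(dot_product(doc, doc))
    return dot_product(query, doc) / norms


def logistic(x: float) -> float:
    # Same value as sigmoid(1, Math.E, -x) = 1 / (e^-x + 1), but computed so
    # that a large |x| cannot overflow math.exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cosine_score(query: Sequence[float], doc: Sequence[float]) -> float:
    # cosineSimilarity(...) + 1.0 shifts the range [-1, 1] to [0, 2].
    return cosine_similarity(query, doc) + 1.0


def dot_product_score(query: Sequence[float], doc: Sequence[float]) -> float:
    # sigmoid(1, Math.E, -dotProduct(...)) maps any dot product into [0, 1].
    return logistic(dot_product(query, doc))


def guarded_cosine_score(query: Sequence[float], doc: Sequence[float]) -> float:
    # Mirrors doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(...).
    return 0.0 if len(doc) == 0 else cosine_similarity(query, doc)
```

Opposite vectors give a cosine score of 0.0 and vectors with the same direction give 2.0. A dot product of 0 maps to 0.5. Like the original `size() == 0` snippet, `guarded_cosine_score` does not add `1.0`, so it can still return a negative value for vectors more than 90 degrees apart.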
@@ -323,6 +187,9 @@ You can read more about decay functions
 NOTE: Decay functions on dates are limited to dates in the default format
 and default time zone. Also calculations with `now` are not supported.
 
+===== Functions for vector fields
+<<vector-functions, Functions for vector fields>> are accessible through
+`script_score` query.
 
 ==== Faster alternatives
 Script Score Query calculates the score for every hit (matching document).
@@ -422,5 +289,4 @@ through a script:
 Script Score query has equivalent <<decay-functions, decay functions>>
 that can be used in script.
 
-
-
+include::{es-repo-dir}/vectors/vector-functions.asciidoc[]
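This file no longer contains the sparse variants, `cosineSimilaritySparse` and `dotProductSparse`. Both take the query vector as an object keyed by dimension id, such as the `{"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}` parameter in the removed examples. The following minimal Python sketch illustrates that arithmetic. It assumes that only dimensions present in both vectors contribute to the dot product, and that each norm covers only its own vector's dimensions. The names are hypothetical, and this is not the Elasticsearch code.

```python
import math
from typing import Dict

# Hypothetical alias: dimension id (a JSON string key) -> value.
SparseVector = Dict[str, float]


def dot_product_sparse(query: SparseVector, doc: SparseVector) -> float:
    # Only dimensions present in both vectors contribute.
    shared = query.keys() & doc.keys()
    return sum(query[k] * doc[k] for k in shared)


def cosine_similarity_sparse(query: SparseVector, doc: SparseVector) -> float:
    # Each norm is taken over that vector's own dimensions; assumes neither is empty.
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_d = math.sqrt(sum(v * v for v in doc.values()))
    return dot_product_sparse(query, doc) / (norm_q * norm_d)


def cosine_sparse_score(query: SparseVector, doc: SparseVector) -> float:
    # cosineSimilaritySparse(...) + 1.0 keeps the score non-negative.
    return cosine_similarity_sparse(query, doc) + 1.0
```

For `{"2": 3.0, "10": 4.0}` against `{"10": 4.0, "50": 3.0}`, only dimension `"10"` is shared. The dot product is therefore 16.0, and the cosine similarity is 16 / (5 * 5) = 0.64.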
