Commit 49fbafa

Merge branch 'main' of https://github.com/arangodb/docs-hugo into DOC-799
2 parents d6f3359 + cdd39d5 commit 49fbafa

45 files changed, +3868 -341 lines changed

Some content is hidden

site/content/3.12/about-arangodb/features/core.md

Lines changed: 0 additions & 2 deletions
@@ -150,11 +150,9 @@ available from v3.12.5 onward.
 threshold is reached.
 {{% /comment %}}
 
-{{% comment %}} Experimental feature
 - [**Vector search**](../../index-and-search/indexing/working-with-indexes/vector-indexes.md):
 Find items with similar properties by comparing vector embeddings generated by
 machine learning models.
-{{% /comment %}}
 
 - [**Search highlighting**](../../index-and-search/arangosearch/search-highlighting.md):
 Get the substring positions of matched terms, phrases, or _n_-grams.

site/content/3.12/aql/functions/geo.md

Lines changed: 116 additions & 97 deletions
Large diffs are not rendered by default.

site/content/3.12/aql/functions/vector.md

Lines changed: 101 additions & 17 deletions
@@ -16,13 +16,13 @@ You can calculate vector embeddings using [ArangoDB's GraphML](../../data-scienc
 capabilities (available in ArangoGraph) or using external tools.
 
 {{< warning >}}
-The vector index is an experimental feature that you need to enable for the
-ArangoDB server with the `--experimental-vector-index` startup option.
+You need to enable the vector index feature for the
+ArangoDB server with the `--vector-index` startup option.
 Once enabled for a deployment, it cannot be disabled anymore because it
 permanently changes how the data is managed by the RocksDB storage engine
 (it adds an additional column family).
 
-To restore a dump that contains vector indexes, the `--experimental-vector-index`
+To restore a dump that contains vector indexes, the `--vector-index`
 startup option needs to be enabled on the deployment you want to restore to.
 {{< /warning >}}
 
@@ -56,21 +56,37 @@ be found depends on the data as well as the search effort (see the `nProbe` opti
 {{< info >}}
 - If there is more than one suitable vector index over the same attribute, it is
 undefined which one is selected.
-- You cannot have any `FILTER` operation between `FOR` and `LIMIT` for
-pre-filtering.
+
+- In v3.12.4 and v3.12.5, you cannot have any `FILTER` operation between `FOR`
+and `LIMIT` for pre-filtering. From v3.12.6 onward, you can add `FILTER`
+operations between `FOR` and `SORT` that are then applied during the lookup in
+the vector index. Example:
+
+```aql
+FOR doc IN coll
+  FILTER doc.val > 3
+  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC
+  LIMIT 5
+  RETURN doc
+```
+
+Note that `LIMIT 5`, for example, does not guarantee 5 results. The search does
+not extend to as many neighboring Voronoi cells as necessary but only considers
+as many as configured via the `nProbe` option.
 {{< /info >}}
 
 ### APPROX_NEAR_COSINE()
 
 `APPROX_NEAR_COSINE(vector1, vector2, options) → similarity`
 
-Retrieve the approximate angular similarity using the cosine metric, accelerated
-by a matching vector index.
 
-The higher the cosine similarity value is, the more similar the two vectors
-are. The closer it is to 0, the more different they are. The value can also
-be negative, indicating that the vectors are not similar and point in opposite
-directions. You need to sort in descending order so that the most similar
+Retrieve the approximate cosine of the angle between two vectors, accelerated
+by a matching vector index with the `cosine` metric.
+
+The closer the similarity value is to 1, the more similar the two vectors
+are. The closer it is to 0, the more different they are. The value can also be
+negative, down to -1, indicating that the vectors are not similar and point in opposite
+directions. You need to **sort in descending order** so that the most similar
 documents come first, which is what a vector index using the `cosine` metric
 can provide.
 
@@ -83,8 +99,8 @@ can provide.
 closest Voronoi cells to consider for the search results. The larger the number,
 the slower the search but the better the search results. If not specified, the
 `defaultNProbe` value of the vector index is used.
-- returns **similarity** (number): The approximate angular similarity between
-both vectors.
+- returns **similarity** (number): The approximate cosine similarity of
+both normalized vectors. The value range is `[-1, 1]`.
 
 **Examples**
 
@@ -126,15 +142,83 @@ FOR docOuter IN coll
   RETURN { key: docOuter._key, neighbors }
 ```
 
+### APPROX_NEAR_INNER_PRODUCT()
+
+<small>Introduced in: v3.12.6</small>
+
+`APPROX_NEAR_INNER_PRODUCT(vector1, vector2, options) → similarity`
+
+Retrieve the approximate dot product of two vectors, accelerated by a matching
+vector index with the `innerProduct` metric.
+
+The higher the similarity value is, the more similar the two vectors
+are. The closer it is to 0, the more different they are. The value can also
+be negative, indicating that the vectors are not similar and point in opposite
+directions. You need to **sort in descending order** so that the most similar
+documents come first, which is what a vector index using the `innerProduct`
+metric can provide.
+
+- **vector1** (array of numbers): The first vector. Either this parameter or
+`vector2` needs to reference a stored attribute holding the vector embedding.
+- **vector2** (array of numbers): The second vector. Either this parameter or
+`vector1` needs to reference a stored attribute holding the vector embedding.
+- **options** (object, _optional_):
+- **nProbe** (number, _optional_): How many neighboring centroids respectively
+closest Voronoi cells to consider for the search results. The larger the number,
+the slower the search but the better the search results. If not specified, the
+`defaultNProbe` value of the vector index is used.
+- returns **similarity** (number): The approximate dot product
+of both vectors without normalization. The value range is unbounded.
+
+**Examples**
+
+Return up to `10` similar documents based on their closeness to the vector
+`@q` according to the inner product metric:
+
+```aql
+FOR doc IN coll
+  SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
+  LIMIT 10
+  RETURN doc
+```
+
+Return up to `5` similar documents as well as the similarity value,
+considering `20` neighboring centroids respectively closest Voronoi cells:
+
+```aql
+FOR doc IN coll
+  LET similarity = APPROX_NEAR_INNER_PRODUCT(doc.vector, @q, { nProbe: 20 })
+  SORT similarity DESC
+  LIMIT 5
+  RETURN MERGE( { similarity }, doc)
+```
+
+Return the similarity value and the document keys of up to `3` similar documents
+for multiple input vectors using a subquery. In this example, the input vectors
+are taken from ten random documents of the same collection:
+
+```aql
+FOR docOuter IN coll
+  LIMIT 10
+  LET neighbors = (
+    FOR docInner IN coll
+      LET similarity = APPROX_NEAR_INNER_PRODUCT(docInner.vector, docOuter.vector)
+      SORT similarity DESC
+      LIMIT 3
+      RETURN { key: docInner._key, similarity }
+  )
+  RETURN { key: docOuter._key, neighbors }
+```
+
 ### APPROX_NEAR_L2()
 
-`APPROX_NEAR_L2(vector1, vector2, options) → similarity`
+`APPROX_NEAR_L2(vector1, vector2, options) → distance`
 
 Retrieve the approximate distance using the L2 (Euclidean) metric, accelerated
-by a matching vector index.
+by a matching vector index with the `l2` metric.
 
 The closer the distance is to 0, the more similar the two vectors are. The higher
-the value, the more different the they are. You need to sort in ascending order
+the value, the more different they are. You need to **sort in ascending order**
 so that the most similar documents come first, which is what a vector index using
 the `l2` metric can provide.
 
@@ -147,7 +231,7 @@ the `l2` metric can provide.
 for the search results. The larger the number, the slower the search but the
 better the search results. If not specified, the `defaultNProbe` value of
 the vector index is used.
-- returns **similarity** (number): The approximate L2 (Euclidean) distance between
+- returns **distance** (number): The approximate L2 (Euclidean) distance between
 both vectors.
 
 **Examples**
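
The value ranges and sort directions described for `APPROX_NEAR_COSINE()`, `APPROX_NEAR_INNER_PRODUCT()`, and `APPROX_NEAR_L2()` can be checked with a small brute-force sketch in plain Python. It computes the exact metrics over hypothetical toy vectors, with no index and no approximation, so it only illustrates the math behind the three metrics and is not ArangoDB's implementation. The helper names (`cosine_similarity`, `inner_product`, `l2_distance`, `top_k`) are made up for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, always within [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def inner_product(a, b):
    """Dot product without normalization; unbounded in both directions."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance, >= 0; 0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(docs, query, metric, k):
    """Exact (brute-force) ranking; the sort direction depends on the metric."""
    if metric is l2_distance:
        # Smaller distance = more similar, so sort ascending (like SORT ... ASC).
        ranked = sorted(docs, key=lambda d: metric(d["vector"], query))
    else:
        # Larger similarity = more similar, so sort descending (like SORT ... DESC).
        ranked = sorted(docs, key=lambda d: metric(d["vector"], query), reverse=True)
    return [d["_key"] for d in ranked[:k]]


# Hypothetical toy documents and query vector.
docs = [
    {"_key": "a", "vector": [1.0, 0.0]},
    {"_key": "b", "vector": [10.0, 1.0]},
    {"_key": "c", "vector": [0.0, 1.0]},
    {"_key": "d", "vector": [-1.0, 0.0]},
]
q = [1.0, 0.0]

print(top_k(docs, q, cosine_similarity, 2))  # ['a', 'b']
print(top_k(docs, q, inner_product, 2))      # ['b', 'a']
print(top_k(docs, q, l2_distance, 2))        # ['a', 'c']
```

With the same query vector, the three metrics rank differently: cosine ignores magnitude (`a` and `b` point in almost the same direction), the inner product rewards the long vector `b`, and L2 penalizes how far `b` is from the query, so `c` comes second.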
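
The caveat about `LIMIT` and `nProbe` in the vector function changes above can be illustrated with a simplified inverted-file (IVF) model: vectors are grouped into the Voronoi cells of a few centroids, and a search only scans the `nProbe` cells whose centroids are closest to the query. This Python sketch is an assumption-laden model, not ArangoDB's code: the centroids are hand-picked rather than trained, ranking uses L2 distance instead of cosine for brevity, and all names and data are hypothetical. The `predicate` argument stands in for a `FILTER` that is applied during the index lookup.

```python
import math

# Hypothetical centroids. A real IVF index would train them (e.g. with k-means).
CENTROIDS = [[0.0, 0.0], [10.0, 0.0], [0.0, 12.0]]

# Hypothetical documents with an embedding and a filterable `val` attribute.
DOCS = [
    {"_key": "k1", "vector": [0.5, 0.0], "val": 1},
    {"_key": "k2", "vector": [0.0, 0.6], "val": 2},
    {"_key": "k3", "vector": [1.0, 1.0], "val": 5},
    {"_key": "k4", "vector": [9.0, 0.0], "val": 7},
    {"_key": "k5", "vector": [10.0, 1.0], "val": 8},
    {"_key": "k6", "vector": [0.0, 9.5], "val": 9},
]


def nearest_cells(vector, centroids):
    """Cell ids ordered by the L2 distance of their centroid to `vector`."""
    return sorted(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def build_ivf(docs, centroids):
    """Put every document into the Voronoi cell of its nearest centroid."""
    cells = {i: [] for i in range(len(centroids))}
    for doc in docs:
        cells[nearest_cells(doc["vector"], centroids)[0]].append(doc)
    return cells


def search(cells, centroids, query, limit, n_probe, predicate=lambda doc: True):
    """Scan only the `n_probe` closest cells, filter during the scan, rank by L2."""
    probed = nearest_cells(query, centroids)[:n_probe]
    candidates = [doc for i in probed for doc in cells[i] if predicate(doc)]
    candidates.sort(key=lambda doc: math.dist(doc["vector"], query))
    return [doc["_key"] for doc in candidates[:limit]]


def only_big(doc):
    """Stand-in for `FILTER doc.val > 3`."""
    return doc["val"] > 3


cells = build_ivf(DOCS, CENTROIDS)
print(search(cells, CENTROIDS, [0.0, 0.0], limit=3, n_probe=1, predicate=only_big))
# ['k3']  (only one match in the single probed cell, despite limit=3)
print(search(cells, CENTROIDS, [0.0, 0.0], limit=3, n_probe=2, predicate=only_big))
# ['k3', 'k4', 'k5']
```

With `n_probe=1`, only the cell around `[0, 0]` is scanned and the filter leaves a single match even though `limit=3`; probing a second cell fills the limit. A higher probe count trades speed for more complete results, which matches the `nProbe` option description.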

site/content/3.12/aql/graphs/all-shortest-paths.md

Lines changed: 50 additions & 0 deletions
@@ -117,6 +117,56 @@ All collections in the list that do not specify their own direction use the
 direction defined after `IN` (here: `OUTBOUND`). This allows using a different
 direction for each collection in your path search.
 
+### Graph path searches in a cluster
+
+Due to the nature of graphs, edges may reference nodes from arbitrary
+collections. Following the paths can thus involve documents from various
+collections and it is not possible to predict which are visited in a
+traversal. Which collections need to be loaded by the graph engine can only be
+determined at run time.
+
+Use the [`WITH` operation](../high-level-operations/with.md) to specify the
+node collections you expect to be involved. This is required for traversals
+using collection sets in cluster deployments. Declare the collection of the
+start node as well if it's not declared already (like by a `FOR` loop).
+
+{{< tip >}}
+From v3.12.6 onward, node collections are automatically deduced for graph
+queries using collection sets / anonymous graphs if there is a named graph with
+a matching edge collection in its edge definitions.
+
+For example, suppose you have two node collections, `person` and `movie`, and
+an `acts_in` edge collection that connects them. If you want to run a path search
+query that starts (and ends) at a person that you specify with its document ID,
+you need to declare both node collections at the beginning of the query:
+
+```aql
+WITH person, movie
+FOR p IN ANY ALL_SHORTEST_PATHS "person/1544" TO "person/52560" acts_in
+  RETURN p.vertices[*].label
+```
+
+However, if there is a named graph that includes an edge definition for the
+`acts_in` edge collection, with `person` as the _from_ collection and `movie`
+as the _to_ collection, you can omit `WITH person, movie`. That is, if you
+specify `acts_in` as an edge collection in an anonymous graph query, all
+named graphs are checked for this edge collection, and if there is a matching
+edge definition, its node collections are automatically added as data sources to
+the query.
+
+```aql
+FOR p IN ANY ALL_SHORTEST_PATHS "person/1544" TO "person/52560" acts_in
+  RETURN p.vertices[*].label
+
+// Chris Rock --> Dogma <-- Ben Affleck --> Surviving Christmas <-- Jennifer Morrison
+// Chris Rock --> The Longest Yard <-- Rob Schneider --> Big Stan <-- Jennifer Morrison
+// Chris Rock --> Down to Earth <-- John Cho --> Star Trek <-- Jennifer Morrison
+```
+
+You can still declare collections manually, in which case they are added as
+data sources in addition to automatically deduced collections.
+{{< /tip >}}
+
 ## Examples
 
 Load an example graph to get a named graph that reflects some possible
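
The node collection deduction described in the v3.12.6 tip can be modeled as a set union over edge definitions. In this hypothetical Python sketch, only the `collection`/`from`/`to` shape of an edge definition mirrors how named graphs describe their edges; the graph names, the extra `knows` and `likes` collections, and the function name are made up, and the sketch ignores everything else the query engine does. Manually declared `WITH` collections are unioned with the deduced ones, as the tip describes.

```python
# Hypothetical named graphs: graph name -> edge definitions
# ({"collection", "from", "to"}), mirroring the shape of named-graph edge definitions.
NAMED_GRAPHS = {
    "movies": [{"collection": "acts_in", "from": ["person"], "to": ["movie"]}],
    "social": [{"collection": "knows", "from": ["person"], "to": ["person"]}],
}


def deduce_node_collections(edge_collections, named_graphs, declared=()):
    """Return the WITH-declared collections plus the from/to collections of every
    edge definition, in any named graph, that uses one of `edge_collections`."""
    result = set(declared)
    for edge_definitions in named_graphs.values():
        for edge_def in edge_definitions:
            if edge_def["collection"] in edge_collections:
                result.update(edge_def["from"])
                result.update(edge_def["to"])
    return result


print(sorted(deduce_node_collections({"acts_in"}, NAMED_GRAPHS)))
# ['movie', 'person']
print(sorted(deduce_node_collections({"acts_in"}, NAMED_GRAPHS, declared=["studio"])))
# ['movie', 'person', 'studio']
print(sorted(deduce_node_collections({"likes"}, NAMED_GRAPHS)))
# []
```

An edge collection that no named graph references (`likes` here) deduces nothing, so such queries still need an explicit `WITH` in cluster deployments.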

site/content/3.12/aql/graphs/k-paths.md

Lines changed: 51 additions & 0 deletions
@@ -188,6 +188,57 @@ All collections in the list that do not specify their own direction use the
 direction defined after `IN` (here: `OUTBOUND`). This allows using a different
 direction for each collection in your path search.
 
+### Graph path searches in a cluster
+
+Due to the nature of graphs, edges may reference nodes from arbitrary
+collections. Following the paths can thus involve documents from various
+collections and it is not possible to predict which are visited in a
+traversal. Which collections need to be loaded by the graph engine can only be
+determined at run time.
+
+Use the [`WITH` operation](../high-level-operations/with.md) to specify the
+node collections you expect to be involved. This is required for traversals
+using collection sets in cluster deployments. Declare the collection of the
+start node as well if it's not declared already (like by a `FOR` loop).
+
+{{< tip >}}
+From v3.12.6 onward, node collections are automatically deduced for graph
+queries using collection sets / anonymous graphs if there is a named graph with
+a matching edge collection in its edge definitions.
+
+For example, suppose you have two node collections, `person` and `movie`, and
+an `acts_in` edge collection that connects them. If you want to run a path search
+query that starts (and ends) at a person that you specify with its document ID,
+you need to declare both node collections at the beginning of the query:
+
+```aql
+WITH person, movie
+FOR p IN 4 ANY K_PATHS "person/1544" TO "person/52560" acts_in
+  LIMIT 2
+  RETURN p.vertices[*].label
+```
+
+However, if there is a named graph that includes an edge definition for the
+`acts_in` edge collection, with `person` as the _from_ collection and `movie`
+as the _to_ collection, you can omit `WITH person, movie`. That is, if you
+specify `acts_in` as an edge collection in an anonymous graph query, all
+named graphs are checked for this edge collection, and if there is a matching
+edge definition, its node collections are automatically added as data sources to
+the query.
+
+```aql
+FOR p IN 4 ANY K_PATHS "person/1544" TO "person/52560" acts_in
+  LIMIT 2
+  RETURN p.vertices[*].label
+
+// Chris Rock --> Dogma <-- Ben Affleck --> Surviving Christmas <-- Jennifer Morrison
+// Chris Rock --> The Longest Yard <-- Rob Schneider --> Big Stan <-- Jennifer Morrison
+```
+
+You can still declare collections manually, in which case they are added as
+data sources in addition to automatically deduced collections.
+{{< /tip >}}
+
 ## Examples
 
 You can load the `kShortestPathsGraph` example graph to get a named graph that

site/content/3.12/aql/graphs/k-shortest-paths.md

Lines changed: 51 additions & 0 deletions
@@ -186,6 +186,57 @@ All collections in the list that do not specify their own direction use the
 direction defined after `IN` (here: `OUTBOUND`). This allows using a different
 direction for each collection in your path search.
 
+### Graph path searches in a cluster
+
+Due to the nature of graphs, edges may reference nodes from arbitrary
+collections. Following the paths can thus involve documents from various
+collections and it is not possible to predict which are visited in a
+traversal. Which collections need to be loaded by the graph engine can only be
+determined at run time.
+
+Use the [`WITH` operation](../high-level-operations/with.md) to specify the
+node collections you expect to be involved. This is required for traversals
+using collection sets in cluster deployments. Declare the collection of the
+start node as well if it's not declared already (like by a `FOR` loop).
+
+{{< tip >}}
+From v3.12.6 onward, node collections are automatically deduced for graph
+queries using collection sets / anonymous graphs if there is a named graph with
+a matching edge collection in its edge definitions.
+
+For example, suppose you have two node collections, `person` and `movie`, and
+an `acts_in` edge collection that connects them. If you want to run a path search
+query that starts (and ends) at a person that you specify with its document ID,
+you need to declare both node collections at the beginning of the query:
+
+```aql
+WITH person, movie
+FOR p IN ANY K_SHORTEST_PATHS "person/1544" TO "person/52560" acts_in
+  LIMIT 2
+  RETURN p.vertices[*].label
+```
+
+However, if there is a named graph that includes an edge definition for the
+`acts_in` edge collection, with `person` as the _from_ collection and `movie`
+as the _to_ collection, you can omit `WITH person, movie`. That is, if you
+specify `acts_in` as an edge collection in an anonymous graph query, all
+named graphs are checked for this edge collection, and if there is a matching
+edge definition, its node collections are automatically added as data sources to
+the query.
+
+```aql
+FOR p IN ANY K_SHORTEST_PATHS "person/1544" TO "person/52560" acts_in
+  LIMIT 2
+  RETURN p.vertices[*].label
+
+// Chris Rock --> Dogma <-- Ben Affleck --> Surviving Christmas <-- Jennifer Morrison
+// Chris Rock --> The Longest Yard <-- Rob Schneider --> Big Stan <-- Jennifer Morrison
+```
+
+You can still declare collections manually, in which case they are added as
+data sources in addition to automatically deduced collections.
+{{< /tip >}}
+
 ## Examples
 
 You can load the `kShortestPathsGraph` example graph to get a named graph that
