DOC-765 | Cosine similarity fix for vector indexes (#719)

Simran-B · nerpaula · web-flow · commit 9cc6d8159bde · 2025-10-22T14:05:25.000+02:00
* Initial reference docs about the vector index

* HTTP API docs and refinements

* Version remark, OpenAPI minItems/maxItems and fix a type

* inBackground and parallelism are supported

* Review feedback, address cosine metric issue, reword inBackground, add parallelism

* Remove leftover line

* Add internal links to release notes

* Cosine similarity value out of range has been fixed

* Add innerProduct metric

---------

Co-authored-by: Paula Mihu <97217318+nerpaula@users.noreply.github.com>
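
Illustrative note on the two metrics touched by this change: the sketch below shows, in plain Python, how cosine similarity (dot product of the normalized vectors, range `[-1, 1]`) differs from the unnormalized inner product (unbounded, sensitive to magnitude). It is not the ArangoDB or Faiss implementation. The helper names (`dot`, `normalize`, `cosine_similarity`) and the sample vectors are hypothetical, and the clamp is only a guard against floating-point drift in this sketch, not a description of the actual fix.

```python
import math


def dot(a, b):
    """Inner (dot) product of two equal-length vectors, without normalization."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (assumption: zero vectors are rejected)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product of the normalized vectors, i.e. the cosine of their angle."""
    value = dot(normalize(a), normalize(b))
    # Clamp to [-1, 1] to absorb rounding error (illustrative choice only).
    return max(-1.0, min(1.0, value))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]     # same direction as a, twice the magnitude
c = [-1.0, -2.0, -3.0]  # opposite direction of a

print(cosine_similarity(a, b))  # close to 1: same direction, magnitude ignored
print(dot(a, b))                # 28.0: grows with the magnitude of b
print(cosine_similarity(a, c))  # close to -1: opposite direction
```

Both measures rank larger values as more similar, which is why the documented queries sort in descending order for `cosine` and `innerProduct`, and in ascending order only for the `l2` distance.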
diff --git a/site/content/3.12/aql/functions/vector.md b/site/content/3.12/aql/functions/vector.md
@@ -64,13 +64,14 @@ be found depends on the data as well as the search effort (see the `nProbe` opti
 
 `APPROX_NEAR_COSINE(vector1, vector2, options) → similarity`
 
-Retrieve the approximate angular similarity using the cosine metric, accelerated
-by a matching vector index.
 
-The higher the cosine similarity value is, the more similar the two vectors
-are. The closer it is to 0, the more different they are. The value can also
-be negative, indicating that the vectors are not similar and point in opposite
-directions. You need to sort in descending order so that the most similar
+Retrieve the approximate cosine of the angle between two vectors, accelerated
+by a matching vector index with the `cosine` metric.
+
+The closer the similarity value is to 1, the more similar the two vectors
+are. The closer it is to 0, the more different they are. The value can also be
+negative, down to -1, indicating that the vectors are not similar and point in opposite
+directions. You need to **sort in descending order** so that the most similar
 documents come first, which is what a vector index using the `cosine` metric
 can provide.
 
@@ -83,8 +84,8 @@ can provide.
     closest Voronoi cells to consider for the search results. The larger the number,
     the slower the search but the better the search results. If not specified, the
     `defaultNProbe` value of the vector index is used.
-- returns **similarity** (number): The approximate angular similarity between
-  both vectors.
+- returns **similarity** (number): The approximate cosine similarity of
+  both normalized vectors. The value range is `[-1, 1]`.
 
 **Examples**
 
@@ -126,15 +127,83 @@ FOR docOuter IN coll
   RETURN { key: docOuter._key, neighbors }
 ```
 
+### APPROX_NEAR_INNER_PRODUCT()
+
+<small>Introduced in: v3.12.6</small>
+
+`APPROX_NEAR_INNER_PRODUCT(vector1, vector2, options) → similarity`
+
+Retrieve the approximate dot product of two vectors, accelerated by a matching
+vector index with the `innerProduct` metric.
+
+The higher the similarity value is, the more similar the two vectors
+are. The closer it is to 0, the more different they are. The value can also
+be negative, indicating that the vectors are not similar and point in opposite
+directions. You need to **sort in descending order** so that the most similar
+documents come first, which is what a vector index using the `innerProduct`
+metric can provide.
+
+- **vector1** (array of numbers): The first vector. Either this parameter or
+  `vector2` needs to reference a stored attribute holding the vector embedding.
+- **vector2** (array of numbers): The second vector. Either this parameter or
+  `vector1` needs to reference a stored attribute holding the vector embedding.
+- **options** (object, _optional_):
+  - **nProbe** (number, _optional_): How many neighboring centroids respectively
+    closest Voronoi cells to consider for the search results. The larger the number,
+    the slower the search but the better the search results. If not specified, the
+    `defaultNProbe` value of the vector index is used.
+- returns **similarity** (number): The approximate dot product
+  of both vectors without normalization. The value range is unbounded.
+
+**Examples**
+
+Return up to `10` similar documents based on their closeness to the vector
+`@q` according to the inner product metric:
+
+```aql
+FOR doc IN coll
+  SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
+  LIMIT 10
+  RETURN doc
+```
+
+Return up to `5` similar documents as well as the similarity value,
+considering `20` neighboring centroids respectively closest Voronoi cells:
+
+```aql
+FOR doc IN coll
+  LET similarity = APPROX_NEAR_INNER_PRODUCT(doc.vector, @q, { nProbe: 20 })
+  SORT similarity DESC
+  LIMIT 5
+  RETURN MERGE( { similarity }, doc)
+```
+
+Return the similarity value and the document keys of up to `3` similar documents
+for multiple input vectors using a subquery. In this example, the input vectors
+are taken from ten random documents of the same collection:
+
+```aql
+FOR docOuter IN coll
+  LIMIT 10
+  LET neighbors = (
+    FOR docInner IN coll
+      LET similarity = APPROX_NEAR_INNER_PRODUCT(docInner.vector, docOuter.vector)
+      SORT similarity DESC
+      LIMIT 3
+      RETURN { key: docInner._key, similarity }
+  )
+  RETURN { key: docOuter._key, neighbors }
+```
+
 ### APPROX_NEAR_L2()
 
-`APPROX_NEAR_L2(vector1, vector2, options) → similarity`
+`APPROX_NEAR_L2(vector1, vector2, options) → distance`
 
 Retrieve the approximate distance using the L2 (Euclidean) metric, accelerated
-by a matching vector index.
+by a matching vector index with the `l2` metric.
 
 The closer the distance is to 0, the more similar the two vectors are. The higher
-the value, the more different the they are. You need to sort in ascending order
+the value, the more different they are. You need to **sort in ascending order**
 so that the most similar documents come first, which is what a vector index using
 the `l2` metric can provide.
 
@@ -147,7 +216,7 @@ the `l2` metric can provide.
     for the search results. The larger the number, the slower the search but the
     better the search results. If not specified, the `defaultNProbe` value of
     the vector index is used.
-- returns **similarity** (number): The approximate L2 (Euclidean) distance between
+- returns **distance** (number): The approximate L2 (Euclidean) distance between
   both vectors.
 
 **Examples**
diff --git a/site/content/3.12/develop/http-api/indexes/vector.md b/site/content/3.12/develop/http-api/indexes/vector.md
@@ -88,9 +88,14 @@ paths:
                   properties:
                     metric:
                       description: |
-                        Whether to use `cosine` or `l2` (Euclidean) distance calculation.
-                      type: string
-                      enum: ["cosine", "l2"]
+                        The measure for calculating the vector similarity:
+                        - `"cosine"`: Angular similarity. Vectors are automatically
+                          normalized before insertion and search.
+                        - `"innerProduct"` (introduced in v3.12.6):
+                          Similarity in terms of angle and magnitude.
+                          Vectors are not normalized, making it faster than `cosine`.
+                        - `"l2"`: Euclidean distance.
+                      enum: ["cosine", "innerProduct", "l2"]
                     dimension:
                       description: |
                         The vector dimension. The attribute to index needs to
diff --git a/site/content/3.12/index-and-search/indexing/working-with-indexes/vector-indexes.md b/site/content/3.12/index-and-search/indexing/working-with-indexes/vector-indexes.md
@@ -14,10 +14,7 @@ data numerically and can be generated with machine learning models.
 You can then quickly find a given number of semantically similar documents by
 searching for close neighbors in a high-dimensional vector space.
 
-The vector index implementation uses the [Faiss library](https://github.com/facebookresearch/faiss/)
-to support L2 and cosine metrics. The index used is IndexIVFFlat, the quantizer
-for L2 is IndexFlatL2, and the cosine uses IndexFlatIP, where vectors are
-normalized before insertion and search.
+The vector index implementation uses the [Faiss library](https://github.com/facebookresearch/faiss/).
 
 ## How to use vector indexes
 
@@ -75,7 +72,13 @@ centroids and the quality of vector search thus degrades.
   write operations by not using an exclusive write lock for the duration
   of the index creation. The default is `false`.
 - **params**: The parameters as used by the Faiss library.
-  - **metric** (string): Whether to use `cosine` or `l2` (Euclidean) distance calculation.
+  - **metric** (string): The measure for calculating the vector similarity:
+    - `"cosine"`: Angular similarity. Vectors are automatically
+      normalized before insertion and search.
+    - `"innerProduct"` (introduced in v3.12.6):
+      Similarity in terms of angle and magnitude.
+      Vectors are not normalized, making it faster than `cosine`.
+    - `"l2"`: Euclidean distance.
   - **dimension** (number): The vector dimension. The attribute to index needs to
     have this many elements in the array that stores the vector embedding.
   - **nLists** (number): The number of Voronoi cells to partition the vector space
@@ -115,7 +118,6 @@ centroids and the quality of vector search thus degrades.
 {{< tabs "interfaces" >}}
 
 {{< tab "Web interface" >}}
-{{< comment >}}TODO: Only in v3.12.6+
 1. In the **Collections** section, click the name or row of the desired collection.
 2. Go to the **Indexes** tab.
 3. Click **Add index**.
@@ -125,8 +127,6 @@ centroids and the quality of vector search thus degrades.
    under `param`.
 7. Optionally give the index a user-defined name.
 8. Click **Create**.
-{{< /comment >}}
-The web interface does not support vector indexes yet.
 {{< /tab >}}
 
 {{< tab "arangosh" >}}
diff --git a/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md
@@ -900,6 +900,17 @@ the following steps.
 4. Restore the dump to the new deployment. You can directly move from any
    3.11 or 3.12 version to 3.12.4 (or later) this way.
 
+## Cosine similarity fix for vector indexes
+
+<small>Introduced in: v3.12.6</small>
+
+A normalization issue has been addressed for the experimental vector index type.
+It was possible for the cosine similarity value returned by `APPROX_NEAR_COSINE()`
+to be outside the expected range of `[-1, 1]`.
+
+It is recommended to recreate all vector indexes that use the `cosine` metric
+after upgrading to v3.12.6 or later.
+
 ## HTTP RESTful API
 
 ### JavaScript-based traversal using `/_api/traversal` removed
diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1443,6 +1443,18 @@ utilizing vector indexes in queries.
 Furthermore, a new error code `ERROR_QUERY_VECTOR_SEARCH_NOT_APPLIED` (1554)
 has been added.
 
+---
+
+<small>Introduced in: v3.12.6</small>
+
+Another metric has been added. The `innerProduct` metric is a vector similarity measure
+calculated using the dot product of two vectors without normalizing them.
+Therefore, it compares not only the angle but also the magnitudes.
+
+The accompanying AQL function is the following:
+
+- `APPROX_NEAR_INNER_PRODUCT()`
+
 ## Server options
 
 ### Effective and available startup options
diff --git a/site/content/3.13/aql/functions/vector.md b/site/content/3.13/aql/functions/vector.md
@@ -64,13 +64,14 @@ be found depends on the data as well as the search effort (see the `nProbe` opti
 
 `APPROX_NEAR_COSINE(vector1, vector2, options) → similarity`
 
-Retrieve the approximate angular similarity using the cosine metric, accelerated
-by a matching vector index.
 
-The higher the cosine similarity value is, the more similar the two vectors
-are. The closer it is to 0, the more different they are. The value can also
-be negative, indicating that the vectors are not similar and point in opposite
-directions. You need to sort in descending order so that the most similar
+Retrieve the approximate cosine of the angle between two vectors, accelerated
+by a matching vector index with the `cosine` metric.
+
+The closer the similarity value is to 1, the more similar the two vectors
+are. The closer it is to 0, the more different they are. The value can also be
+negative, down to -1, indicating that the vectors are not similar and point in opposite
+directions. You need to **sort in descending order** so that the most similar
 documents come first, which is what a vector index using the `cosine` metric
 can provide.
 
@@ -83,8 +84,8 @@ can provide.
     closest Voronoi cells to consider for the search results. The larger the number,
     the slower the search but the better the search results. If not specified, the
     `defaultNProbe` value of the vector index is used.
-- returns **similarity** (number): The approximate angular similarity between
-  both vectors.
+- returns **similarity** (number): The approximate cosine similarity of
+  both normalized vectors. The value range is `[-1, 1]`.
 
 **Examples**
 
@@ -126,15 +127,83 @@ FOR docOuter IN coll
   RETURN { key: docOuter._key, neighbors }
 ```
 
+### APPROX_NEAR_INNER_PRODUCT()
+
+<small>Introduced in: v3.12.6</small>
+
+`APPROX_NEAR_INNER_PRODUCT(vector1, vector2, options) → similarity`
+
+Retrieve the approximate dot product of two vectors, accelerated by a matching
+vector index with the `innerProduct` metric.
+
+The higher the similarity value is, the more similar the two vectors
+are. The closer it is to 0, the more different they are. The value can also
+be negative, indicating that the vectors are not similar and point in opposite
+directions. You need to **sort in descending order** so that the most similar
+documents come first, which is what a vector index using the `innerProduct`
+metric can provide.
+
+- **vector1** (array of numbers): The first vector. Either this parameter or
+  `vector2` needs to reference a stored attribute holding the vector embedding.
+- **vector2** (array of numbers): The second vector. Either this parameter or
+  `vector1` needs to reference a stored attribute holding the vector embedding.
+- **options** (object, _optional_):
+  - **nProbe** (number, _optional_): How many neighboring centroids respectively
+    closest Voronoi cells to consider for the search results. The larger the number,
+    the slower the search but the better the search results. If not specified, the
+    `defaultNProbe` value of the vector index is used.
+- returns **similarity** (number): The approximate dot product
+  of both vectors without normalization. The value range is unbounded.
+
+**Examples**
+
+Return up to `10` similar documents based on their closeness to the vector
+`@q` according to the inner product metric:
+
+```aql
+FOR doc IN coll
+  SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
+  LIMIT 10
+  RETURN doc
+```
+
+Return up to `5` similar documents as well as the similarity value,
+considering `20` neighboring centroids respectively closest Voronoi cells:
+
+```aql
+FOR doc IN coll
+  LET similarity = APPROX_NEAR_INNER_PRODUCT(doc.vector, @q, { nProbe: 20 })
+  SORT similarity DESC
+  LIMIT 5
+  RETURN MERGE( { similarity }, doc)
+```
+
+Return the similarity value and the document keys of up to `3` similar documents
+for multiple input vectors using a subquery. In this example, the input vectors
+are taken from ten random documents of the same collection:
+
+```aql
+FOR docOuter IN coll
+  LIMIT 10
+  LET neighbors = (
+    FOR docInner IN coll
+      LET similarity = APPROX_NEAR_INNER_PRODUCT(docInner.vector, docOuter.vector)
+      SORT similarity DESC
+      LIMIT 3
+      RETURN { key: docInner._key, similarity }
+  )
+  RETURN { key: docOuter._key, neighbors }
+```
+
 ### APPROX_NEAR_L2()
 
-`APPROX_NEAR_L2(vector1, vector2, options) → similarity`
+`APPROX_NEAR_L2(vector1, vector2, options) → distance`
 
 Retrieve the approximate distance using the L2 (Euclidean) metric, accelerated
-by a matching vector index.
+by a matching vector index with the `l2` metric.
 
 The closer the distance is to 0, the more similar the two vectors are. The higher
-the value, the more different the they are. You need to sort in ascending order
+the value, the more different they are. You need to **sort in ascending order**
 so that the most similar documents come first, which is what a vector index using
 the `l2` metric can provide.
 
@@ -147,7 +216,7 @@ the `l2` metric can provide.
     for the search results. The larger the number, the slower the search but the
     better the search results. If not specified, the `defaultNProbe` value of
     the vector index is used.
-- returns **similarity** (number): The approximate L2 (Euclidean) distance between
+- returns **distance** (number): The approximate L2 (Euclidean) distance between
   both vectors.
 
 **Examples**
diff --git a/site/content/3.13/develop/http-api/indexes/vector.md b/site/content/3.13/develop/http-api/indexes/vector.md
@@ -88,9 +88,14 @@ paths:
                   properties:
                     metric:
                       description: |
-                        Whether to use `cosine` or `l2` (Euclidean) distance calculation.
-                      type: string
-                      enum: ["cosine", "l2"]
+                        The measure for calculating the vector similarity:
+                        - `"cosine"`: Angular similarity. Vectors are automatically
+                          normalized before insertion and search.
+                        - `"innerProduct"` (introduced in v3.12.6):
+                          Similarity in terms of angle and magnitude.
+                          Vectors are not normalized, making it faster than `cosine`.
+                        - `"l2"`: Euclidean distance.
+                      enum: ["cosine", "innerProduct", "l2"]
                     dimension:
                       description: |
                         The vector dimension. The attribute to index needs to
diff --git a/site/content/3.13/index-and-search/indexing/working-with-indexes/vector-indexes.md b/site/content/3.13/index-and-search/indexing/working-with-indexes/vector-indexes.md
@@ -14,10 +14,7 @@ data numerically and can be generated with machine learning models.
 You can then quickly find a given number of semantically similar documents by
 searching for close neighbors in a high-dimensional vector space.
 
-The vector index implementation uses the [Faiss library](https://github.com/facebookresearch/faiss/)
-to support L2 and cosine metrics. The index used is IndexIVFFlat, the quantizer
-for L2 is IndexFlatL2, and the cosine uses IndexFlatIP, where vectors are
-normalized before insertion and search.
+The vector index implementation uses the [Faiss library](https://github.com/facebookresearch/faiss/).
 
 ## How to use vector indexes
 
@@ -75,7 +72,13 @@ centroids and the quality of vector search thus degrades.
   write operations by not using an exclusive write lock for the duration
   of the index creation. The default is `false`.
 - **params**: The parameters as used by the Faiss library.
-  - **metric** (string): Whether to use `cosine` or `l2` (Euclidean) distance calculation.
+  - **metric** (string): The measure for calculating the vector similarity:
+    - `"cosine"`: Angular similarity. Vectors are automatically
+      normalized before insertion and search.
+    - `"innerProduct"` (introduced in v3.12.6):
+      Similarity in terms of angle and magnitude.
+      Vectors are not normalized, making it faster than `cosine`.
+    - `"l2"`: Euclidean distance.
   - **dimension** (number): The vector dimension. The attribute to index needs to
     have this many elements in the array that stores the vector embedding.
   - **nLists** (number): The number of Voronoi cells to partition the vector space
diff --git a/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md
diff --git a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md