Add vector search to experimental page #1051

Merged
merged 14 commits into from
Nov 24, 2024
58 changes: 54 additions & 4 deletions pages/database-management/experimental-features.mdx
@@ -18,25 +18,75 @@ encouraged to thoroughly test the experimental features in your environment.

To enable multiple experimental features in Memgraph, you need to use the
`--experimental-enabled` flag followed by the name of the feature. Here's how
you can enable the `text-search`
feature at the same time:
you can enable the `vector-search` and `text-search` features at the same time:

`--experimental-enabled=text-search`
`--experimental-enabled=vector-search,text-search`

If you already have a Memgraph instance with an experimental feature enabled,
and you want to add another one, you will need to restart Memgraph. Ensure that
all the necessary flags are enabled during the restart.

Some experimental features, such as `vector-search`, also require the
`--experimental-config` flag for proper configuration.

</Callout>

## Experimental features list

### Vector Search
- **Description**: Enables approximate nearest neighbor searches in Memgraph by
leveraging the [HNSW](https://arxiv.org/abs/1603.09320) algorithm to index
node properties as vectors. This feature supports various similarity metrics,
enabling efficient retrieval of nodes based on vector similarity. While the
database itself can operate at any configured isolation level, the vector
index specifically functions at the `READ UNCOMMITTED` isolation level.
**This means that changes to the index made by uncommitted transactions are
immediately visible to other transactions, regardless of the database's
isolation level!**
- **Status**: Experimental. The vector search feature is still under
development and not fully integrated with other Memgraph functionalities.
Unlike other index types, it currently lacks support for durability,
replication, and multi-tenancy. Additionally, vector indexes must be created
using the `--experimental-config` flag, rather than through Cypher queries
like other indexes. Although durability, replication, and multi-tenancy are
not yet supported, you can work around these limitations by ensuring that
every database instance (main or replica) is started with the same
`--experimental-config` flag. This ensures expected behavior for these
features across instances. Vector search is under active development, and
its capabilities continue to improve. At the time of writing, 2024-11-27, the
plan is to make the feature stable in the next release, scheduled for
2025-01-29.
- **Usage**: Enable the feature using the flag
`--experimental-enabled=vector-search`. Configure the feature using
`--experimental-config`. For example:
```shell
--experimental-config='
{
  "vector-search": {
    "index_name": {
      "label": "Node",
      "property": "vector",
      "dimension": 2,
      "capacity": 1000,
      "metric": "cos",
      "scalar": "f32",
      "resize_coefficient": 2
    }
  }
}
'
```
Keep in mind that both flags are required and the configuration requires the
following mandatory fields: `label`, `property`, `dimension` and `capacity`.
- **Documentation**: For more details, visit the [vector search
page](/querying/vector-search).
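
The workaround described under **Status** depends on every instance receiving the same configuration. A minimal sketch of how that could be checked before startup, assuming the configuration strings are available to a deployment script (the helper names `canonical_config` and `same_config` are hypothetical and not part of Memgraph):

```python
import json


def canonical_config(raw: str) -> str:
    """Parse a JSON config string and re-serialize it with sorted keys
    and no insignificant whitespace, so formatting differences vanish."""
    return json.dumps(json.loads(raw), sort_keys=True, separators=(",", ":"))


def same_config(*raw_configs: str) -> bool:
    """Return True when all given config strings describe the same JSON value."""
    return len({canonical_config(raw) for raw in raw_configs}) <= 1


# Hypothetical values passed to --experimental-config on two instances.
main_cfg = (
    '{"vector-search": {"index_name": {"label": "Node", '
    '"property": "vector", "dimension": 2, "capacity": 1000}}}'
)
replica_cfg = """
{
  "vector-search": {
    "index_name": {"capacity": 1000, "dimension": 2,
                   "property": "vector", "label": "Node"}
  }
}
"""

print(same_config(main_cfg, replica_cfg))
```

The comparison is purely textual and structural: it only confirms that the instances would be started with equivalent JSON, not that the indexes hold the same data.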

### Text Search
- **Description**: Enables finding nodes based on property values within
Memgraph, supporting both literal and regex searches, and the concept of
aggregation. It is not guaranteed to work correctly in use cases involving
concurrent transactions and replication.
- **Status**: Experimental. Provides text search functionalities but is not
- **Status**: Early experimental. Provides text search functionalities but is not
guaranteed to work correctly in use cases involving concurrent transactions
and replication. There are no short-term plans to improve text-search
capabilities. If you are interested in improvements, please contact us on
1 change: 1 addition & 0 deletions pages/querying/_meta.json
@@ -8,6 +8,7 @@
"expressions": "Expressions",
"schema": "Schema",
"text-search": "Text search",
"vector-search": "Vector search",
"time-to-live": "Time to live",
"query-plan": "Query plan",
"exploring-datasets": "Exploring datasets"
78 changes: 78 additions & 0 deletions pages/querying/vector-search.mdx
@@ -0,0 +1,78 @@
---
title: Vector search
description: Learn how to use vector search and manage vector indices in Memgraph.
---

import { Callout } from 'nextra/components'

# Vector search

Vector search is currently an experimental feature. For its exact status,
see the [experimental features
page](/database-management/experimental-features). To enable the feature, pass
the `--experimental-enabled=vector-search` flag. To configure it, use
`--experimental-config`. For example:
```shell
--experimental-config='
{
  "vector-search": {
    "index_name": {
      "label": "Node",
      "property": "vector",
      "dimension": 2,
      "capacity": 1000,
      "metric": "cos",
      "scalar": "f32",
      "resize_coefficient": 2
    }
  }
}
'
```
Keep in mind that both flags are required and the configuration requires the
following mandatory fields: `label`, `property`, `dimension` and `capacity`.
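
Writing the JSON by hand makes it easy to drop a mandatory field or leave a trailing comma. As a rough sketch, the flag value could instead be generated and checked by a small script. The function `build_config_flag` is a hypothetical helper, not a Memgraph tool, and it checks only the four mandatory fields named above:

```python
import json
import shlex

# Mandatory fields as listed in this page.
MANDATORY_FIELDS = ("label", "property", "dimension", "capacity")


def build_config_flag(indexes: dict) -> str:
    """Validate index specs and return a shell-quoted --experimental-config flag."""
    for name, spec in indexes.items():
        missing = [field for field in MANDATORY_FIELDS if field not in spec]
        if missing:
            raise ValueError(f"index {name!r} is missing mandatory fields: {missing}")
    # json.dumps always emits valid JSON, so no trailing commas can slip in.
    payload = json.dumps({"vector-search": indexes})
    return "--experimental-config=" + shlex.quote(payload)


flag = build_config_flag(
    {
        "index_name": {
            "label": "Node",
            "property": "vector",
            "dimension": 2,
            "capacity": 1000,
            "metric": "cos",
            "scalar": "f32",
            "resize_coefficient": 2,
        }
    }
)
print(flag)
```

The printed string can be pasted next to `--experimental-enabled=vector-search` when starting Memgraph.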

### Metric Kinds

The following table lists the supported metric kinds for vector search. The
metric determines how the distance between two vectors is calculated. The
default metric is `l2sq`.

| Metric | Description |
|-------------|------------------------------------------------------|
| `ip` | Inner product (dot product). |
| `cos` | Cosine similarity. |
| `l2sq` | Squared Euclidean distance. |
| `pearson` | Pearson correlation coefficient. |
| `haversine` | Haversine distance (suitable for geographic data). |
| `divergence`| A divergence-based metric. |
| `hamming` | Hamming distance. |
| `tanimoto` | Tanimoto coefficient. |
| `sorensen` | Sørensen-Dice coefficient. |
| `jaccard` | Jaccard index. |
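
To make three of these metrics concrete, the sketch below evaluates their textbook definitions on two small vectors. It illustrates the mathematics only; the index implementation may transform these values internally (for example, turning a similarity into a distance), which this page does not specify.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors (`ip`)."""
    return sum(x * y for x, y in zip(a, b))


def l2sq(a, b):
    """Squared Euclidean distance (`l2sq`)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (`cos`)."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 2.0]

print(inner_product(a, b))      # 0.0
print(l2sq(a, b))               # 5.0
print(cosine_similarity(a, b))  # 0.0
```

The two vectors are orthogonal, so the inner product and cosine similarity are both zero, while the squared Euclidean distance still reflects how far apart they are.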

### Scalar Kinds

Scalar kinds define the data type of each vector element. The default scalar
kind is `f32`.

| Scalar | Description |
|-----------|--------------------------------------------------------|
| `b1x8` | Binary format (1 bit per element, stored in 8-bit chunks). |
| `u40` | Unsigned 40-bit integer. |
| `uuid` | Universally unique identifier (UUID). |
| `bf16` | 16-bit floating point (bfloat16). |
| `f64` | 64-bit floating point (double). |
| `f32` | 32-bit floating point (float). |
| `f16` | 16-bit floating point. |
| `f8` | 8-bit floating point. |
| `u64` | 64-bit unsigned integer. |
| `u32` | 32-bit unsigned integer. |
| `u16` | 16-bit unsigned integer. |
| `u8` | 8-bit unsigned integer. |
| `i64` | 64-bit signed integer. |
| `i32` | 32-bit signed integer. |
| `i16` | 16-bit signed integer. |
| `i8` | 8-bit signed integer. |
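
The scalar kind, together with `dimension` and `capacity`, gives a rough lower bound on the space the raw vectors occupy. The sketch below assumes each element is stored packed at the bit width stated in the table and ignores the HNSW graph structure and any other overhead, so actual memory use will be higher. The function `raw_vector_bytes` is a hypothetical helper.

```python
# Bytes per element, derived from the bit widths listed in the table above.
BYTES_PER_SCALAR = {
    "f64": 8, "f32": 4, "f16": 2, "bf16": 2, "f8": 1,
    "u64": 8, "u32": 4, "u16": 2, "u8": 1,
    "i64": 8, "i32": 4, "i16": 2, "i8": 1,
}


def raw_vector_bytes(dimension: int, capacity: int, scalar: str = "f32") -> int:
    """Estimate the bytes needed for `capacity` vectors of `dimension` elements."""
    return dimension * capacity * BYTES_PER_SCALAR[scalar]


# Values from the example configuration: dimension 2, capacity 1000, scalar f32.
print(raw_vector_bytes(2, 1000, "f32"))  # 8000
```

Under the same assumptions, switching the example index from `f32` to `f16` would halve this estimate.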