Add vector search to the experimental page (#1051)
DavIvek authored Nov 24, 2024
1 parent 9cf0560 commit 4404193
Showing 5 changed files with 236 additions and 18 deletions.
4 changes: 4 additions & 0 deletions pages/database-management.md
@@ -29,6 +29,10 @@ needs.

Check how to enable Memgraph Enterprise to get access to advanced features.

## [Experimental features](/database-management/experimental-features)

Experimental features in Memgraph represent the forefront of our innovation, offering cutting-edge functionality as we continuously enhance our product. Learn how to try them out.

## [Logs](/database-management/logs)

Check how to access logs and change log tracking level. Learn about query audit logging in Memgraph Enterprise.
8 changes: 5 additions & 3 deletions pages/database-management/configuration.mdx
@@ -397,9 +397,11 @@ workers simultaneously.

This section contains the list of flags that are used to configure [experimental features](/database-management/experimental-features) in Memgraph.

| Flag | Description |
|-----------------------------------------------|-----------------------------------------------------------------------------------------------|
| --experimental-enabled=text-search | Enables text search capabilities within Memgraph. |
| Flag | Description |
|-----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| --experimental-enabled=text-search | Enables text search capabilities within Memgraph. |
| --experimental-enabled=vector-search | Enables vector search capabilities within Memgraph. |
| --experimental-config=config | Configures experimental features. Currently, `experimental-config` is used only for [vector search](/database-management/experimental-features#vector-search). |

### High availability

102 changes: 87 additions & 15 deletions pages/database-management/experimental-features.mdx
@@ -1,57 +1,129 @@
---
title: Experimental features
description: Experimental features are new functionalities in Memgraph that carry some risk. They are still in development and may have stability issues. In case of any challenges, get in touch with your support representative for quick help.
description: Experimental features in Memgraph represent the forefront of our innovation, offering cutting-edge functionality as we continuously enhance our product. While these features are still evolving and may experience occasional instability, our dedicated support team is ready to assist you promptly with any challenges, ensuring a seamless experience.
---

import { Callout } from 'nextra/components'
import { Card, Cards } from 'nextra/components'

# Experimental features

Memgraph has several features in the experimental phase:

- [Vector search](#vector-search)
- [Text search](#text-search)

These features are actively being developed and may change or be removed in
future releases. Use them with caution in production environments and only if
you thoroughly understand their implications and are prepared for potential API
changes, data loss, service interruption, or data inconsistency. It is highly
encouraged to thoroughly test the experimental features in your environment.

<Callout>
To use experimental features in Memgraph, you first need to [enable
them](#enabling-experimental-features). In case you experience issues or want to
provide feedback, please [contact us](#feedback-and-support).

## Enabling experimental features

To enable an experimental feature in Memgraph, set the `--experimental-enabled`
flag to the appropriate value. For example, to run Memgraph in a Docker container
with the vector search feature, run the following command:

```
docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph --experimental-enabled=vector-search
```

To enable multiple experimental features in Memgraph, you need to use the
`--experimental-enabled` flag followed by the name of the feature. Here's how
you can enable the `text-search`
feature at the same time:
To configure Memgraph correctly for each experimental feature, refer to the
docs for that feature. For example, the vector search feature also requires
setting the `--experimental-config` flag.

`--experimental-enabled=text-search`
To enable more than one experimental feature, list them all, separated by commas.
For example, to enable the vector search and text search features, run Memgraph
in a Docker container with the following command:

```
docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph --experimental-enabled=vector-search,text-search
```

If you already have a Memgraph instance with an experimental feature enabled,
and you want to add another one, you will need to restart Memgraph. Ensure that
all the necessary flags are enabled during the restart.

</Callout>
Changing the configuration settings depends on the way you are using Memgraph,
so please refer to the [configuration
docs](/database-management/configuration#changing-configuration) for more
information.

## Experimental features list

### Vector Search
- **Description**: Enables approximate nearest neighbor searches in Memgraph by
leveraging the [HNSW](https://arxiv.org/abs/1603.09320) algorithm to index
node properties as vectors. This feature supports various similarity metrics,
enabling efficient retrieval of nodes based on vector similarity. While the
database itself can operate at any configured isolation level, the vector
index specifically functions at the `READ UNCOMMITTED` isolation level.
**This means that changes to the index made by uncommitted transactions are
immediately visible to other transactions, regardless of the database's
isolation level!**
- **Status**: Experimental. The vector search feature is still under
development and not fully integrated with other Memgraph functionalities.
Unlike other index types, it currently lacks support for durability,
replication, and multi-tenancy. Additionally, vector indexes must be created
using the `--experimental-config` flag, rather than through Cypher queries
like other indexes. Although durability, replication, and multi-tenancy are
not yet supported, you can work around these limitations by ensuring that
every database instance (main or replica) is started with the same
`--experimental-config` flag. This ensures expected behavior for these
features across instances. Vector search is an actively developed feature,
and we are continuously working on improving its capabilities. At the time of
writing (`2024-11-27`), the plan is to make the feature stable in the next
release, scheduled for `2025-01-29`.
- **Usage**: Enable the feature using the flag
`--experimental-enabled=vector-search`. Configure the feature using
`--experimental-config`. For example:
```shell
--experimental-config='
{
  "vector-search": {
    "index_name": {
      "label": "Node",
      "property": "vector",
      "dimension": 2,
      "capacity": 1000,
      "metric": "cos",
      "resize_coefficient": 2
    }
  }
}
'
```
Keep in mind that both flags are required and the configuration requires the
following mandatory fields: `label`, `property`, `dimension` and `capacity`.
- **Documentation**: For more details, check the [vector search
docs](/querying/vector-search).
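
The configuration value is plain JSON, so it can be assembled and checked before Memgraph is started. The following is a minimal Python sketch, not part of Memgraph: the helper name `build_experimental_config` is hypothetical, and the only facts it relies on are the JSON layout and the mandatory fields listed above.

```python
import json

# Mandatory fields as listed in the text above; `metric` and
# `resize_coefficient` are treated as optional.
MANDATORY_FIELDS = ("label", "property", "dimension", "capacity")


def build_experimental_config(index_name, **settings):
    """Return the --experimental-config flag for a single vector index.

    Hypothetical helper (an assumption for illustration): it only assembles
    the JSON value and checks that the mandatory fields are present.
    """
    missing = [field for field in MANDATORY_FIELDS if field not in settings]
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    config = {"vector-search": {index_name: settings}}
    return "--experimental-config=" + json.dumps(config)


flag = build_experimental_config(
    "index_name",
    label="Node",
    property="vector",
    dimension=2,
    capacity=1000,
    metric="cos",
    resize_coefficient=2,
)
print(flag)
```

When the resulting flag is pasted into a shell command, the JSON part still needs to be wrapped in single quotes, as in the example above.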

### Text Search
- **Description**: Enables finding nodes based on property values within
Memgraph, supporting both literal and regex searches, and the concept of
aggregation. It is not guaranteed to work correctly in use cases involving
concurrent transactions and replication.
- **Status**: Experimental. Provides text search functionalities but is not
- **Status**: Early experimental. Provides text search functionalities but is not
guaranteed to work correctly in use cases involving concurrent transactions
and replication. There are no short-term plans to improve text-search
capabilities. If you are interested in improvements, please contact us on
[Discord](https://discord.gg/memgraph).
- **Usage**: Enable the feature using the flag
`--experimental-enabled=text-search`.
- **Documentation**: For more details, visit the [Text Search
documentation](/querying/text-search).
- **Documentation**: For more details, check the [text search
docs](/querying/text-search).


## Providing feedback and getting help
## Feedback and support

If you have questions regarding Memgraph experimental features or want to
provide feedback, [join our Discord community](https://www.discord.gg/memgraph),
or [open an issue in the Memgraph GitHub
repository](https://github.com/memgraph/memgraph/issues).
provide feedback, [join our Discord community](https://www.discord.gg/memgraph)
or [open a discussion on our GitHub](https://github.com/memgraph/memgraph/discussions).

If you prefer a call, schedule a 30-minute session with one of our engineers to
discuss how Memgraph fits with your architecture. Our engineers are highly
1 change: 1 addition & 0 deletions pages/querying/_meta.json
@@ -8,6 +8,7 @@
"expressions": "Expressions",
"schema": "Schema",
"text-search": "Text search",
"vector-search": "Vector search",
"time-to-live": "Time to live",
"query-plan": "Query plan",
"exploring-datasets": "Exploring datasets"
139 changes: 139 additions & 0 deletions pages/querying/vector-search.mdx
@@ -0,0 +1,139 @@
---
title: Vector search
description: Learn how to use vector search and manage vector indices in Memgraph.
---

import { Callout } from 'nextra/components'

# Vector search

Vector search, also known as vector similarity search or nearest neighbor search, is a technique used to find the most similar items in a collection of data based on their vector representations.
Vector search is currently an [experimental
feature](/database-management/experimental-features).
Memgraph implements the `READ_UNCOMMITTED` isolation level specifically for vector indices. While the main database can operate at any isolation level, the vector index always operates at `READ_UNCOMMITTED`.
This design maintains all transactional guarantees at the database level. Only vector index operations use the relaxed isolation level, so the database's ACID properties remain intact for all other operations.

## Run Memgraph with vector search feature

To try out the vector search feature, you need to **enable** it and **configure** it.
To enable the feature, set the `--experimental-enabled` flag to `vector-search`. To
configure the feature, set the `--experimental-config` flag.

<Callout type="info">
Changing the configuration settings depends on the way you are using Memgraph,
so please refer to the [configuration
docs](/database-management/configuration#changing-configuration) for more
information.
</Callout>

Here is an example configuration:

```shell
--experimental-config='
{
  "vector-search": {
    "index_name": {
      "label": "Node",
      "property": "vector",
      "dimension": 2,
      "capacity": 1000,
      "metric": "cos",
      "resize_coefficient": 2
    }
  }
}
'
```

Keep in mind that the `--experimental-enabled` and `--experimental-config` flags
are both required and that the following configuration fields are mandatory:
`label`, `property`, `dimension` and `capacity`.

{<h3> Input: </h3>}

- `index_name: string` ➡ The name of the vector index.
- `label: string` ➡ The name of the label on which vector index is indexed.
- `property: string` ➡ The name of the property on which vector index is indexed.
- `dimension: int` ➡ The dimension of vectors in the index.
- `capacity: int` ➡ The capacity of the vector index.
- `metric: string` ➡ The metric used for the vector search. The default value is `l2sq`.
- `resize_coefficient: int` ➡ When the index gets full, the capacity is multiplied by the resize coefficient to determine the new capacity, if possible.
If the index cannot be resized due to insufficient memory, an exception is thrown. The default value is `2`.
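
To make the resize rule concrete, here is a small Python sketch of how the capacity would grow. It assumes that each resize multiplies the current capacity by the coefficient; the function name is hypothetical and the sketch does not model the out-of-memory case.

```python
def capacity_after_resizes(capacity, resize_coefficient=2, resizes=1):
    """Capacity after the index has filled up and been resized `resizes` times.

    Assumption for illustration: every resize multiplies the *current*
    capacity by the resize coefficient.
    """
    for _ in range(resizes):
        capacity *= resize_coefficient
    return capacity


# Starting from the example configuration (capacity 1000, coefficient 2):
print([capacity_after_resizes(1000, 2, n) for n in range(4)])
# [1000, 2000, 4000, 8000]
```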


## Usage

Currently, vector indices are used through the `vector_search` query module.

<Callout type="info">

Unlike other index types, vector indices are not currently utilized by the query planner.

</Callout>

### Show Vector Indices

You can retrieve information about vector indices using the `vector_search.show_index_info()` procedure.

{<h3> Output: </h3>}

- `index_name: string` ➡ The name of the vector index.
- `label: string` ➡ The name of the label on which vector index is indexed.
- `property: string` ➡ The name of the property on which vector index is indexed.
- `dimension: int` ➡ The dimension of vectors in the index.
- `capacity: int` ➡ The capacity of the vector index.
- `size: int` ➡ The number of entries in the vector index.

{<h3> Usage: </h3>}

```cypher
CALL vector_search.show_index_info() YIELD *;
```

### Query vector index

To search for similar vectors within a vector index, use the `vector_search.search()` procedure. It returns the vectors closest to a query vector according to the similarity metric configured for the index.

{<h3> Input: </h3>}

- `index_name: string` ➡ The vector index to search.
- `limit: int` ➡ The number of nearest neighbors to return.
- `search_query: List[float]` ➡ The vector to query in the index.

{<h3> Output: </h3>}

- `node: Vertex` ➡ A node in the vector index matching the given query.
- `distance: double` ➡ The distance from the node to the query.
- `similarity: double` ➡ The similarity of the node and the query.

{<h3> Usage: </h3>}

```cypher
CALL vector_search.search("index_name", 1, [2.0, 2.0]) YIELD *;
```
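
The same call can be issued from application code. The sketch below is one possible way to do it from Python and rests on several assumptions that are not stated in this page: that the `neo4j` Bolt driver is installed and used as the client, that Memgraph listens on `bolt://localhost:7687` without authentication, and that the procedure accepts query parameters. The helper names and the parameter names (`$index_name`, `$top_k`, `$query_vector`) are hypothetical.

```python
def build_search_call(index_name, limit, query_vector):
    """Return a (query, parameters) pair for vector_search.search."""
    query = (
        "CALL vector_search.search($index_name, $top_k, $query_vector) "
        "YIELD node, distance, similarity "
        "RETURN node, distance, similarity"
    )
    parameters = {
        "index_name": index_name,
        "top_k": int(limit),
        # The procedure expects a list of floats.
        "query_vector": [float(x) for x in query_vector],
    }
    return query, parameters


def run_search(uri="bolt://localhost:7687"):
    """Run the search and print each result row (needs a running instance)."""
    # Imported lazily so the query builder works without the driver installed.
    from neo4j import GraphDatabase

    query, parameters = build_search_call("index_name", 1, [2.0, 2.0])
    with GraphDatabase.driver(uri, auth=("", "")) as driver:
        with driver.session() as session:
            for record in session.run(query, parameters):
                print(record["node"], record["distance"], record["similarity"])


# run_search()  # uncomment with Memgraph running and vector search enabled
```

Passing the vector as a parameter avoids building the Cypher string by hand for every query vector.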

### Similarity metrics

The following table lists the supported similarity metrics for vector search. These
metrics determine how similarity between vectors is calculated. The default
metric is `l2sq`.

| Metric | Description |
|-------------|------------------------------------------------------|
| `ip` | Inner product (dot product). |
| `cos` | Cosine similarity. |
| `l2sq` | Squared Euclidean distance. |
| `pearson` | Pearson correlation coefficient. |
| `haversine` | Haversine distance (suitable for geographic data). |
| `divergence`| A divergence-based metric. |
| `hamming` | Hamming distance. |
| `tanimoto` | Tanimoto coefficient. |
| `sorensen` | Sørensen-Dice coefficient. |
| `jaccard` | Jaccard index. |
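
As a rough illustration of what three of these metrics measure, the following Python sketch computes the textbook definitions of the inner product, squared Euclidean distance and cosine similarity for two small vectors. It is not Memgraph's implementation, and how Memgraph turns each metric into the `distance` and `similarity` values it returns is not covered here.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors (`ip`)."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Sum of squared component differences (`l2sq`)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms (`cos`)."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a, b = [1.0, 0.0], [2.0, 2.0]
print(inner_product(a, b))      # 2.0
print(squared_euclidean(a, b))  # 5.0
print(cosine_similarity(a, b))  # approximately 0.7071
```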

### Scalar type

Properties are stored as 64-bit values in the property store and as 32-bit values in the vector index.
The scalar type defines the data type of each vector element. The default scalar
type is `f32`.
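
Going from 64-bit to 32-bit values can change a number slightly. The Python sketch below shows the general effect using the standard `struct` module; it only illustrates floating-point narrowing and makes no claim about how Memgraph performs the conversion internally.

```python
import struct


def to_f32(value):
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


original = 0.1
stored = to_f32(original)
# 0.1 is not exactly representable in 32 bits, so a small error appears.
print(original, stored, abs(original - stored))

# Values such as 0.5 are exactly representable and survive unchanged.
print(to_f32(0.5) == 0.5)
```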
