[Internal] ContainerProperties: Adds Vector Embedding and Indexing Policy #4379

Merged

@kundadebdatta kundadebdatta commented Mar 28, 2024

Pull Request Template

Description

This PR adds a new VectorEmbeddingPolicy to ContainerProperties and a new VectorIndexes collection to the IndexingPolicy to enable Vector Similarity Search in Cosmos DB.

  • The Vector Embedding Policy is part of the container-level policy. For each embedding path, it specifies the number of dimensions, the data type of the embeddings, and the distance function used to calculate the similarity between embedding vectors.

  • Vector Indexes in the Indexing Policy are specified much like composite or spatial indexes. Each entry gives the index path and the new index type.
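
As a rough illustration of the two bullets above, the fields can be modeled as plain records that serialize to the JSON property names used in this PR. This is a Python sketch, not the SDK's .NET types: the class names `VectorEmbedding` and `VectorIndex` and the helper `to_policy_json` are hypothetical, and only the JSON property names are taken from this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorEmbedding:
    """Hypothetical model of one entry in vectorEmbeddingPolicy.vectorEmbeddings."""
    path: str               # document path holding the vector, e.g. "/vector1"
    data_type: str          # e.g. "float32", "int8", "uint8"
    dimensions: int         # length of the embedding vector
    distance_function: str  # e.g. "cosine", "dotproduct", "euclidean"

    def to_json(self) -> dict:
        return {
            "path": self.path,
            "dataType": self.data_type,
            "dimensions": self.dimensions,
            "distanceFunction": self.distance_function,
        }


@dataclass(frozen=True)
class VectorIndex:
    """Hypothetical model of one entry in indexingPolicy.vectorIndexes."""
    path: str        # path being indexed
    index_type: str  # e.g. "flat", "quantizedFlat", "diskANN"

    def to_json(self) -> dict:
        return {"path": self.path, "type": self.index_type}


def to_policy_json(embeddings, indexes) -> dict:
    """Assemble only the vector-related parts of the container properties."""
    return {
        "vectorEmbeddingPolicy": {
            "vectorEmbeddings": [e.to_json() for e in embeddings]
        },
        "indexingPolicy": {
            "vectorIndexes": [i.to_json() for i in indexes]
        },
    }
```

Keeping the embedding definition and the index definition as separate records mirrors the split described above: the embedding policy says what the vector is, and the indexing policy says how it is indexed.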

Below is a sample of the new ContainerProperties, including the Indexing Policy and the Vector Embedding Policy.

{
    "vectorEmbeddingPolicy": {
        "vectorEmbeddings": [
            {
                "path": "/vector1",
                "dataType": "float32",
                "dimensions": 1200,
                "distanceFunction": "cosine"
            },
            {
                "path": "/vector2",
                "dataType": "int8",
                "dimensions": 3,
                "distanceFunction": "dotproduct"
            },
            {
                "path": "/vector3",
                "dataType": "uint8",
                "dimensions": 400,
                "distanceFunction": "euclidean"
            }
        ]
    },
    "partitionKey": {
        "kind": "Hash",
        "paths": [
            "/pk"
        ]
    },
    "indexingPolicy": {
        "automatic": true,
        "indexingMode": "Consistent",
        "vectorIndexes": [
            {
                "path": "/vector1",
                "type": "flat"
            },
            {
                "path": "/vector2",
                "type": "quantizedFlat"
            },
            {
                "path": "/vector3",
                "type": "diskANN"
            }
        ],
        "includedPaths": [
            {
                "path": "/name/?"
            },
            {
                "path": "/description/?"
            }
        ],
        "excludedPaths": [
            {
                "path": "/*"
            }
        ]
    },
    "uniqueKeyPolicy": {}
}
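
In the sample above, every path listed under `vectorIndexes` also appears under `vectorEmbeddings`. A small client-side check for that pairing could look like the Python sketch below. Whether the service or the SDK enforces this pairing is not stated in this PR, so treat it as an assumption; the function name `unmatched_vector_index_paths` is hypothetical.

```python
import json


def unmatched_vector_index_paths(container_properties: dict) -> list:
    """Return vector index paths that have no matching vector embedding path.

    Assumption (not confirmed by this PR): each vector index is expected to
    refer to a path declared in the vector embedding policy.
    """
    embeddings = (
        container_properties.get("vectorEmbeddingPolicy", {})
        .get("vectorEmbeddings", [])
    )
    indexes = (
        container_properties.get("indexingPolicy", {})
        .get("vectorIndexes", [])
    )
    embedded_paths = {e["path"] for e in embeddings}
    return [i["path"] for i in indexes if i["path"] not in embedded_paths]


# A trimmed-down version of the sample, with one deliberately unmatched index.
trimmed_sample = json.loads("""
{
    "vectorEmbeddingPolicy": {
        "vectorEmbeddings": [
            {"path": "/vector1", "dataType": "float32",
             "dimensions": 1200, "distanceFunction": "cosine"}
        ]
    },
    "indexingPolicy": {
        "vectorIndexes": [
            {"path": "/vector1", "type": "flat"},
            {"path": "/vector9", "type": "diskANN"}
        ]
    }
}
""")

missing = unmatched_vector_index_paths(trimmed_sample)
```

Here `missing` contains `/vector9`, the only index path without an embedding definition; for the full sample in this description the list would be empty.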

[Note: Since the public emulator does not yet support vector embeddings and indexes, some of the emulator tests have been marked as skipped for now. They will be enabled as soon as the emulator changes are publicly available.]

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Closing issues

To automatically close an issue: closes #4364

@kundadebdatta kundadebdatta self-assigned this Mar 29, 2024
@kundadebdatta kundadebdatta changed the title ContainerProperties: Adds Vector Embedding Policy ContainerProperties: Adds Vector Embedding and Indexing Policy Apr 1, 2024

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM

@kundadebdatta kundadebdatta added the auto-merge Enables automation to merge PRs label Apr 5, 2024
kirankumarkolli previously approved these changes Apr 5, 2024

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM

@kundadebdatta kundadebdatta added auto-merge Enables automation to merge PRs and removed auto-merge Enables automation to merge PRs labels Apr 5, 2024
@ealsur ealsur merged commit d1ff001 into master Apr 5, 2024
21 checks passed
@ealsur ealsur deleted the users/kundadebdatta/4364_vector_index_add_collection_properties branch April 5, 2024 22:39
@kundadebdatta kundadebdatta changed the title ContainerProperties: Adds Vector Embedding and Indexing Policy [Internal] ContainerProperties: Adds Vector Embedding and Indexing Policy Apr 6, 2024
microsoft-github-policy-service bot pushed a commit that referenced this pull request Oct 24, 2024
…nterfaces to Mark Them as Public for GA (#4845)

# Pull Request Template

## Description

The purpose of this PR is to mark the new `VectorEmbeddingPolicy` in the
`ContainerProperties` as a public surface interface for the `GA` release,
and to introduce the new `VectorIndexes` in the `IndexingPolicy` to enable
Vector Similarity Search in the Cosmos DB ecosystem.

Relevant PRs for the vector similarity work: 

- [ContainerProperties: Adds Vector Embedding and Indexing
Policy](#4379)
- [ContainerProperties: Refactors Vector Embedding and Indexing Policy
Interfaces to Mark Them as Public for
Preview](#4486)
- [VectorIndexDefinition: Adds Support for Partitioned
DiskANN](#4792)

## Type of change

Please delete options that are not relevant.

- [x] New feature (non-breaking change which adds functionality)

## Closing issues

To automatically close an issue: closes #4825
Labels
auto-merge Enables automation to merge PRs
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Vector Similarity: Add Relevant Collection/ Container Properties
6 participants