Add Weaviate integration, features include: 1. Initiate the required environment and connect to Weaviate vector database; 2. Create a class; 3. Delete a class; 4. Add data; 5. Make similarity-based queries. --------- Co-authored-by: Andy Xu <xzdandy@gmail.com>
1 parent a323af3, commit 0268df2
Showing 16 changed files with 248 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
Weaviate | ||
========== | ||
|
||
Weaviate is an open-source vector database designed for scalability and rich querying capabilities. It allows for semantic search, automated vectorization, and supports large language model (LLM) integration. | ||
The connection to Weaviate is based on the `weaviate-client <https://weaviate.io/developers/weaviate/client-libraries/python>`_ library. | ||
|
||
Dependency | ||
---------- | ||
|
||
* weaviate-client | ||
|
||
Parameters | ||
---------- | ||
|
||
To use Weaviate, you need an API key and the URL of your Weaviate instance. Follow the `instructions for setting up a Weaviate instance <https://weaviate.io/developers/weaviate/quickstart>`_. After setup, the API key and URL are listed on the Details tab of the Weaviate Cloud Services (WCS) dashboard. Both are required to connect to the Weaviate server. | ||
|
||
* `WEAVIATE_API_KEY` is the API key for your Weaviate instance. | ||
* `WEAVIATE_API_URL` is the URL of your Weaviate instance. | ||
|
||
These values can be set either with the ``SET`` statement or through the environment variables ``WEAVIATE_API_KEY`` and ``WEAVIATE_API_URL``. | ||
|
||
Create Collection | ||
----------------- | ||
|
||
Weaviate uses collections (similar to 'classes') to store data. To create a collection in Weaviate, use the following SQL command in EvaDB: | ||
|
||
.. code-block:: sql | ||
   CREATE INDEX collection_name ON table_name (data) USING WEAVIATE; | ||
This command creates a collection in Weaviate with the specified name, linked to the table in EvaDB. You can also specify vectorizer settings and other configurations for the collection as needed. |
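
The lookup order described under Parameters (a value passed through ``SET`` first, then the environment variable) can be sketched in Python as follows. This is a minimal illustration, not the code from the commit: `resolve_setting` and its `overrides` and `environ` arguments are hypothetical names introduced here, and the sketch raises `ValueError` where the committed code uses `assert`.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    overrides: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return overrides[name] if set, else environ[name]; raise if neither is set."""
    # Fall back to the real process environment unless a mapping is supplied.
    env = os.environ if environ is None else environ
    # An explicit override wins; an empty or missing override falls through.
    value = overrides.get(name) or env.get(name)
    if not value:
        raise ValueError(
            f"Please set `{name}` with the SET statement or as an environment variable."
        )
    return value


if __name__ == "__main__":
    # Hypothetical values, used only to exercise the lookup order.
    fake_env = {"WEAVIATE_API_URL": "https://example.invalid"}
    print(resolve_setting("WEAVIATE_API_URL", {}, fake_env))
```

Passing the environment as an argument keeps the helper easy to test without modifying `os.environ`.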
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
# coding=utf-8 | ||
# Copyright 2018-2023 EvaDB | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
import os | ||
from typing import List | ||
|
||
from evadb.third_party.vector_stores.types import ( | ||
    FeaturePayload, | ||
    VectorIndexQuery, | ||
    VectorIndexQueryResult, | ||
    VectorStore, | ||
) | ||
from evadb.utils.generic_utils import try_to_import_weaviate_client | ||
|
||
required_params = [] | ||
_weaviate_init_done = False | ||
|
||
|
||
class WeaviateVectorStore(VectorStore): | ||
    def __init__(self, collection_name: str, **kwargs) -> None: | ||
        try_to_import_weaviate_client() | ||
        global _weaviate_init_done | ||
|
||
        self._collection_name = collection_name | ||
|
||
        # Get the API key. | ||
        self._api_key = kwargs.get("WEAVIATE_API_KEY") | ||
|
||
        if not self._api_key: | ||
            self._api_key = os.environ.get("WEAVIATE_API_KEY") | ||
|
||
        assert ( | ||
            self._api_key | ||
        ), "Please set your `WEAVIATE_API_KEY` using set command or environment variable (WEAVIATE_API_KEY). It can be found at the Details tab in WCS Dashboard." | ||
|
||
        # Get the API URL. | ||
        self._api_url = kwargs.get("WEAVIATE_API_URL") | ||
|
||
        if not self._api_url: | ||
            self._api_url = os.environ.get("WEAVIATE_API_URL") | ||
|
||
        assert ( | ||
            self._api_url | ||
        ), "Please set your `WEAVIATE_API_URL` using set command or environment variable (WEAVIATE_API_URL). It can be found at the Details tab in WCS Dashboard." | ||
|
||
        # Build a client for this store. Creating it unconditionally keeps | ||
        # `client` defined even when an earlier instance already ran the check. | ||
        import weaviate | ||
|
||
        client = weaviate.Client( | ||
            url=self._api_url, | ||
            auth_client_secret=weaviate.AuthApiKey(api_key=self._api_key), | ||
        ) | ||
|
||
        if not _weaviate_init_done: | ||
            # Verify the connection once per process by fetching the schema. | ||
            client.schema.get() | ||
            _weaviate_init_done = True | ||
|
||
        self._client = client | ||
|
||
    def create( | ||
        self, | ||
        vectorizer: str = "text2vec-openai", | ||
        properties: list = None, | ||
        module_config: dict = None, | ||
    ): | ||
        properties = properties or [] | ||
        module_config = module_config or {} | ||
|
||
        collection_obj = { | ||
            "class": self._collection_name, | ||
            "properties": properties, | ||
            "vectorizer": vectorizer, | ||
            "moduleConfig": module_config, | ||
        } | ||
|
||
        # Drop any existing class with the same name before recreating it. | ||
        if self._client.schema.exists(self._collection_name): | ||
            self._client.schema.delete_class(self._collection_name) | ||
|
||
        self._client.schema.create_class(collection_obj) | ||
|
||
    def add(self, payload: List[FeaturePayload]) -> None: | ||
        # Send objects through the client's batch context manager. | ||
        with self._client.batch as batch: | ||
            for item in payload: | ||
                data_object = {"id": item.id, "vector": item.embedding} | ||
                batch.add_data_object(data_object, self._collection_name) | ||
|
||
    def delete(self) -> None: | ||
        self._client.schema.delete_class(self._collection_name) | ||
|
||
    def query(self, query: VectorIndexQuery) -> VectorIndexQueryResult: | ||
        # Request the distance explicitly: `_additional` fields are only | ||
        # returned when the query asks for them. | ||
        response = ( | ||
            self._client.query.get(self._collection_name, ["*"]) | ||
            .with_near_vector({"vector": query.embedding}) | ||
            .with_additional(["distance"]) | ||
            .with_limit(query.top_k) | ||
            .do() | ||
        ) | ||
|
||
        data = response.get("data", {}) | ||
        results = data.get("Get", {}).get(self._collection_name, []) | ||
|
||
        similarities = [item["_additional"]["distance"] for item in results] | ||
        ids = [item["id"] for item in results] | ||
|
||
        return VectorIndexQueryResult(similarities, ids) |
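
The parsing at the end of `query` relies on the nested `data` / `Get` / collection-name layout of the response. The sketch below isolates that step so it can be run without a Weaviate server. It is an illustration under stated assumptions: `parse_get_response` is a hypothetical helper name, and `SAMPLE_RESPONSE` is hand-written to match the shape the committed code expects, not captured from a real instance.

```python
from typing import Any, Dict, List, Tuple


def parse_get_response(
    response: Dict[str, Any], collection_name: str
) -> Tuple[List[float], List[Any]]:
    """Extract (distances, ids) from a Get-style response dictionary."""
    # Missing keys fall back to empty containers, so an empty response
    # yields two empty lists instead of raising.
    data = response.get("data", {})
    results = data.get("Get", {}).get(collection_name, [])

    distances = [item["_additional"]["distance"] for item in results]
    ids = [item["id"] for item in results]
    return distances, ids


# Hand-written sample (assumed shape); "Demo" is a made-up collection name.
SAMPLE_RESPONSE = {
    "data": {
        "Get": {
            "Demo": [
                {"id": 1, "_additional": {"distance": 0.12}},
                {"id": 2, "_additional": {"distance": 0.34}},
            ]
        }
    }
}

if __name__ == "__main__":
    print(parse_get_response(SAMPLE_RESPONSE, "Demo"))
```

Note that the values returned are distances, so smaller numbers indicate closer matches, even though the committed code stores them in a variable named `similarities`.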