
A command-line tool for managing vector databases and embeddings.


jjzhuo/vector-db-cli


Vector Database Management System

Description

This is a command-line tool for managing a database of vector embeddings. It uses LangChain and Chroma to handle data ingestion, storage, and retrieval.

Features

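  • Creating and managing indices of document embeddings
  • Searching documents by similarity or by keyword
  • Retrieving embedded documents
  • Estimating the cost of embedding a document
  • Inserting and removing text in an existing index
  • Answering questions with GPT using context retrieved from an index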

Installation

  1. Clone the repository:

    git clone <repository_url>
    
  2. Navigate to the repository folder:

    cd <repository_folder>
    
  3. Install the required Python packages:

    pip install -r requirements.txt
    
  4. Set the environment variable VDB_DIR to the directory where your vector databases are stored. You can set it in your shell's configuration file (e.g., .bashrc, .bash_profile, or .zshrc), or, if you launch the tool from a Python script, set it in that script before the tool runs:

    In Bash:

    export VDB_DIR=/path/to/your/directory
    

    In the Python script:

    import os
    os.environ["VDB_DIR"] = "/path/to/your/directory"
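If a wrapper script launches the tool, it can fail early when VDB_DIR is missing or wrong instead of letting main.py hit a confusing error later. The following is a minimal sketch; `get_vdb_dir` is a hypothetical helper and is not part of main.py.

```python
import os
from pathlib import Path


def get_vdb_dir() -> Path:
    """Return VDB_DIR as a Path, raising if it is unset or not an existing directory."""
    value = os.environ.get("VDB_DIR")
    if not value:
        raise RuntimeError("VDB_DIR is not set; export it before running main.py")
    path = Path(value).expanduser()  # allow values such as "~/vdb"
    if not path.is_dir():
        raise RuntimeError(f"VDB_DIR={value!r} is not an existing directory")
    return path
```

Child processes started with the `subprocess` module inherit `os.environ` by default, so a value set this way is visible to `python main.py ...` when it is launched from the same script.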

Commands

list-indices

Command:

python main.py list-indices

Description:

Lists all the indices stored in the directory specified by the VDB_DIR environment variable.
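The README does not say how indices are laid out on disk. Assuming (hypothetically) that each index is persisted as its own subdirectory of VDB_DIR, listing them amounts to listing those subdirectories, as in this sketch:

```python
from pathlib import Path


def list_indices(vdb_dir: str) -> list[str]:
    """Return the sorted names of the subdirectories of vdb_dir.

    Assumption: every index lives in its own subdirectory; plain files are ignored.
    """
    root = Path(vdb_dir)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```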


create-index

Command:

python main.py create-index <index_name> <input_file>

Description:

Creates an index named <index_name> from the text file <input_file>. Text from the file is split into chunks, and an embedding is created for each chunk. The embeddings are then stored in the new index.
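The splitting step can be illustrated with a simple fixed-size character splitter in which consecutive chunks overlap, so text cut at a boundary appears in both neighbors. This is a simplified stand-in for LangChain's text splitters, not the tool's actual splitter; `split_into_chunks` and its default `chunk_size` and `overlap` values are hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts (chunk_size - overlap) characters after the previous one,
    so neighbouring chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this chunk reached the end of the text
            break
    return chunks
```

For example, `split_into_chunks("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`. Each chunk would then be passed to the embedding model and stored in the index.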


describe-index

Command:

python main.py describe-index <index_name>

Description:

Describes the specified index. Prints the total number of documents in the index and the set of unique sources in the metadata of the indexed documents.
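Both numbers can be derived from a result shaped like the dictionary returned by Chroma's `Collection.get()` (parallel `ids`, `documents`, and `metadatas` lists). The sketch assumes each chunk's metadata carries a `source` key, as LangChain's document loaders typically set; `describe` is a hypothetical helper, not the tool's code.

```python
def describe(result: dict) -> tuple[int, set[str]]:
    """Return (number of documents, set of unique 'source' metadata values)."""
    metadatas = result.get("metadatas") or []
    # Skip entries with no metadata or no 'source' key.
    sources = {m["source"] for m in metadatas if m and "source" in m}
    return len(result.get("ids", [])), sources
```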


contents

Command:

python main.py contents <index_name>

Description:

Prints the content of the specified index, including both the text and metadata of each indexed document.


search-similarity

Command:

python main.py search-similarity <index_name> <query>

Description:

Searches the specified index for documents that are similar to the provided query and prints the content of each matching document.
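Chroma performs this search internally, using a nearest-neighbor index and a configurable distance metric. To show the idea only, the sketch below embeds nothing itself; it assumes the query and documents are already vectors and ranks documents by brute-force cosine similarity. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k documents whose vectors are most similar to query_vec."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]
```

Brute force is O(n) per query, which is why vector stores use approximate indices once collections grow large.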


chat

Command:

python main.py chat <index_name> <query> [--temperature=<temperature>] [--model=<model>]

Description:

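Answers <query> with a GPT model, using relevant documents retrieved from the specified index as context. The optional --temperature and --model flags set the sampling temperature and the model to use.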
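This README does not show the prompt or chain the tool uses. The general retrieval-augmented pattern is to place the retrieved chunks ahead of the question in the prompt, as in this hypothetical `build_prompt` sketch. The resulting string would then be sent to the model chosen with --model at the given --temperature.

```python
def build_prompt(query: str, context_docs: list[str]) -> str:
    """Assemble a retrieval-augmented prompt: numbered context chunks, then the question."""
    context = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```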


search-keyword

Command:

python main.py search-keyword <index_name> <keyword>

Description:

Searches the specified index for documents that contain the provided keyword and prints the content of each matching document.
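Unlike similarity search, keyword search is a plain substring match over the stored text. Chroma's `get()` accepts a `where_document={"$contains": ...}` filter for this purpose, but whether the tool uses it, and whether its matching is case-sensitive, is not documented here. The hypothetical sketch below filters an in-memory mapping of ids to text and makes case sensitivity an explicit option.

```python
def search_keyword(documents: dict[str, str], keyword: str, case_sensitive: bool = False) -> list[str]:
    """Return the ids of documents whose text contains keyword, in insertion order."""
    if case_sensitive:
        return [doc_id for doc_id, text in documents.items() if keyword in text]
    needle = keyword.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]
```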


insert-text

Command:

python main.py insert-text <index_name> <input_file> [--chunk_size=<chunk_size>]

Description:

Inserts text from the file <input_file> into the specified index. The text is split into chunks (whose size is set by the optional --chunk_size option), an embedding is created for each chunk, and the embeddings are added to the index.
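The --chunk_size option controls how many entries an insert creates. A smaller value produces more, shorter chunks and therefore more embedding calls. Every stored entry also needs an ID, which is what remove-text takes. The sketch below uses a plain dict as a hypothetical stand-in for the index and random UUIDs as IDs. The tool's actual ID scheme is not documented here.

```python
import uuid


def insert_text(index: dict[str, str], text: str, chunk_size: int = 1000) -> list[str]:
    """Split text into chunk_size-character pieces, store each under a new id, and return the ids."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    new_ids = []
    for start in range(0, len(text), chunk_size):
        doc_id = str(uuid.uuid4())
        # A real vector index would store the chunk's embedding alongside its text.
        index[doc_id] = text[start:start + chunk_size]
        new_ids.append(doc_id)
    return new_ids
```

With 2,500 characters of input and `chunk_size=1000`, this creates three entries of 1,000, 1,000, and 500 characters. Deleting one of the returned ids corresponds to what remove-text does for a single document.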


remove-text

Command:

python main.py remove-text <index_name> <id>

Description:

Removes the document with the specified ID from the index.


estimate-cost

Command:

python main.py estimate-cost <input_file>

Description:

Estimates the cost of embedding the text from the file <input_file>. Prints the estimated cost.
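Embedding APIs typically charge per token, so the estimate is (number of tokens / 1,000) × (price per 1,000 tokens). The tool may count tokens with an exact tokenizer such as tiktoken. This sketch instead uses the rough rule of thumb of about four characters per token for English text. Both that ratio and the price argument are assumptions, so check your provider's current pricing.

```python
import math


def estimate_embedding_cost(text: str, price_per_1k_tokens: float, chars_per_token: float = 4.0) -> float:
    """Approximate tokens as len(text) / chars_per_token (rounded up),
    then charge price_per_1k_tokens for every 1,000 tokens."""
    approx_tokens = math.ceil(len(text) / chars_per_token)
    return approx_tokens / 1000 * price_per_1k_tokens
```

As a worked example, a 40,000-character file is about 10,000 tokens. At a placeholder price of $0.0001 per 1,000 tokens, that comes to 10 × $0.0001 = $0.001.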

Contributions

Contributions are welcome! Please create a pull request with your changes.
