This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

Commit

chore: docs
iwilltry42 committed Oct 23, 2024
1 parent 95e0ca6 commit e7b8cd3
Showing 16 changed files with 150 additions and 32 deletions.
10 changes: 4 additions & 6 deletions docs/docs/03-architecture.md
@@ -22,12 +22,10 @@ You can make use of it in the CLI by setting the `KNOW_SERVER_URL` environment v
## 3. Index Database

The index database is an additional (relational) metadata database which keeps track of all datasets and ingested files and their relationships.
- It enables some extra convenience features but does not store the actual data (embeddings).
- The current implementation uses **SQLite**.
- It's fully embedded and does not require any additional setup.
+ It enables some extra convenience features but does not store the actual data (content & embeddings).
+ The current implementation uses **SQLite** by default, which is fully embedded and does not require any additional setup.

## 4. Vector Database

- The vector database is the main storage for the embeddings of the ingested documents along with some metadata (e.g. source file information).
- The current implementation uses [**chromem-go**](https://github.com/philippgille/chromem-go).
- It's fully embedded and does not require any additional setup.
+ The vector database is the main storage for the content and embeddings of the ingested documents along with some metadata (e.g. source file information).
+ The current implementation uses [**chromem-go**](https://github.com/philippgille/chromem-go) by default, which is fully embedded and does not require any additional setup.
30 changes: 30 additions & 0 deletions docs/docs/06-databases.md
@@ -0,0 +1,30 @@
---
title: Index & Vector Databases
---

# Index & Vector Databases

## Index Database

The index database is an additional (relational) metadata database which keeps track of all datasets and ingested files and their relationships.
It enables some extra convenience features but does not store the actual data (content & embeddings).
The current implementation uses **SQLite** by default, which is fully embedded and does not require any additional setup.

You can configure it by setting a database connection string via the `KNOW_INDEX_DSN` environment variable.
The following options are available:

- [SQLite](https://www.sqlite.org/) (default): `KNOW_INDEX_DSN="sqlite:///home/me/mysqlite.db"`
- [Postgres](https://www.postgresql.org/): `KNOW_INDEX_DSN="postgres://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"`


## Vector Database

The vector database is the main storage for the content and embeddings of the ingested documents along with some metadata (e.g. source file information).
The current implementation uses [**chromem-go**](https://github.com/philippgille/chromem-go) by default, which is fully embedded and does not require any additional setup.

You can configure it by setting a database connection string via the `KNOW_VECTOR_DSN` environment variable.
The following options are available:

- [Chromem-Go](https://github.com/philippgille/chromem-go) (default): `KNOW_VECTOR_DSN="chromem:///path/to/directory"` (Note: we're using a customized fork of chromem-go, so some details may differ from the original project)
- [PGVector](https://github.com/pgvector/pgvector): `KNOW_VECTOR_DSN="pgvector://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"`
- [SQLite-Vec](https://github.com/asg017/sqlite-vec): `KNOW_VECTOR_DSN="sqlite-vec:///home/me/mysqlite.db"`
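
As a minimal sketch of the configuration described in this file, both databases could be pointed at Postgres by exporting the two variables with the example DSNs listed above. This assumes a local Postgres instance with pgvector is reachable under those credentials. `dsn_scheme` is a hypothetical helper, not part of the `knowledge` CLI; it only prints the part of a DSN before the first `:` so you can check which backend a value would select.

```shell
#!/bin/sh
# Values copied from the option lists above; adjust host and credentials as needed.
export KNOW_INDEX_DSN="postgres://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"
export KNOW_VECTOR_DSN="pgvector://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"

# Hypothetical helper (an assumption for illustration, not a CLI feature):
# strip everything from the first ':' onward, leaving only the scheme.
dsn_scheme() {
  printf '%s\n' "${1%%:*}"
}

echo "index backend:  $(dsn_scheme "$KNOW_INDEX_DSN")"   # prints "index backend:  postgres"
echo "vector backend: $(dsn_scheme "$KNOW_VECTOR_DSN")"  # prints "vector backend: pgvector"
```

Commands started from the same shell session inherit both variables, so a subsequent `knowledge list-datasets` would read them from the environment.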
3 changes: 3 additions & 0 deletions docs/docs/99-cmd/knowledge.md
@@ -20,12 +20,15 @@ knowledge [flags]
* [knowledge askdir](knowledge_askdir.md) - Retrieve sources for a query from a dataset generated from a directory
* [knowledge create-dataset](knowledge_create-dataset.md) - Create a new dataset
* [knowledge delete-dataset](knowledge_delete-dataset.md) - Delete a dataset
* [knowledge delete-file](knowledge_delete-file.md) - Delete a file from a dataset
* [knowledge edit-dataset](knowledge_edit-dataset.md) - Edit an existing dataset
* [knowledge export](knowledge_export.md) - Export one or more datasets as an archive (zip)
* [knowledge get-dataset](knowledge_get-dataset.md) - Get a dataset
* [knowledge get-file](knowledge_get-file.md) - Get a file from a dataset
* [knowledge import](knowledge_import.md) - Import one or more datasets from an archive (zip) (default: all datasets)
* [knowledge ingest](knowledge_ingest.md) - Ingest a file/directory into a dataset
* [knowledge list-datasets](knowledge_list-datasets.md) - List existing datasets
* [knowledge load](knowledge_load.md) - Load a file and transform it to markdown
* [knowledge retrieve](knowledge_retrieve.md) - Retrieve sources for a query from a dataset
* [knowledge version](knowledge_version.md) -

10 changes: 7 additions & 3 deletions docs/docs/99-cmd/knowledge_askdir.md
@@ -15,20 +15,24 @@ knowledge askdir [--path <path>] <query> [flags]
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
--concurrency int Number of concurrent ingestion processes ($KNOW_INGEST_CONCURRENCY) (default 10)
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
--dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
--dedupe-func string Name of the deduplication function to use ($KNOW_INGEST_DEDUPE_FUNC)
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
--err-on-unsupported-file Error on unsupported file types ($KNOW_INGEST_ERR_ON_UNSUPPORTED_FILE)
--flow string Flow name ($KNOW_FLOW)
--flows-file string Path to a YAML/JSON file containing ingestion/retrieval flows ($KNOW_FLOWS_FILE)
--flows-file string Path to a YAML/JSON file containing ingestion/retrieval flows ($KNOW_FLOWS_FILE) (default "blueprint:default")
-h, --help help for askdir
--ignore-extensions string Comma-separated list of file extensions to ignore ($KNOW_INGEST_IGNORE_EXTENSIONS)
--ignore-file string Path to a .gitignore style file ($KNOW_INGEST_IGNORE_FILE)
--include-hidden Include hidden files and directories ($KNOW_INGEST_INCLUDE_HIDDEN)
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
-w, --keyword strings Keywords that retrieved documents must contain ($KNOW_RETRIEVE_KEYWORDS)
--no-create-dataset Do NOT create the dataset if it doesn't exist ($KNOW_INGEST_NO_CREATE_DATASET)
--no-prune Do not prune deleted files ($KNOW_ASKDIR_NO_PRUNE)
--no-recursive Don't recursively ingest directories ($KNOW_NO_INGEST_RECURSIVE)
-p, --path string Path to the directory to query ($KNOWLEDGE_CLIENT_ASK_DIR_PATH) (default ".")
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
-k, --top-k int Number of sources to retrieve ($KNOWLEDGE_CLIENT_ASK_DIR_TOP_K) (default 10)
--vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```
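
The help text above replaces `--dsn` (`$KNOW_DB_DSN`) with `--index-dsn` (`$KNOW_INDEX_DSN`) and `--vector-dbpath` (`$KNOW_VECTOR_DB_PATH`) with `--vector-dsn` (`$KNOW_VECTOR_DSN`). The following is a rough sketch of how an existing shell setup might carry old values over. It assumes that an old vector DB path corresponds to a `chromem:` DSN, which mirrors how the documented default changed but is not confirmed by this commit. `migrate_dsn_env` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: derive the new variables from the old ones when only the old ones are set.
migrate_dsn_env() {
  # The index DSN keeps its format (the default value is identical in both help texts).
  if [ -z "${KNOW_INDEX_DSN:-}" ] && [ -n "${KNOW_DB_DSN:-}" ]; then
    export KNOW_INDEX_DSN="$KNOW_DB_DSN"
  fi
  # Assumption: a plain vector DB path becomes a "chromem:" DSN, as in the new default.
  if [ -z "${KNOW_VECTOR_DSN:-}" ] && [ -n "${KNOW_VECTOR_DB_PATH:-}" ]; then
    export KNOW_VECTOR_DSN="chromem:$KNOW_VECTOR_DB_PATH"
  fi
}

migrate_dsn_env
echo "KNOW_INDEX_DSN=${KNOW_INDEX_DSN:-<unset>}"
echo "KNOW_VECTOR_DSN=${KNOW_VECTOR_DSN:-<unset>}"
```

Values that are already set in the new variables are left untouched, so the function is safe to call repeatedly.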

### SEE ALSO
4 changes: 2 additions & 2 deletions docs/docs/99-cmd/knowledge_create-dataset.md
@@ -14,11 +14,11 @@ knowledge create-dataset <dataset-id> [flags]
```
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
--dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for create-dataset
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```

### SEE ALSO
4 changes: 2 additions & 2 deletions docs/docs/99-cmd/knowledge_delete-dataset.md
@@ -14,11 +14,11 @@ knowledge delete-dataset <dataset-id> [flags]
```
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
--dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for delete-dataset
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```

### SEE ALSO
28 changes: 28 additions & 0 deletions docs/docs/99-cmd/knowledge_delete-file.md
@@ -0,0 +1,28 @@
---
title: "knowledge delete-file"
---
## knowledge delete-file

Delete a file from a dataset

```
knowledge delete-file <file-id|file-abs-path> [flags]
```

### Options

```
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
-d, --dataset string Target Dataset ID ($KNOWLEDGE_CLIENT_DELETE_FILE_DATASET) (default "default")
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for delete-file
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```

### SEE ALSO

* [knowledge](knowledge.md) -

4 changes: 2 additions & 2 deletions docs/docs/99-cmd/knowledge_edit-dataset.md
@@ -14,14 +14,14 @@ knowledge edit-dataset <dataset-id> [flags]
```
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
--dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for edit-dataset
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--replace-metadata strings replace metadata with key-value pairs (existing metadata will be removed) ($KNOWLEDGE_CLIENT_EDIT_DATASET_REPLACE_METADATA)
--reset-metadata reset metadata to default (empty) ($KNOWLEDGE_CLIENT_EDIT_DATASET_RESET_METADATA)
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--update-metadata strings update metadata key-value pairs (existing metadata will be updated/preserved) ($KNOWLEDGE_CLIENT_EDIT_DATASET_UPDATE_METADATA)
--vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```

### SEE ALSO
4 changes: 2 additions & 2 deletions docs/docs/99-cmd/knowledge_export.md
@@ -15,12 +15,12 @@ knowledge export <dataset-id> [<dataset-id>...] [flags]
-a, --all Export all datasets ($KNOWLEDGE_CLIENT_EXPORT_DATASETS_ALL)
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
--dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for export
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--output string Output path ($KNOWLEDGE_CLIENT_EXPORT_DATASETS_OUTPUT) (default ".")
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```

### SEE ALSO
4 changes: 2 additions & 2 deletions docs/docs/99-cmd/knowledge_get-dataset.md
@@ -15,12 +15,12 @@ knowledge get-dataset <dataset-id> [flags]
--archive string Path to the archive file ($KNOWLEDGE_CLIENT_GET_DATASET_ARCHIVE)
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
--dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for get-dataset
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--no-docs Do not include documents in output (way less verbose) ($KNOWLEDGE_CLIENT_GET_DATASET_NO_DOCS)
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```

### SEE ALSO
28 changes: 28 additions & 0 deletions docs/docs/99-cmd/knowledge_get-file.md
@@ -0,0 +1,28 @@
---
title: "knowledge get-file"
---
## knowledge get-file

Get a file from a dataset

```
knowledge get-file <file-id|file-abs-path> [flags]
```

### Options

```
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
-d, --dataset string Target Dataset ID ($KNOWLEDGE_CLIENT_GET_FILE_DATASET) (default "default")
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for get-file
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```
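
Both `get-file` and `delete-file` accept `<file-id|file-abs-path>`, so a relative path has to be made absolute first. The sketch below shows one way to do that in a shell and then pass the result to both commands with the documented `--dataset` flag. `abs_path` is a hypothetical helper, `./notes.md` is a placeholder file name, and the `command -v` guard is only there so the snippet also runs where the CLI is not installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CLI): resolve a path to an absolute one.
# The containing directory must exist; the file itself is not checked.
abs_path() {
  dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
  [ "$dir" = "/" ] && dir=""   # avoid a double slash for files directly under /
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

file=$(abs_path "./notes.md")

if command -v knowledge >/dev/null 2>&1; then
  # Inspect the file entry first, then remove it from the dataset.
  knowledge get-file --dataset default "$file"
  knowledge delete-file --dataset default "$file"
else
  echo "would run: knowledge delete-file --dataset default $file"
fi
```

Looking the file up with `get-file` before deleting is just a cautious ordering chosen for this example, not something the CLI requires.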

### SEE ALSO

* [knowledge](knowledge.md) -

4 changes: 2 additions & 2 deletions docs/docs/99-cmd/knowledge_import.md
@@ -24,11 +24,11 @@ knowledge import <path> [<dataset-id>...] [flags]
```
--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
--dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
-h, --help help for import
--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
--vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
```

### SEE ALSO