
Commit 68b3a34

feat(nodestore): add Postgres for the doc and index store (#1706)
* Adding Postgres for the doc and index store
* Adding documentation. Rename postgres database local->simple. Postgres storage dependencies
* Update documentation for postgres storage
* Renaming feature to nodestore
* update docstore -> nodestore in doc
* missed some docstore changes in doc
* Updated poetry.lock
* Formatting updates to pass ruff/black checks
* Correction to unreachable code!
* Format adjustment to pass black test
* Adjust extra inclusion name for vector pg
* extra dep change for pg vector
* storage-postgres -> storage-nodestore-postgres
* Hash change on poetry lock
1 parent d17c34e commit 68b3a34


9 files changed: +226 -26 lines changed


fern/docs.yml

+2
@@ -58,6 +58,8 @@ navigation:
         contents:
           - page: Vector Stores
             path: ./docs/pages/manual/vectordb.mdx
+          - page: Node Stores
+            path: ./docs/pages/manual/nodestore.mdx
       - section: Advanced Setup
         contents:
           - page: LLM Backends

fern/docs/pages/manual/nodestore.mdx

+66
@@ -0,0 +1,66 @@
+## NodeStores
+PrivateGPT supports **Simple** and [Postgres](https://www.postgresql.org/) providers. Simple being the default.
+
+In order to select one or the other, set the `nodestore.database` property in the `settings.yaml` file to `simple` or `postgres`.
+
+```yaml
+nodestore:
+  database: simple
+```
+
+### Simple Document Store
+
+Setting up simple document store: Persist data with in-memory and disk storage.
+
+Enabling the simple document store is an excellent choice for small projects or proofs of concept where you need to persist data while maintaining minimal setup complexity. To get started, set the nodestore.database property in your settings.yaml file as follows:
+
+```yaml
+nodestore:
+  database: simple
+```
+The beauty of the simple document store is its flexibility and ease of implementation. It provides a solid foundation for managing and retrieving data without the need for complex setup or configuration. The combination of in-memory processing and disk persistence ensures that you can efficiently handle small to medium-sized datasets while maintaining data consistency across runs.
+
+### Postgres Document Store
+
+To enable Postgres, set the `nodestore.database` property in the `settings.yaml` file to `postgres` and install the `storage-nodestore-postgres` extra. Note: Vector Embeddings Storage in Postgres is configured separately
+
+```bash
+poetry install --extras storage-nodestore-postgres
+```
+
+The available configuration options are:
+| Field | Description |
+|---------------|-----------------------------------------------------------|
+| **host** | The server hosting the Postgres database. Default is `localhost` |
+| **port** | The port on which the Postgres database is accessible. Default is `5432` |
+| **database** | The specific database to connect to. Default is `postgres` |
+| **user** | The username for database access. Default is `postgres` |
+| **password** | The password for database access. (Required) |
+| **schema_name** | The database schema to use. Default is `private_gpt` |
+
+For example:
+```yaml
+nodestore:
+  database: postgres
+
+postgres:
+  host: localhost
+  port: 5432
+  database: postgres
+  user: postgres
+  password: <PASSWORD>
+  schema_name: private_gpt
+```
+
+Given the above configuration, Two PostgreSQL tables will be created upon successful connection: one for storing metadata related to the index and another for document data itself.
+
+```
+postgres=# \dt private_gpt.*
+                 List of relations
+   Schema    |      Name       | Type  |    Owner
+-------------+-----------------+-------+--------------
+ private_gpt | data_docstore   | table | postgres
+ private_gpt | data_indexstore | table | postgres
+
+postgres=#
+```

fern/docs/pages/manual/vectordb.mdx

+2-2
@@ -51,10 +51,10 @@ By default `chroma` will use a disk-based database stored in local_data_path / "
 
 ### PGVector
 
-To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `pgvector` and install the `pgvector` extra.
+To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `pgvector` and install the `vector-stores-postgres` extra.
 
 ```bash
-poetry install --extras pgvector
+poetry install --extras vector-stores-postgres
 ```
 
 PGVector settings can be configured by setting values to the `pgvector` property in the `settings.yaml` file.

poetry.lock

+32-2

private_gpt/components/node_store/node_store_component.py

+49-16
@@ -6,6 +6,7 @@
 from llama_index.core.storage.index_store.types import BaseIndexStore
 
 from private_gpt.paths import local_data_path
+from private_gpt.settings.settings import Settings
 
 logger = logging.getLogger(__name__)
 
@@ -16,19 +17,51 @@ class NodeStoreComponent:
     doc_store: BaseDocumentStore
 
     @inject
-    def __init__(self) -> None:
-        try:
-            self.index_store = SimpleIndexStore.from_persist_dir(
-                persist_dir=str(local_data_path)
-            )
-        except FileNotFoundError:
-            logger.debug("Local index store not found, creating a new one")
-            self.index_store = SimpleIndexStore()
-
-        try:
-            self.doc_store = SimpleDocumentStore.from_persist_dir(
-                persist_dir=str(local_data_path)
-            )
-        except FileNotFoundError:
-            logger.debug("Local document store not found, creating a new one")
-            self.doc_store = SimpleDocumentStore()
+    def __init__(self, settings: Settings) -> None:
+        match settings.nodestore.database:
+            case "simple":
+                try:
+                    self.index_store = SimpleIndexStore.from_persist_dir(
+                        persist_dir=str(local_data_path)
+                    )
+                except FileNotFoundError:
+                    logger.debug("Local index store not found, creating a new one")
+                    self.index_store = SimpleIndexStore()
+
+                try:
+                    self.doc_store = SimpleDocumentStore.from_persist_dir(
+                        persist_dir=str(local_data_path)
+                    )
+                except FileNotFoundError:
+                    logger.debug("Local document store not found, creating a new one")
+                    self.doc_store = SimpleDocumentStore()
+
+            case "postgres":
+                try:
+                    from llama_index.core.storage.docstore.postgres_docstore import (
+                        PostgresDocumentStore,
+                    )
+                    from llama_index.core.storage.index_store.postgres_index_store import (
+                        PostgresIndexStore,
+                    )
+                except ImportError:
+                    raise ImportError(
+                        "Postgres dependencies not found, install with `poetry install --extras storage-nodestore-postgres`"
+                    ) from None
+
+                if settings.postgres is None:
+                    raise ValueError("Postgres index/doc store settings not found.")
+
+                self.index_store = PostgresIndexStore.from_params(
+                    **settings.postgres.model_dump(exclude_none=True)
+                )
+                self.doc_store = PostgresDocumentStore.from_params(
+                    **settings.postgres.model_dump(exclude_none=True)
+                )
+
+            case _:
+                # Should be unreachable
+                # The settings validator should have caught this
+                raise ValueError(
+                    f"Database {settings.nodestore.database} not supported"
+                )
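
The selection logic in this hunk can be exercised without a database. What follows is a minimal sketch under stated assumptions, not the project's code: `StoreParams`, `FakeStore` and `build_stores` are hypothetical names, a stdlib dataclass stands in for the pydantic settings model, and `if` branches replace the `match` statement so the sketch also runs on Python 3.9. It illustrates how a single settings object, with unset fields dropped, supplies the same keyword arguments to both stores, much as `model_dump(exclude_none=True)` is unpacked into `from_params` above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class StoreParams:
    """Hypothetical stand-in for the `postgres` settings block."""

    host: str = "localhost"
    port: int = 5432
    database: str = "postgres"
    user: str = "postgres"
    password: Optional[str] = None
    schema_name: str = "public"


class FakeStore:
    """Hypothetical store that only records the parameters it was built with."""

    def __init__(self, **params: Any) -> None:
        self.params = params

    @classmethod
    def from_params(cls, **params: Any) -> "FakeStore":
        return cls(**params)


def build_stores(
    database: str, params: Optional[StoreParams]
) -> "tuple[FakeStore, FakeStore]":
    """Return (index_store, doc_store) for the selected backend."""
    if database == "simple":
        return FakeStore(), FakeStore()
    if database == "postgres":
        if params is None:
            raise ValueError("Postgres index/doc store settings not found.")
        # Drop unset fields, mirroring model_dump(exclude_none=True).
        kwargs = {k: v for k, v in asdict(params).items() if v is not None}
        return FakeStore.from_params(**kwargs), FakeStore.from_params(**kwargs)
    raise ValueError(f"Database {database} not supported")
```

Calling `build_stores("postgres", StoreParams(password="secret"))` returns two stores built from identical parameters, while passing `None` for the parameters raises the same `ValueError` message as the hunk.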

private_gpt/settings/settings.py

+14-5
@@ -108,6 +108,10 @@ class VectorstoreSettings(BaseModel):
     database: Literal["chroma", "qdrant", "pgvector"]
 
 
+class NodeStoreSettings(BaseModel):
+    database: Literal["simple", "postgres"]
+
+
 class LlamaCPPSettings(BaseModel):
     llm_hf_repo_id: str
     llm_hf_model_file: str
@@ -249,7 +253,7 @@ class UISettings(BaseModel):
     )
 
 
-class PGVectorSettings(BaseModel):
+class PostgresSettings(BaseModel):
     host: str = Field(
         "localhost",
         description="The server hosting the Postgres database",
@@ -270,14 +274,17 @@ class PGVectorSettings(BaseModel):
         "postgres",
         description="The database to use to connect to the Postgres database",
     )
+    schema_name: str = Field(
+        "public",
+        description="The name of the schema in the Postgres database to use",
+    )
+
+
+class PGVectorSettings(PostgresSettings):
     embed_dim: int = Field(
         384,
         description="The dimension of the embeddings stored in the Postgres database",
     )
-    schema_name: str = Field(
-        "public",
-        description="The name of the schema in the Postgres database where the embeddings are stored",
-    )
     table_name: str = Field(
         "embeddings",
         description="The name of the table in the Postgres database where the embeddings are stored",
@@ -350,7 +357,9 @@ class Settings(BaseModel):
     openai: OpenAISettings
     ollama: OllamaSettings
     vectorstore: VectorstoreSettings
+    nodestore: NodeStoreSettings
     qdrant: QdrantSettings | None = None
+    postgres: PostgresSettings | None = None
     pgvector: PGVectorSettings | None = None
 
356365

pyproject.toml

+7-1
@@ -27,6 +27,12 @@ llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
 llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
 llama-index-vector-stores-chroma = {version ="^0.1.4", optional = true}
 llama-index-vector-stores-postgres = {version ="^0.1.2", optional = true}
+llama-index-storage-docstore-postgres = {version ="^0.1.2", optional = true}
+llama-index-storage-index-store-postgres = {version ="^0.1.2", optional = true}
+# Postgres
+psycopg2-binary = {version ="^2.9.9", optional = true}
+asyncpg = {version="^0.29.0", optional = true}
+
 # Optional Sagemaker dependency
 boto3 = {version ="^1.34.51", optional = true}
 # Optional UI
@@ -46,7 +52,7 @@ embeddings-sagemaker = ["boto3"]
 vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
 vector-stores-chroma = ["llama-index-vector-stores-chroma"]
 vector-stores-postgres = ["llama-index-vector-stores-postgres"]
-
+storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-index-storage-index-store-postgres","psycopg2-binary","asyncpg"]
 
 [tool.poetry.group.dev.dependencies]
 black = "^22"
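
Because the four packages behind `storage-nodestore-postgres` are optional, it can help to check whether they are present before selecting the `postgres` node store. The following is a small sketch using only the standard library's `importlib.metadata`; the distribution names are copied from the extra defined above, while `missing_distributions` is a hypothetical helper that is not part of the commit.

```python
from importlib import metadata

# Distribution names taken from the `storage-nodestore-postgres` extra.
NODESTORE_POSTGRES_DISTS = (
    "llama-index-storage-docstore-postgres",
    "llama-index-storage-index-store-postgres",
    "psycopg2-binary",
    "asyncpg",
)


def missing_distributions(
    names: "tuple[str, ...]" = NODESTORE_POSTGRES_DISTS,
) -> "list[str]":
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty list means every distribution in the extra is installed; otherwise the returned names point to what `poetry install --extras storage-nodestore-postgres` would still need to add.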

settings-ollama-pg.yaml

+43
@@ -0,0 +1,43 @@
+# Using ollama and postgres for the vector, doc and index store. Ollama is also used for embeddings.
+# To use install these extras:
+# poetry install --extras "llms-ollama ui vector-stores-postgres embeddings-ollama storage-nodestore-postgres"
+server:
+  env_name: ${APP_ENV:ollama}
+
+llm:
+  mode: ollama
+  max_new_tokens: 512
+  context_window: 3900
+
+embedding:
+  mode: ollama
+
+ollama:
+  llm_model: mistral
+  embedding_model: nomic-embed-text
+  api_base: http://localhost:11434
+
+nodestore:
+  database: postgres
+
+vectorstore:
+  database: pgvector
+
+pgvector:
+  host: localhost
+  port: 5432
+  database: postgres
+  user: postgres
+  password: admin
+  embed_dim: 768
+  schema_name: private_gpt
+  table_name: embeddings
+
+postgres:
+  host: localhost
+  port: 5432
+  database: postgres
+  user: postgres
+  password: admin
+  schema_name: private_gpt
+
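
This sample file repeats the same connection values in the `pgvector` and `postgres` blocks. The commit does not say the two blocks must agree, but the sample keeps them identical, so a quick comparison can catch accidental drift when both point at one database. The sketch below assumes the YAML has already been loaded into plain dictionaries; `SHARED_KEYS` and `connection_mismatches` are hypothetical names introduced for illustration.

```python
from typing import Any

# Keys that appear in both the `pgvector` and `postgres` blocks of the sample file.
SHARED_KEYS = ("host", "port", "database", "user", "password", "schema_name")


def connection_mismatches(
    pgvector: "dict[str, Any]", postgres: "dict[str, Any]"
) -> "dict[str, tuple[Any, Any]]":
    """Map each shared key whose values differ to its (pgvector, postgres) pair."""
    return {
        key: (pgvector.get(key), postgres.get(key))
        for key in SHARED_KEYS
        if pgvector.get(key) != postgres.get(key)
    }
```

For the values in this file the function returns an empty dictionary; the `embed_dim` and `table_name` keys are ignored because they exist only in the `pgvector` block.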

settings.yaml

+11
@@ -62,6 +62,9 @@ huggingface:
 vectorstore:
   database: qdrant
 
+nodestore:
+  database: simple
+
 qdrant:
   path: local_data/private_gpt/qdrant
 
@@ -75,6 +78,14 @@ pgvector:
   schema_name: private_gpt
   table_name: embeddings
 
+postgres:
+  host: localhost
+  port: 5432
+  database: postgres
+  user: postgres
+  password: postgres
+  schema_name: private_gpt
+
 sagemaker:
   llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
   embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
