Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -67,7 +67,7 @@ jobs:
RUSTC_WRAPPER: sccache
SCCACHE_GHA_ENABLED: "true"
- name: Check snapshots
run: cargo insta test --workspace --features full --check
run: cargo insta test --workspace --features full --check --lib --bins
env:
RUSTC_WRAPPER: sccache
SCCACHE_GHA_ENABLED: "true"
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
- `ScoredMatch` struct exposing both skill index and cosine similarity score from matcher backends
- `IntentClassification` type (`skill_name`, `confidence`, `params`) with `JsonSchema` derive for schema-enforced LLM responses
- `disambiguation_threshold` in `[skills]` config section (default: 0.05) with `with_disambiguation_threshold()` builder on `Agent`
- `DocumentLoader` trait with text/Markdown file loader in `zeph-memory` (#469)
- Text splitter with configurable chunk size, overlap, and sentence-aware splitting (#470)
- PDF document loader, feature-gated behind `pdf` (#471)
- Document ingestion pipeline: load, split, embed, store via Qdrant (#472)
- File size guard (50 MiB default) and path canonicalization for document loaders

## [0.10.0] - 2026-02-18

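The #470 entry describes a splitter with configurable chunk size, overlap, and sentence-aware splitting. To show how those three settings interact, here is a std-only sketch. It is not this PR's `TextSplitter`. `ChunkConfig`, `split_text`, and measuring length in characters are all assumptions. Each chunk holds at most `chunk_size` characters and, when it can, ends just after `.`, `!`, or `?`. The next chunk then restarts `chunk_overlap` characters before the previous end.

```rust
/// Hypothetical config; the real `SplitterConfig` fields may differ.
#[derive(Debug, Clone)]
pub struct ChunkConfig {
    /// Maximum chunk length, in characters.
    pub chunk_size: usize,
    /// Characters shared between consecutive chunks.
    pub chunk_overlap: usize,
}

/// Split `text` into overlapping chunks, preferring to cut right after a
/// sentence terminator. Panics if the config cannot make progress.
pub fn split_text(text: &str, cfg: &ChunkConfig) -> Vec<String> {
    assert!(cfg.chunk_size > 0 && cfg.chunk_overlap < cfg.chunk_size);
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let hard_end = (start + cfg.chunk_size).min(chars.len());
        let mut end = hard_end;
        if hard_end < chars.len() {
            // Search backwards for a sentence boundary, but only beyond the
            // overlap region so that `start` always moves forward.
            if let Some(pos) = (start + cfg.chunk_overlap + 1..hard_end)
                .rev()
                .find(|&i| matches!(chars[i - 1], '.' | '!' | '?'))
            {
                end = pos;
            }
        }
        let chunk: String = chars[start..end].iter().collect();
        let trimmed = chunk.trim();
        if !trimmed.is_empty() {
            chunks.push(trimmed.to_string());
        }
        if end == chars.len() {
            break;
        }
        start = end - cfg.chunk_overlap;
    }
    chunks
}
```

Restricting the boundary search to positions past the overlap guarantees each iteration advances. The loop therefore terminates even when the text contains no sentence boundary at all.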
97 changes: 92 additions & 5 deletions Cargo.lock


3 changes: 3 additions & 0 deletions Cargo.toml
@@ -31,6 +31,8 @@ futures-core = "0.3"
notify = "8"
notify-debouncer-mini = "0.7"
ollama-rs = { version = "0.3", default-features = false, features = ["rustls", "stream"] }
pdf-extract = "0.7"
proptest = "1.6"
pulldown-cmark = "0.13"
qdrant-client = { version = "1.16", default-features = false }
ratatui = "0.30"
@@ -118,6 +120,7 @@ gateway = ["dep:zeph-gateway"]
daemon = ["zeph-core/daemon"]
scheduler = ["dep:zeph-scheduler"]
otel = ["dep:opentelemetry", "dep:opentelemetry_sdk", "dep:opentelemetry-otlp", "dep:tracing-opentelemetry"]
pdf = ["zeph-memory/pdf"]
mock = ["zeph-llm/mock", "zeph-memory/mock"]

[dependencies]
1 change: 1 addition & 0 deletions README.md
@@ -322,6 +322,7 @@ Always compiled in: `openai`, `compatible`, `orchestrator`, `router`, `self-lear
| `index` | AST-based code indexing |
| `gateway` | HTTP webhook ingestion |
| `daemon` | Component supervisor |
| `pdf` | PDF document loading for RAG |
| `scheduler` | Cron-based periodic tasks |
| `otel` | OpenTelemetry OTLP export |
| `full` | Everything above |
12 changes: 8 additions & 4 deletions crates/zeph-memory/Cargo.toml
@@ -7,6 +7,7 @@ license.workspace = true
repository.workspace = true

[dependencies]
pdf-extract = { workspace = true, optional = true }
qdrant-client = { workspace = true, features = ["serde"] }
serde_json.workspace = true
sqlx = { workspace = true, features = ["runtime-tokio", "sqlite", "migrate"] }
@@ -16,17 +17,20 @@ tracing.workspace = true
uuid = { workspace = true, features = ["v4"] }
zeph-llm.workspace = true

[[bench]]
name = "token_estimation"
harness = false

[features]
default = []
mock = []
pdf = ["dep:pdf-extract"]

[[bench]]
name = "token_estimation"
harness = false

[dev-dependencies]
anyhow.workspace = true
criterion.workspace = true
proptest.workspace = true
tempfile.workspace = true
testcontainers.workspace = true
tokio = { workspace = true, features = ["macros", "rt-multi-thread"] }
tokio-stream.workspace = true
20 changes: 19 additions & 1 deletion crates/zeph-memory/README.md
@@ -6,6 +6,8 @@ SQLite-backed conversation persistence with Qdrant vector search.

Provides durable conversation storage via SQLite and semantic retrieval through Qdrant vector search. The `SemanticMemory` orchestrator combines both backends, enabling the agent to recall relevant context from past conversations using embedding similarity.

Includes a document ingestion subsystem for loading, chunking, and storing user documents (text, Markdown, PDF) into Qdrant for RAG workflows.

## Key modules

| Module | Description |
@@ -14,16 +16,32 @@ Provides durable conversation storage via SQLite and semantic retrieval through
| `qdrant` | Qdrant client for vector upsert and search |
| `qdrant_ops` | `QdrantOps` — high-level Qdrant operations |
| `semantic` | `SemanticMemory` — orchestrates SQLite + Qdrant |
| `document` | Document loading, splitting, and ingestion pipeline |
| `document::loader` | `TextLoader` (.txt/.md), `PdfLoader` (feature-gated: `pdf`) |
| `document::splitter` | `TextSplitter` with configurable chunking |
| `document::pipeline` | `IngestionPipeline` — load, split, embed, store via Qdrant |
| `vector_store` | `VectorStore` trait and `VectorPoint` types |
| `embedding_store` | `EmbeddingStore` — high-level embedding CRUD |
| `types` | `ConversationId`, `MessageId`, shared types |
| `error` | `MemoryError` — unified error type |
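The `document::pipeline` row describes a four-stage flow: load, split, embed, store. The sketch below illustrates only that ordering. The crate's `IngestionPipeline` stores into Qdrant, while this version keeps everything in memory and runs synchronously. `Loader`, `Embedder`, `InMemoryStore`, `ingest`, the toy `StaticLoader`/`LengthEmbedder` types, and the `{source}#{index}` point IDs are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical loader abstraction: turns a source path into plain text.
pub trait Loader {
    fn load(&self, source: &str) -> Result<String, String>;
}

/// Hypothetical embedder abstraction: maps a chunk of text to a vector.
pub trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// In-memory stand-in for a vector store: id -> (vector, chunk text).
#[derive(Default)]
pub struct InMemoryStore {
    pub points: HashMap<String, (Vec<f32>, String)>,
}

/// Toy loader that ignores the path and returns fixed text.
pub struct StaticLoader(pub String);

impl Loader for StaticLoader {
    fn load(&self, _source: &str) -> Result<String, String> {
        Ok(self.0.clone())
    }
}

/// Toy embedder: a one-element "vector" holding the chunk's byte length.
pub struct LengthEmbedder;

impl Embedder for LengthEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32]
    }
}

/// Run load -> split -> embed -> store; returns the number of chunks stored.
pub fn ingest(
    source: &str,
    loader: &dyn Loader,
    split: impl Fn(&str) -> Vec<String>,
    embedder: &dyn Embedder,
    store: &mut InMemoryStore,
) -> Result<usize, String> {
    let text = loader.load(source)?; // 1. load
    let chunks = split(&text); // 2. split
    for (i, chunk) in chunks.iter().enumerate() {
        let vector = embedder.embed(chunk); // 3. embed
        // 4. store, keyed by a hypothetical "{source}#{index}" id
        store.points.insert(format!("{source}#{i}"), (vector, chunk.clone()));
    }
    Ok(chunks.len())
}
```

Passing the splitter as a closure keeps the pipeline independent of any particular chunking strategy. The same idea applies when the splitter is a configured struct instead.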

**Re-exports:** `MemoryError`, `QdrantOps`, `ConversationId`, `MessageId`
**Re-exports:** `MemoryError`, `QdrantOps`, `ConversationId`, `MessageId`, `Document`, `DocumentLoader`, `TextLoader`, `TextSplitter`, `IngestionPipeline`, `Chunk`, `SplitterConfig`, `DocumentError`, `DocumentMetadata`, `PdfLoader` (behind `pdf` feature)

## Features

| Feature | Description |
|---------|-------------|
| `pdf` | PDF document loading via `pdf-extract` |
| `mock` | In-memory `VectorStore` implementation for testing |

## Usage

```toml
[dependencies]
zeph-memory = { path = "../zeph-memory" }

# With PDF support, use this line instead of the one above:
# zeph-memory = { path = "../zeph-memory", features = ["pdf"] }
```

## License
21 changes: 21 additions & 0 deletions crates/zeph-memory/src/document/error.rs
@@ -0,0 +1,21 @@
#[derive(Debug, thiserror::Error)]
pub enum DocumentError {
#[error("IO error: {0}")]
Io(#[from] std::io::Error),

#[error("unsupported format: {0}")]
UnsupportedFormat(String),

#[error("file too large: {0} bytes")]
FileTooLarge(u64),

#[cfg(feature = "pdf")]
#[error("PDF error: {0}")]
Pdf(String),

#[error("embedding failed: {0}")]
Embedding(#[from] zeph_llm::LlmError),

#[error("storage error: {0}")]
Storage(#[from] crate::error::MemoryError),
}
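The `FileTooLarge(u64)` variant pairs with the changelog's file size guard (50 MiB default) and path canonicalization. Here is a std-only sketch of such a guard. `check_file`, `GuardError`, and `DEFAULT_MAX_FILE_SIZE` are hypothetical names. Checking metadata before reading any bytes is an assumption about the design and does not describe this PR's code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// 50 MiB = 50 * 1024 * 1024 = 52_428_800 bytes (the changelog's default).
pub const DEFAULT_MAX_FILE_SIZE: u64 = 50 * 1024 * 1024;

/// Std-only stand-in for the two `DocumentError` variants this check needs.
#[derive(Debug)]
pub enum GuardError {
    Io(io::Error),
    FileTooLarge(u64),
}

impl From<io::Error> for GuardError {
    fn from(err: io::Error) -> Self {
        GuardError::Io(err)
    }
}

/// Canonicalize `path` (resolving symlinks and `..`), then reject the file if
/// it is larger than `max_bytes`. Only metadata is read, never file contents.
pub fn check_file(path: &Path, max_bytes: u64) -> Result<PathBuf, GuardError> {
    // Returns an io::Error (e.g. NotFound) if the path does not exist.
    let canonical = fs::canonicalize(path)?;
    let size = fs::metadata(&canonical)?.len();
    if size > max_bytes {
        return Err(GuardError::FileTooLarge(size));
    }
    Ok(canonical)
}
```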
7 changes: 7 additions & 0 deletions crates/zeph-memory/src/document/loader/mod.rs
@@ -0,0 +1,7 @@
mod text;
pub use text::TextLoader;

#[cfg(feature = "pdf")]
mod pdf;
#[cfg(feature = "pdf")]
pub use pdf::PdfLoader;
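Because `PdfLoader` is only compiled under `#[cfg(feature = "pdf")]`, any code that picks a loader by file extension has to gate its PDF branch the same way. The following sketches that pattern under the assumption of extension-based dispatch. `LoaderKind` and `pick_loader` are hypothetical and are not part of this PR.

```rust
use std::path::Path;

/// Which loader would handle a file (hypothetical).
#[derive(Debug, PartialEq)]
pub enum LoaderKind {
    Text,
    #[cfg(feature = "pdf")]
    Pdf,
}

/// Choose a loader from the (case-insensitive) file extension.
pub fn pick_loader(path: &Path) -> Result<LoaderKind, String> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "txt" | "md" => Ok(LoaderKind::Text),
        // This arm exists only when built with `--features pdf`; otherwise
        // `.pdf` falls through to the unsupported-format error below.
        #[cfg(feature = "pdf")]
        "pdf" => Ok(LoaderKind::Pdf),
        other => Err(format!("unsupported format: {other}")),
    }
}
```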