feat(memory): document loaders, text splitter, and ingestion pipeline#558
Merged
…line

Introduce document loading subsystem in zeph-memory with DocumentLoader trait, TextLoader (txt/md), TextSplitter with sentence-aware chunking, and IngestionPipeline that integrates with Qdrant vector store. Add feature-gated PdfLoader behind `pdf` feature using pdf-extract. Include file size guard (50 MiB) and path canonicalization for security.

Closes #469, #470, #471, #472
Refs #478
Match TextLoader pattern with per-instance max_file_size field defaulting to DEFAULT_MAX_FILE_SIZE (50 MiB).
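For reviewers, the file size guard and path canonicalization described in these commits could look roughly like the sketch below. It is a minimal illustration, not the code in this PR: the `Document` struct, the synchronous `load` signature, and the choice of `io::Error` are assumptions, and only `DocumentLoader`, `TextLoader`, `max_file_size`, and `DEFAULT_MAX_FILE_SIZE` are names taken from the commit messages.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// 50 MiB, the default limit named in the commit message.
pub const DEFAULT_MAX_FILE_SIZE: u64 = 50 * 1024 * 1024;

/// Hypothetical document type; the real struct may carry more metadata.
#[derive(Debug)]
pub struct Document {
    pub source: PathBuf,
    pub content: String,
}

/// Assumed synchronous shape of the trait; the real one may be async.
pub trait DocumentLoader {
    fn load(&self, path: &Path) -> io::Result<Document>;
}

/// Loader for plain text and Markdown with a per-instance size limit.
pub struct TextLoader {
    pub max_file_size: u64,
}

impl Default for TextLoader {
    fn default() -> Self {
        Self { max_file_size: DEFAULT_MAX_FILE_SIZE }
    }
}

impl DocumentLoader for TextLoader {
    fn load(&self, path: &Path) -> io::Result<Document> {
        // Resolve symlinks and `..` segments before touching the file.
        let canonical = fs::canonicalize(path)?;

        // Reject oversized files before reading them into memory.
        let size = fs::metadata(&canonical)?.len();
        if size > self.max_file_size {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("file is {} bytes, limit is {} bytes", size, self.max_file_size),
            ));
        }

        let content = fs::read_to_string(&canonical)?;
        Ok(Document { source: canonical, content })
    }
}
```

Checking the size from metadata before reading keeps a large file from being pulled into memory at all, and storing the canonical path in the document gives later stages one unambiguous source identifier.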
…or ingestion pipeline

Add 4 property-based tests for TextSplitter: never panics on arbitrary input, chunks cover all content, indices are sequential, no empty chunks.

Add 5 integration tests using testcontainers Qdrant for IngestionPipeline: single/empty/multi-chunk ingest, load_and_ingest from file, payload verification.
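To make the four splitter properties concrete, here is a small standalone sketch of sentence-aware chunking that satisfies them. It is not the `TextSplitter` from this PR: the `Chunk` struct and the `split_into_chunks` function are hypothetical names, sizes are counted in characters, and overlap is deliberately left out to keep the example short.

```rust
/// Hypothetical chunk type: a sequential index plus the chunk text.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub index: usize,
    pub text: String,
}

/// Cut `text` after whitespace that follows '.', '!' or '?'.
/// Every piece keeps its trailing whitespace, so the pieces
/// concatenate back to the original input.
fn sentences(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev_terminal = false;
    for (i, c) in text.char_indices() {
        if prev_terminal && c.is_whitespace() {
            let end = i + c.len_utf8();
            out.push(&text[start..end]);
            start = end;
        }
        prev_terminal = matches!(c, '.' | '!' | '?');
    }
    if start < text.len() {
        out.push(&text[start..]);
    }
    out
}

/// Greedily pack sentences into chunks of at most `chunk_size` characters.
/// A single sentence longer than the limit is hard-split on char boundaries.
pub fn split_into_chunks(text: &str, chunk_size: usize) -> Vec<Chunk> {
    let limit = chunk_size.max(1); // a zero size would never make progress
    let mut texts: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length of `current` in chars

    for sentence in sentences(text) {
        let len = sentence.chars().count();
        // Flush the open chunk if this sentence would overflow it.
        if current_len > 0 && current_len + len > limit {
            texts.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if len > limit {
            // `current` is empty here because of the flush above.
            let chars: Vec<char> = sentence.chars().collect();
            for piece in chars.chunks(limit) {
                texts.push(piece.iter().collect());
            }
        } else {
            current.push_str(sentence);
            current_len += len;
        }
    }
    if !current.is_empty() {
        texts.push(current);
    }

    texts
        .into_iter()
        .enumerate()
        .map(|(index, text)| Chunk { index, text })
        .collect()
}
```

Because each sentence keeps its trailing whitespace and nothing is trimmed, concatenating the chunk texts reproduces the input exactly, which is the "chunks cover all content" property in its simplest form. A splitter with overlap would need a looser version of that check.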
Add docs/src/guide/document-loaders.md covering DocumentLoader trait, TextLoader, PdfLoader, TextSplitter, and IngestionPipeline. Update architecture/crates.md, feature-flags.md, SUMMARY.md, zeph-memory README, and root README with pdf feature flag.
Integration tests require Docker (testcontainers) and fail in the snapshot check job, which lacks Docker. Limit the insta test run to --lib --bins, since all snapshots are in inline cfg(test) modules.
Summary
- `DocumentLoader` trait with `TextLoader` (txt/md) and feature-gated `PdfLoader` in zeph-memory
- `TextSplitter` with configurable chunk size, overlap, and sentence-aware splitting
- `IngestionPipeline` for load -> split -> embed -> store via Qdrant

Issues
Closes #469, Closes #470, Closes #471, Closes #472
Refs #478
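The split -> embed -> store portion of the flow listed in the summary can be sketched as follows. Everything here is an assumption for illustration: the `Embedder` and `VectorStore` traits, the `Point` payload fields, the fixed-width `split` helper, and the in-memory store that stands in for Qdrant are hypothetical, and the load step is skipped by passing already-loaded content. Only the `IngestionPipeline` name comes from this PR.

```rust
/// Hypothetical embedding backend.
pub trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Hypothetical vector store interface; stands in for the Qdrant client.
pub trait VectorStore {
    fn upsert(&mut self, point: Point);
}

/// Hypothetical stored record: the vector plus a small payload.
#[derive(Debug, Clone)]
pub struct Point {
    pub source: String,
    pub chunk_index: usize,
    pub text: String,
    pub vector: Vec<f32>,
}

/// In-memory store used here instead of a real Qdrant collection.
#[derive(Default)]
pub struct InMemoryStore {
    pub points: Vec<Point>,
}

impl VectorStore for InMemoryStore {
    fn upsert(&mut self, point: Point) {
        self.points.push(point);
    }
}

/// Toy embedder: [character count, whitespace count]. Not a real model.
pub struct ToyEmbedder;

impl Embedder for ToyEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        let chars = text.chars().count() as f32;
        let spaces = text.chars().filter(|c| c.is_whitespace()).count() as f32;
        vec![chars, spaces]
    }
}

/// Fixed-width splitter used only to keep the sketch short.
fn split(text: &str, chunk_size: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(chunk_size.max(1))
        .map(|piece| piece.iter().collect())
        .collect()
}

pub struct IngestionPipeline<E, S> {
    embedder: E,
    store: S,
    chunk_size: usize,
}

impl<E: Embedder, S: VectorStore> IngestionPipeline<E, S> {
    pub fn new(embedder: E, store: S, chunk_size: usize) -> Self {
        Self { embedder, store, chunk_size }
    }

    /// Split the content, embed each chunk, store it, and return the chunk count.
    pub fn ingest(&mut self, source: &str, content: &str) -> usize {
        let chunks = split(content, self.chunk_size);
        let count = chunks.len();
        for (chunk_index, text) in chunks.into_iter().enumerate() {
            let vector = self.embedder.embed(&text);
            self.store.upsert(Point {
                source: source.to_string(),
                chunk_index,
                text,
                vector,
            });
        }
        count
    }

    pub fn store(&self) -> &S {
        &self.store
    }
}
```

Keeping the embedder and the store behind traits is one way to unit-test the pipeline without Docker, with the Qdrant-backed path left to the testcontainers integration tests.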
Test plan
--features pdf