Commit 4dbd1e5
* feat: enhance VectorStore base class with connection management and health checks
Addresses Issue #212 - Enhanced VectorStore abstract base class
This commit adds production-ready enhancements to the VectorStore base class:
New Features:
- Connection Management: connect(), disconnect(), is_connected property, connection_context()
- Health Check: health_check() with timeout and error handling
- Collection Statistics: get_collection_stats() with implementation delegation
- Common Utilities: _collection_exists(), _batch_chunks(), _validate_collection_config()
Implementation Details:
- Uses Pydantic models from PR #571 (CollectionConfig, VectorDBResponse, etc.)
- Abstract methods for implementation-specific behavior (_health_check_impl, _get_collection_stats_impl)
- Consistent error handling with VectorDBResponse
- Full backward compatibility - no breaking changes to existing abstract methods
Type Aliases Added to data_types.py:
- HealthCheckResponse = VectorDBResponse[dict[str, Any]]
- CollectionStatsResponse = VectorDBResponse[dict[str, Any]]
Benefits:
- Reduces code duplication across all vector store implementations
- Provides consistent error handling patterns
- Enables easy testing via mock implementations
- Sets foundation for Issues #577 (MilvusStore) and #578 (other stores)
Related:
- Supersedes PR #575 (conflicted version)
- Builds on: PR #571 (Pydantic models)
- Enables: Issue #577, #578 (store integrations)
Signed-off-by: manavgup <manavg@gmail.com>
* chore: remove temporary analysis files from PR
Signed-off-by: manavgup <manavg@gmail.com>
* fix: address PR review feedback for vector_store.py enhancements
Addresses review comments from PR #579:
1. Removed abstract decorators from _health_check_impl and _get_collection_stats_impl
- Added default implementations to avoid breaking existing vector stores
- _health_check_impl returns basic connection status
- _get_collection_stats_impl raises NotImplementedError with clear message
2. Enhanced error handling in _collection_exists method
- Only catches CollectionError for expected missing collections
- Logs and re-raises unexpected errors (network, timeout, permissions)
- Handles NotImplementedError when subclass hasn't overridden method
- Prevents silent error swallowing
3. Made dimension validation strict
- Changed from warning to ValueError when dimensions mismatch
- Prevents insertion failures due to embedding dimension incompatibility
- Provides detailed error message with collection name
4. Enhanced all docstrings with comprehensive details
- Added full context for connection management methods
- Included examples for all public utility methods
- Documented return types, exceptions, and common use cases
- Added warnings about default implementation limitations
5. Improved logging throughout
- Added error_type to metadata for better debugging
- Enhanced error messages with context (collection names, dimensions)
- Used exc_info=True for exception stack traces
- Added debug logging for successful operations
All linting checks pass:
- Ruff: All checks passed
- Ruff format: Properly formatted
- Pylint: 10.00/10
- MyPy: Type checking passed (vector_store.py only)
- Pydocstyle: Docstrings compliant
Signed-off-by: manavgup <manavg@gmail.com>
* fix: address additional PR review feedback (#579)
Addresses remaining critical issues from PR review:
1. Fixed Connection Context Manager (Issue #2)
- Only disconnects connections that IT created
- Preserves existing connections on exit
- Added comprehensive docstrings with examples
- Prevents unexpected disconnections
2. Improved Return Type Hints (Issue #3)
- Changed get_collection_stats() to return CollectionStatsResponse
- Added CollectionStatsResponse to imports
- Better type checking and auto-completion
3. Enhanced Error Handling (Issue #4)
- Specific exception handlers for VectorStoreError, CollectionError, TimeoutError
- Uses logger.exception() for unexpected errors
- Clear error categorization in metadata
- Improved logging with exc_info=True
4. Added Comprehensive Unit Tests (Issue #7)
- 24 test cases covering all base class functionality
- Tests connection lifecycle management
- Tests health checks and collection stats
- Tests utility methods (_batch_chunks, _collection_exists, _validate_collection_config)
- All tests passing
Test Results:
- 24/24 tests passing
- Connection context manager tested with/without existing connections
- Error handling tested for expected and unexpected errors
- Utility methods tested with valid/invalid inputs
All linting checks pass:
- Ruff: All checks passed
- Tests: 24/24 passing
Signed-off-by: manavgup <manavg@gmail.com>
* fix: address final PR review recommendations (#579)
Addresses remaining issues from PR review comment:
1. Fixed Async Context Manager (CRITICAL - Issue #1)
- Added warning that default calls sync connect/disconnect
- Documented that implementations with async clients should override
- Included example of proper async implementation
- Prevents event loop blocking for async implementations
2. Updated Connection Logging (Issue #2)
- Changed from INFO to DEBUG level for connect/disconnect
- Added class name to log messages for clarity
- Emphasizes these are flag-only operations (not real connections)
- Reduces log noise for base implementation
3. Added Batch Size Upper Bound Warning (Issue #4)
- Warns when batch_size > 10,000 (memory concerns)
- Recommends 100-1000 for optimal performance
- Helps prevent out-of-memory errors
- Non-blocking (warning only, not error)
4. Documented Timeout Limitation (Issue #3)
- Clarified that default implementation doesn't enforce timeout
- Added example with signal-based timeout enforcement (Unix)
- Added simple example without timeout
- Subclasses can choose appropriate timeout strategy
All changes are backward compatible and non-breaking.
Test Results:
- 24/24 tests passing
- All linting checks pass (Ruff, format)
Signed-off-by: manavgup <manavg@gmail.com>
* fix(vectordb): Address final PR #579 review recommendations
This commit addresses the remaining review items from PR #579:
1. CRITICAL: Async context manager event loop blocking
- Strengthened warning with ALL CAPS emphasis
- Added detailed example showing proper async override
- Clarified that sync connect()/disconnect() WILL block event loop
- Documented that implementations with async I/O MUST override
2. MEDIUM: Invalid attribute access in _batch_chunks warning
- Fixed invalid getattr(chunks[0], "collection_name", "unknown")
- Changed to use len(chunks) which is always available
- EmbeddedChunk objects don't have collection_name attribute
3. MEDIUM: _collection_exists error handling
- Changed to raise NotImplementedError instead of returning False
- Added clear error message when _get_collection_stats_impl not implemented
- Prevents silent masking of implementation gaps
- Updated test to expect NotImplementedError
4. MINOR: Enhanced _validate_collection_config
- Clarified Pydantic already validates empty strings and non-positive dimensions
- Removed redundant validation checks
- Kept whitespace-only collection name check (Pydantic doesn't catch this)
- Updated docstring to explain Pydantic vs custom validation
- Added test for whitespace-only collection names
All 26 unit tests passing. Ruff formatting and linting checks pass.
Refs: #579
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Signed-off-by: manavgup <manavg@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 1d3f460 commit 4dbd1e5
File tree
3 files changed
+1043
-5
lines changed- backend/vectordbs
- tests/unit/vectordbs
3 files changed
+1043
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
637 | 637 | | |
638 | 638 | | |
639 | 639 | | |
| 640 | + | |
| 641 | + | |
0 commit comments