Skip to content

Commit 4dbd1e5

Browse files
manavgupclaude
andauthored
feat: enhance VectorStore base class with connection management and health checks (#212) (#579)
* feat: enhance VectorStore base class with connection management and health checks Addresses Issue #212 - Enhanced VectorStore abstract base class This commit adds production-ready enhancements to the VectorStore base class: New Features: - Connection Management: connect(), disconnect(), is_connected property, connection_context() - Health Check: health_check() with timeout and error handling - Collection Statistics: get_collection_stats() with implementation delegation - Common Utilities: _collection_exists(), _batch_chunks(), _validate_collection_config() Implementation Details: - Uses Pydantic models from PR #571 (CollectionConfig, VectorDBResponse, etc.) - Abstract methods for implementation-specific behavior (_health_check_impl, _get_collection_stats_impl) - Consistent error handling with VectorDBResponse - Full backward compatibility - no breaking changes to existing abstract methods Type Aliases Added to data_types.py: - HealthCheckResponse = VectorDBResponse[dict[str, Any]] - CollectionStatsResponse = VectorDBResponse[dict[str, Any]] Benefits: - Reduces code duplication across all vector store implementations - Provides consistent error handling patterns - Enables easy testing via mock implementations - Sets foundation for Issues #577 (MilvusStore) and #578 (other stores) Related: - Supersedes PR #575 (conflicted version) - Builds on: PR #571 (Pydantic models) - Enables: Issue #577, #578 (store integrations) Signed-off-by: manavgup <manavg@gmail.com> * chore: remove temporary analysis files from PR Signed-off-by: manavgup <manavg@gmail.com> * fix: address PR review feedback for vector_store.py enhancements Addresses review comments from PR #579: 1. Removed abstract decorators from _health_check_impl and _get_collection_stats_impl - Added default implementations to avoid breaking existing vector stores - _health_check_impl returns basic connection status - _get_collection_stats_impl raises NotImplementedError with clear message 2. Enhanced error handling in _collection_exists method - Only catches CollectionError for expected missing collections - Logs and re-raises unexpected errors (network, timeout, permissions) - Handles NotImplementedError when subclass hasn't overridden method - Prevents silent error swallowing 3. Made dimension validation strict - Changed from warning to ValueError when dimensions mismatch - Prevents insertion failures due to embedding dimension incompatibility - Provides detailed error message with collection name 4. Enhanced all docstrings with comprehensive details - Added full context for connection management methods - Included examples for all public utility methods - Documented return types, exceptions, and common use cases - Added warnings about default implementation limitations 5. Improved logging throughout - Added error_type to metadata for better debugging - Enhanced error messages with context (collection names, dimensions) - Used exc_info=True for exception stack traces - Added debug logging for successful operations All linting checks pass: - Ruff: All checks passed - Ruff format: Properly formatted - Pylint: 10.00/10 - MyPy: Type checking passed (vector_store.py only) - Pydocstyle: Docstrings compliant Signed-off-by: manavgup <manavg@gmail.com> * fix: address additional PR review feedback (#579) Addresses remaining critical issues from PR review: 1. Fixed Connection Context Manager (Issue #2) - Only disconnects connections that IT created - Preserves existing connections on exit - Added comprehensive docstrings with examples - Prevents unexpected disconnections 2. Improved Return Type Hints (Issue #3) - Changed get_collection_stats() to return CollectionStatsResponse - Added CollectionStatsResponse to imports - Better type checking and auto-completion 3. Enhanced Error Handling (Issue #4) - Specific exception handlers for VectorStoreError, CollectionError, TimeoutError - Uses logger.exception() for unexpected errors - Clear error categorization in metadata - Improved logging with exc_info=True 4. Added Comprehensive Unit Tests (Issue #7) - 24 test cases covering all base class functionality - Tests connection lifecycle management - Tests health checks and collection stats - Tests utility methods (_batch_chunks, _collection_exists, _validate_collection_config) - All tests passing Test Results: - 24/24 tests passing - Connection context manager tested with/without existing connections - Error handling tested for expected and unexpected errors - Utility methods tested with valid/invalid inputs All linting checks pass: - Ruff: All checks passed - Tests: 24/24 passing Signed-off-by: manavgup <manavg@gmail.com> * fix: address final PR review recommendations (#579) Addresses remaining issues from PR review comment: 1. Fixed Async Context Manager (CRITICAL - Issue #1) - Added warning that default calls sync connect/disconnect - Documented that implementations with async clients should override - Included example of proper async implementation - Prevents event loop blocking for async implementations 2. Updated Connection Logging (Issue #2) - Changed from INFO to DEBUG level for connect/disconnect - Added class name to log messages for clarity - Emphasizes these are flag-only operations (not real connections) - Reduces log noise for base implementation 3. Added Batch Size Upper Bound Warning (Issue #4) - Warns when batch_size > 10,000 (memory concerns) - Recommends 100-1000 for optimal performance - Helps prevent out-of-memory errors - Non-blocking (warning only, not error) 4. Documented Timeout Limitation (Issue #3) - Clarified that default implementation doesn't enforce timeout - Added example with signal-based timeout enforcement (Unix) - Added simple example without timeout - Subclasses can choose appropriate timeout strategy All changes are backward compatible and non-breaking. Test Results: - 24/24 tests passing - All linting checks pass (Ruff, format) Signed-off-by: manavgup <manavg@gmail.com> * fix(vectordb): Address final PR #579 review recommendations This commit addresses the remaining review items from PR #579: 1. CRITICAL: Async context manager event loop blocking - Strengthened warning with ALL CAPS emphasis - Added detailed example showing proper async override - Clarified that sync connect()/disconnect() WILL block event loop - Documented that implementations with async I/O MUST override 2. MEDIUM: Invalid attribute access in _batch_chunks warning - Fixed invalid getattr(chunks[0], "collection_name", "unknown") - Changed to use len(chunks) which is always available - EmbeddedChunk objects don't have collection_name attribute 3. MEDIUM: _collection_exists error handling - Changed to raise NotImplementedError instead of returning False - Added clear error message when _get_collection_stats_impl not implemented - Prevents silent masking of implementation gaps - Updated test to expect NotImplementedError 4. MINOR: Enhanced _validate_collection_config - Clarified Pydantic already validates empty strings and non-positive dimensions - Removed redundant validation checks - Kept whitespace-only collection name check (Pydantic doesn't catch this) - Updated docstring to explain Pydantic vs custom validation - Added test for whitespace-only collection names All 26 unit tests passing. Ruff formatting and linting checks pass. Refs: #579 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Signed-off-by: manavgup <manavg@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent 1d3f460 commit 4dbd1e5

File tree

3 files changed

+1043
-5
lines changed

3 files changed

+1043
-5
lines changed

backend/vectordbs/data_types.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -637,3 +637,5 @@ def get_data_or_raise(self) -> T:
637637
VectorDBSearchResponse = VectorDBResponse[list[QueryResult]] # Search results
638638
VectorDBCollectionResponse = VectorDBResponse[dict[str, Any]] # Collection info
639639
VectorDBDeleteResponse = VectorDBResponse[bool] # Delete success status
640+
HealthCheckResponse = VectorDBResponse[dict[str, Any]] # Health check status
641+
CollectionStatsResponse = VectorDBResponse[dict[str, Any]] # Collection statistics

0 commit comments

Comments
 (0)