Enhance data_types.py with vector database optimized pydantic models

## 📋 Overview

Enhance the Pydantic data models in `vectordbs/data_types.py` with structures optimized for vector databases, so that implementations no longer parse dicts by hand and type safety improves across all vector database implementations.

## 🎯 Goals

- Eliminate manual dict parsing in vector DB implementations
- Provide type-safe request/response models  
- Enable better error handling with structured responses
- Standardize serialization/deserialization patterns
- Improve developer experience with IDE completion and validation

## 🔧 Technical Specifications

### New Pydantic Models to Add

#### 1. EmbeddedChunk Model
```python
from typing import Any, Dict, List


class EmbeddedChunk(DocumentChunk):
    """Chunk guaranteed to have embeddings for vector DB storage"""
    embeddings: List[float]  # Required, not optional

    @classmethod
    def from_chunk(cls, chunk: DocumentChunk) -> 'EmbeddedChunk':
        """Convert DocumentChunk to EmbeddedChunk with validation"""

    def to_vector_metadata(self) -> Dict[str, Any]:
        """Serialize metadata for vector DB storage"""

    def to_vector_db(self) -> Dict[str, Any]:
        """Complete serialization for vector DB insertion"""
```
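One way the three methods could be filled in is sketched below. The `DocumentChunk` stand-in and its `chunk_id`, `text`, and `metadata` fields are assumptions made for illustration, since the existing model in `vectordbs/data_types.py` is not shown here, and the `id`/`values`/`metadata` record shape is likewise hypothetical.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field, ValidationError


class DocumentChunk(BaseModel):
    """Hypothetical stand-in for the existing DocumentChunk model."""
    chunk_id: str
    text: str
    embeddings: Optional[List[float]] = None
    metadata: Dict[str, Any] = Field(default_factory=dict)


class EmbeddedChunk(DocumentChunk):
    """Chunk guaranteed to have embeddings for vector DB storage."""
    # Re-declaring the field without a default makes it required.
    embeddings: List[float] = Field(...)

    @classmethod
    def from_chunk(cls, chunk: DocumentChunk) -> "EmbeddedChunk":
        # Re-validating the dumped fields raises ValidationError
        # when the source chunk has embeddings=None.
        return cls.model_validate(chunk.model_dump())

    def to_vector_metadata(self) -> Dict[str, Any]:
        # Assumed layout: everything except the vector goes into metadata.
        return {"chunk_id": self.chunk_id, "text": self.text, **self.metadata}

    def to_vector_db(self) -> Dict[str, Any]:
        # Assumed record shape; each vector DB client expects its own keys.
        return {
            "id": self.chunk_id,
            "values": self.embeddings,
            "metadata": self.to_vector_metadata(),
        }
```

With this shape, a chunk without embeddings fails at conversion time rather than at insertion time, which is the guarantee the model is meant to provide.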

#### 2. Request and Configuration Models
```python
from typing import List, Optional, Union

from pydantic import BaseModel


class DocumentIngestionRequest(BaseModel):
    """Request for adding documents to vector DB"""
    collection_name: str
    documents: List[Document]
    batch_size: Optional[int] = 100

    def get_embedded_chunks(self) -> List[EmbeddedChunk]:
        """Extract all chunks that have embeddings"""


class VectorSearchRequest(BaseModel):
    """Standardized search request"""
    collection_name: str
    query: Union[str, List[float]]  # Text or embedding vector
    filters: Optional[DocumentMetadataFilter] = None
    limit: int = 10
    include_metadata: bool = True


class CollectionConfig(BaseModel):
    """Vector DB collection configuration"""
    name: str
    dimension: int
    metric: VectorMetric = VectorMetric.COSINE
    cloud_provider: Optional[str] = None
    region: Optional[str] = None

    def validate_for_vector_db(self, db_type: str) -> None:
        """Validate configuration for specific vector DB type"""
```
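A minimal sketch of `get_embedded_chunks` follows. It assumes that `Document` exposes a `chunks` list and that chunks carry `chunk_id` and `text`; those field names are hypothetical stand-ins for the existing models.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class DocumentChunk(BaseModel):
    """Hypothetical stand-in for the existing DocumentChunk model."""
    chunk_id: str
    text: str
    embeddings: Optional[List[float]] = None


class EmbeddedChunk(DocumentChunk):
    embeddings: List[float] = Field(...)  # required in the subclass


class Document(BaseModel):
    """Hypothetical stand-in; assumes chunks are stored on the document."""
    document_id: str
    chunks: List[DocumentChunk] = Field(default_factory=list)


class DocumentIngestionRequest(BaseModel):
    collection_name: str
    documents: List[Document]
    batch_size: Optional[int] = 100

    def get_embedded_chunks(self) -> List[EmbeddedChunk]:
        # Keep only chunks that already carry a vector and promote
        # them to the stricter EmbeddedChunk type.
        return [
            EmbeddedChunk(**chunk.model_dump())
            for doc in self.documents
            for chunk in doc.chunks
            if chunk.embeddings is not None
        ]
```

Chunks without embeddings are skipped silently in this sketch; raising or logging instead is a design choice the spec leaves open.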

#### 3. Generic Response Models
```python
from typing import Any, Dict, Generic, List, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class VectorDBResponse(BaseModel, Generic[T]):
    """Standardized response wrapper for all vector DB operations"""
    success: bool
    data: Optional[T] = None
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None

    # The factory names differ from the `success` and `error` fields: a
    # classmethod with the same name would rebind that name in the class body.
    @classmethod
    def create_success(cls, data: T, metadata: Optional[Dict[str, Any]] = None) -> 'VectorDBResponse[T]':
        """Create success response"""

    @classmethod
    def create_error(cls, message: str, metadata: Optional[Dict[str, Any]] = None) -> 'VectorDBResponse[T]':
        """Create error response"""


# Type aliases for common responses
DocumentIngestionResponse = VectorDBResponse[List[str]]
CollectionResponse = VectorDBResponse[str]
SearchResponse = VectorDBResponse[List[QueryResult]]
HealthCheckResponse = VectorDBResponse[Dict[str, Any]]
```
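The generic wrapper could be implemented roughly as follows. As an assumption, the factory methods are named `create_success` and `create_error` here so that they do not share a name with the `success` and `error` fields; any other non-colliding names would work equally well.

```python
from typing import Any, Dict, Generic, List, Optional, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T")


class VectorDBResponse(BaseModel, Generic[T]):
    """Standardized response wrapper for all vector DB operations."""
    success: bool
    data: Optional[T] = None
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None

    @classmethod
    def create_success(
        cls, data: T, metadata: Optional[Dict[str, Any]] = None
    ) -> "VectorDBResponse[T]":
        # Payload is validated against the concrete type parameter.
        return cls(success=True, data=data, metadata=metadata)

    @classmethod
    def create_error(
        cls, message: str, metadata: Optional[Dict[str, Any]] = None
    ) -> "VectorDBResponse[T]":
        return cls(success=False, error=message, metadata=metadata)


# Parametrized alias: `data` must validate as a list of strings.
DocumentIngestionResponse = VectorDBResponse[List[str]]

ok = DocumentIngestionResponse.create_success(["doc-1", "doc-2"])
failed = DocumentIngestionResponse.create_error("collection not found")
```

Because the alias is a parametrized model, passing a payload of the wrong shape to `create_success` raises a `ValidationError` instead of producing a malformed response.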

## ✅ Acceptance Criteria

### Functional Requirements
- [ ] All new pydantic models validate correctly with proper type hints
- [ ] EmbeddedChunk enforces non-null embeddings at creation time
- [ ] Request models provide convenient methods for data extraction
- [ ] Response models support both success and error states
- [ ] Serialization methods produce vector DB compatible formats
- [ ] Validation methods catch configuration errors early
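
As an illustration of the last requirement, `validate_for_vector_db` could consult a per-backend rule table. The table, the `example_cloud_db` backend name, the extra `EUCLIDEAN` member, and the `dimension > 0` constraint are all hypothetical; the actual rules depend on each vector DB.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

from pydantic import BaseModel, Field, ValidationError


class VectorMetric(str, Enum):
    # Only COSINE appears in the spec; EUCLIDEAN is an assumed extra member.
    COSINE = "cosine"
    EUCLIDEAN = "euclidean"


# Hypothetical rule table: optional fields that a given backend requires.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "example_cloud_db": ("cloud_provider", "region"),
}


class CollectionConfig(BaseModel):
    """Vector DB collection configuration."""
    name: str
    dimension: int = Field(gt=0)  # assumed constraint: must be positive
    metric: VectorMetric = VectorMetric.COSINE
    cloud_provider: Optional[str] = None
    region: Optional[str] = None

    def validate_for_vector_db(self, db_type: str) -> None:
        # Collect every required field that is still unset for this backend.
        missing = [
            name
            for name in REQUIRED_FIELDS.get(db_type, ())
            if getattr(self, name) is None
        ]
        if missing:
            raise ValueError(f"{db_type} requires: {', '.join(missing)}")
```

Field-level constraints such as `gt=0` fail at construction time, while backend-specific checks run only when the target DB is known.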

### Technical Requirements  
- [ ] All models inherit from BaseModel with proper type annotations
- [ ] Use pydantic v2 features (Field, computed fields, validators)
- [ ] Include comprehensive docstrings with examples
- [ ] Support JSON serialization/deserialization
- [ ] Maintain backward compatibility with existing Document/DocumentChunk usage
- [ ] Add proper `__repr__` methods for debugging

### Testing Requirements
- [ ] Unit tests for all new models and methods
- [ ] Validation testing for edge cases and error conditions
- [ ] Serialization/deserialization round-trip tests
- [ ] Performance tests for large document batches
- [ ] Integration tests with existing codebase

## 🔄 Implementation Details

### File Changes
- `vectordbs/data_types.py` - Primary implementation
- `tests/unit/test_data_types.py` - Comprehensive test coverage  
- `tests/integration/test_vector_models.py` - Integration testing

### Dependencies
- Pydantic v2
- Python 3.12+ type hints
- No breaking changes to existing models

### Migration Strategy
- All new models are additive - existing code continues to work
- Provide utility functions to convert between old and new patterns
- Gradual adoption in downstream code

## 🧪 Testing Strategy

### Unit Tests
```python
def test_embedded_chunk_validation():
    """Test EmbeddedChunk requires embeddings"""

def test_ingestion_request_chunk_filtering():
    """Test DocumentIngestionRequest filters embedded chunks correctly"""

def test_response_model_serialization():
    """Test VectorDBResponse serializes properly"""

def test_collection_config_validation():
    """Test CollectionConfig validates for different vector DBs"""
```
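For illustration, the first stub and a round-trip check might be fleshed out as below. The `EmbeddedChunk` defined here is a simplified stand-in with assumed `chunk_id` and `text` fields, and the tests use plain `try`/`except` so the sketch runs without a test framework.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class EmbeddedChunk(BaseModel):
    """Simplified stand-in for the proposed model."""
    chunk_id: str
    text: str
    embeddings: List[float]  # required, no default


def test_embedded_chunk_validation():
    """EmbeddedChunk cannot be created without embeddings."""
    try:
        EmbeddedChunk(chunk_id="c1", text="hi")
    except ValidationError as exc:
        # The first reported error points at the missing field.
        assert exc.errors()[0]["loc"] == ("embeddings",)
    else:
        raise AssertionError("expected ValidationError for missing embeddings")


def test_embedded_chunk_round_trip():
    """JSON serialization followed by parsing yields an equal model."""
    chunk = EmbeddedChunk(chunk_id="c1", text="hi", embeddings=[0.1, 0.2])
    assert EmbeddedChunk.model_validate_json(chunk.model_dump_json()) == chunk
```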

## 📊 Success Metrics

- [ ] Zero manual dict parsing in vector DB implementations after adoption
- [ ] 100% type coverage with mypy
- [ ] <100ms serialization time for 1000 document chunks
- [ ] Backward compatibility maintained for all existing usages
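
The serialization target could be checked with a small timing harness along these lines. The embedding dimension of 384, the stand-in model, and the use of `model_dump` as the measured operation are assumptions; the spec does not say which serialization path the 100 ms budget applies to.

```python
import time
from typing import List

from pydantic import BaseModel


class EmbeddedChunk(BaseModel):
    """Simplified stand-in for the proposed model."""
    chunk_id: str
    text: str
    embeddings: List[float]


def time_serialization(n_chunks: int = 1000, dimension: int = 384) -> float:
    """Return the milliseconds spent serializing n_chunks chunks to dicts."""
    chunks = [
        EmbeddedChunk(chunk_id=f"c{i}", text="lorem ipsum", embeddings=[0.0] * dimension)
        for i in range(n_chunks)
    ]
    start = time.perf_counter()
    payloads = [chunk.model_dump() for chunk in chunks]
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert len(payloads) == n_chunks
    return elapsed_ms
```

A single run is noisy; taking the median of several runs gives a more stable number to compare against the budget.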

## 🔗 Related Issues

- Depends on: None (foundational work)
- Blocks: Enhanced VectorStore Base Class (TBD)
- Blocks: Vector DB Implementation Refactoring (TBD)

Priority: High
Estimated Effort: Medium (3-5 days)
Risk Level: Low (additive changes only)

Enhance data_types.py with vector database optimized pydantic models #211
