
Implement Multi-Layer Defense System for Catching Integration Bugs Early #588

@manavgup

Description


Problem Statement

What Happened

Three critical bugs that broke the application reached the main branch today (Nov 7, 2025):

  1. Import Error - settings_router was imported, but the module was never created (PR #555: feat: Dynamic Configuration System with Hierarchical Precedence)
  2. Abstract Method - MilvusStore was missing a _delete_collection_impl() implementation
  3. Type Mismatch - Repository methods returned the wrong types (Pydantic schemas instead of DB models)

All three caused 500 errors and made the application unusable until hotfixes were deployed.

Pattern Recognition

All three share a common root cause: Incomplete refactoring caught only at runtime

  • Imports added, but modules never created
  • Base classes modified, but subclasses not updated
  • Return types changed, but callers not updated
  • Type system didn't catch mismatches
  • No integration tests validated the changes

Root Causes (Systemic Issues)

1. Missing Static Analysis

  • Import hygiene: No validation that imported modules exist
  • Type strictness: MyPy not strict enough to catch schema/model confusion
  • Abstract validation: No check that all abstract methods are implemented
  • Unused imports: Dead code not detected

2. Test Pyramid Imbalance

  Layer          Current    Ideal
  E2E            5%         10%
  Integration    15%        25%
  Unit           80%        60%
  Smoke          (none)     5%

3. No Runtime Validation

  • App doesn't validate startup health
  • No "fail fast" checks before accepting traffic
  • Errors only discovered when endpoint is hit

4. Layer Boundary Gaps

No tests validating contracts between:

  • Router ↔ Service
  • Service ↔ Repository
  • Repository ↔ Database Models
  • Service ↔ External Providers

5. Manual Testing Dependency

Relies on developers remembering to:

  • Start app locally
  • Test affected endpoints
  • Check for runtime errors

Proposed Solution: Layered Defense

Philosophy: Fail fast, fail cheap. Catch bugs at the earliest/fastest layer possible.

Layer 1: Pre-Commit Hooks (< 10 seconds)

Goal: Catch obvious errors before commit

# New checks to add:
1. Import validation (validate all imports resolve)
2. Abstract method checker (all abstract methods implemented)
3. Unused import detection
4. Type annotation coverage check

Implementation:

# .pre-commit-config.yaml
- repo: local
  hooks:
    - id: validate-imports
      name: Validate Python imports
      entry: python scripts/validate_imports.py
      language: system
      pass_filenames: false

    - id: check-abstract-methods
      name: Check abstract method implementations
      entry: python scripts/check_abstract.py
      language: system
      pass_filenames: false
Cost: +5 seconds to pre-commit
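
The unused-import check (item 3 in the list above) doesn't need a custom script. Assuming the repository doesn't already enforce it, ruff's F401 rule can run as one more hook (the rev below is illustrative; pin it to whatever ruff version the project uses):

# .pre-commit-config.yaml (additional hook)
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: v0.6.9
  hooks:
    - id: ruff
      args: [--select, F401]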

Layer 2: Smoke Tests in CI (< 1 minute)

Goal: Validate app actually works

# tests/smoke/test_critical_paths.py
import pytest
from fastapi.testclient import TestClient
from pydantic import BaseModel


@pytest.mark.smoke
def test_app_starts():
    """Validates all imports and basic initialization."""
    # This catches: import errors, missing modules, init failures
    from main import app
    assert app is not None


@pytest.mark.smoke
def test_health_endpoint():
    """Validates basic API functionality."""
    from main import app
    client = TestClient(app)
    response = client.get("/api/health")
    assert response.status_code == 200


@pytest.mark.smoke
def test_collection_service_initializes(db, settings):  # db / settings supplied by fixtures
    """Validates vector store providers load."""
    from rag_solution.services.collection_service import CollectionService
    # This catches: abstract method issues, provider problems
    service = CollectionService(db, settings)
    assert service.vector_store is not None


@pytest.mark.smoke
def test_repository_contracts(db, test_input):  # db session and sample input supplied by fixtures
    """Validates repository returns correct types."""
    from rag_solution.repository.conversation_repository import ConversationRepository
    from rag_solution.models.conversation import ConversationMessage

    repo = ConversationRepository(db)
    message = repo.create_message(test_input)
    assert isinstance(message, ConversationMessage)
    assert not isinstance(message, BaseModel)  # Not a Pydantic schema

CI Integration:

# .github/workflows/01-smoke-tests.yml (NEW)
name: 🔥 Smoke Tests
on: [pull_request]
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run smoke tests
        run: |
          pipx install poetry
          poetry install --with test
          poetry run pytest -m smoke --maxfail=1 -v
        timeout-minutes: 2

Cost: +45 seconds to CI (runs in parallel with other checks)

Layer 3: Contract Tests (< 30 seconds)

Goal: Validate architectural boundaries

# tests/contracts/test_repository_layer.py
import pytest

from rag_solution.repository.conversation_repository import ConversationRepository

# get_all_create_methods / get_all_get_methods / is_sqlalchemy_model are shared test
# helpers still to be written (they should return the bound create_* / get_* methods of
# a constructed repository); test_input and test_id are fixtures.


class TestRepositoryContracts:
    """Validates repository layer returns database models, not schemas."""

    @pytest.mark.contract
    def test_all_create_methods_return_db_models(self):
        """All create_* methods must return SQLAlchemy models."""
        for method in get_all_create_methods(ConversationRepository):
            result = method(test_input)
            assert is_sqlalchemy_model(result), f"{method.__name__} returned non-DB model"

    @pytest.mark.contract
    def test_all_get_methods_return_db_models(self):
        """All get_* methods must return SQLAlchemy models."""
        for method in get_all_get_methods(ConversationRepository):
            result = method(test_id)
            assert is_sqlalchemy_model(result), f"{method.__name__} returned non-DB model"


# tests/contracts/test_service_layer.py
import pytest
from pydantic import BaseModel

# get_public_methods is a shared helper that returns the bound public methods of a
# constructed ConversationService.


class TestServiceContracts:
    """Validates service layer returns Pydantic schemas, not DB models."""

    @pytest.mark.contract
    def test_all_service_methods_return_schemas(self):
        """All public service methods must return Pydantic schemas."""
        for method in get_public_methods(ConversationService):
            result = method(test_input)
            assert isinstance(result, BaseModel), f"{method.__name__} returned non-schema"

Cost: +30 seconds to CI (runs with unit tests)
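
The helper functions these tests lean on (get_all_create_methods, get_all_get_methods, get_public_methods, is_sqlalchemy_model) don't exist yet. A minimal sketch of the two core building blocks, assuming they live in a shared tests/contracts/helpers.py:

# tests/contracts/helpers.py (hypothetical location and names)
import inspect

from pydantic import BaseModel
from sqlalchemy import inspect as sa_inspect
from sqlalchemy.exc import NoInspectionAvailable


def is_sqlalchemy_model(obj) -> bool:
    """True if obj is a mapped SQLAlchemy instance rather than a Pydantic schema."""
    if isinstance(obj, BaseModel):
        return False
    try:
        sa_inspect(obj)  # raises NoInspectionAvailable for non-mapped objects
        return True
    except NoInspectionAvailable:
        return False


def get_methods_with_prefix(instance, prefix: str):
    """Yield bound methods of an instance whose names start with prefix.

    get_all_create_methods / get_all_get_methods / get_public_methods would wrap this
    with "create_", "get_", and a public-name filter respectively, after constructing
    the repository or service under test.
    """
    for name, member in inspect.getmembers(instance, inspect.ismethod):
        if name.startswith(prefix):
            yield member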

Layer 4: Stricter Static Analysis (< 30 seconds)

Goal: Catch type mismatches before runtime

# pyproject.toml - Stricter MyPy

[tool.mypy]
strict = true
warn_return_any = true
warn_unused_ignores = true
disallow_untyped_defs = true
disallow_any_unimported = true
no_implicit_optional = true
check_untyped_defs = true

# New plugins
plugins = ["sqlalchemy.ext.mypy.plugin", "pydantic.mypy"]

Cost: +15 seconds to lint step
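
For illustration only (the model and schema names here are made up, not project code), this is the shape of bug #3 that the stricter configuration is meant to surface in the lint step instead of as a runtime 500:

# hypothetical example of the schema/model confusion strict MyPy flags
from pydantic import BaseModel
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Message(Base):  # SQLAlchemy model - what the repository layer should return
    __tablename__ = "messages"
    id: Mapped[int] = mapped_column(primary_key=True)
    text: Mapped[str]


class MessageOutput(BaseModel):  # Pydantic schema - what the service layer should return
    id: int
    text: str


def create_message(text: str) -> Message:
    db_message = Message(id=1, text=text)
    # Returning the schema instead of the model; with strict = true MyPy reports:
    #   error: Incompatible return value type (got "MessageOutput", expected "Message")
    return MessageOutput(id=db_message.id, text=db_message.text)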

Layer 5: Minimal Integration Suite (< 2 minutes)

Goal: Validate critical user flows end-to-end

# tests/integration/test_critical_flows.py

@pytest.mark.integration
@pytest.mark.critical
def test_chat_message_flow():
    """Full chat flow: create session -> send message -> get response."""
    # This catches: all integration issues between layers
    session = create_test_session()
    response = send_chat_message(session.id, "test message")
    assert response.status_code == 200
    assert response.content is not None

@pytest.mark.integration
@pytest.mark.critical  
def test_collection_create_and_search():
    """Full RAG flow: create collection -> upload doc -> search."""
    collection = create_test_collection()
    upload_document(collection.id, test_doc)
    results = search(collection.id, "test query")
    assert len(results) > 0
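
create_test_session, send_chat_message, create_test_collection, upload_document, and search are assumed helpers that don't exist yet. A rough sketch of the first two against the HTTP API (the routes and payloads below are placeholders, not the project's actual endpoints):

# tests/integration/helpers.py (hypothetical; routes and payloads are placeholders)
from types import SimpleNamespace

from fastapi.testclient import TestClient

from main import app

client = TestClient(app)


def create_test_session() -> SimpleNamespace:
    """Create a chat session through the API and expose its fields as attributes."""
    response = client.post("/api/chat/sessions", json={"session_name": "integration-test"})
    assert response.status_code in (200, 201)
    return SimpleNamespace(**response.json())


def send_chat_message(session_id, text: str):
    """Send a message to an existing session and return the raw response."""
    return client.post(f"/api/chat/sessions/{session_id}/messages", json={"content": text})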

CI Integration:

# Run only on merge to main (not on every PR)
# .github/workflows/05-critical-integration.yml
name: 🎯 Critical Path Integration
on:
  push:
    branches: [main]

Cost: +2 minutes (only on merge to main, not on PR)

Layer 6: Import Validator Script (< 1 second)

Goal: Detect missing modules instantly

# scripts/validate_imports.py
"""Validates all Python imports without executing code."""

import ast
import importlib.util
import sys
from pathlib import Path


def validate_imports(file_path: Path) -> list[str]:
    """Check that every import in a file resolves to an importable module."""
    errors = []

    with open(file_path) as f:
        tree = ast.parse(f.read(), filename=str(file_path))

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"), where node.module may be None
            if node.level > 0 or node.module is None:
                continue
            modules = [node.module]
        else:
            continue

        for module in modules:
            # Resolve only the top-level package so no module code runs
            if importlib.util.find_spec(module.split(".")[0]) is None:
                errors.append(f"{file_path}:{node.lineno}: Import '{module}' not found")

    return errors


# Run on all Python files
if __name__ == "__main__":
    errors = []
    for file in Path("backend").rglob("*.py"):
        errors.extend(validate_imports(file))

    if errors:
        print("\n".join(errors))
        sys.exit(1)

Cost: +1 second to pre-commit

Layer 7: Abstract Method Validator (< 1 second)

Goal: Detect missing abstract method implementations

# scripts/check_abstract.py
"""Validates all abstract methods are implemented."""

import importlib.util
import inspect
import sys
from abc import ABC
from pathlib import Path


def import_module_from_path(file: Path):
    """Load a module directly from its file path."""
    spec = importlib.util.spec_from_file_location(file.stem, file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def find_missing_abstract_methods() -> list[str]:
    """Find subclasses that inherit abstract methods they never implement."""
    errors = []

    # Import all modules in the vector store package
    for file in Path("backend/vectordbs").rglob("*.py"):
        module = import_module_from_path(file)

        # Check each class visible in the module
        for name, obj in inspect.getmembers(module, inspect.isclass):
            if not issubclass(obj, ABC):
                continue
            # Abstract methods the class declares itself (intentional base classes)
            declared_here = {
                attr for attr, value in vars(obj).items()
                if getattr(value, "__isabstractmethod__", False)
            }
            # Abstract methods inherited but never implemented
            missing = set(getattr(obj, "__abstractmethods__", ())) - declared_here
            if missing:
                errors.append(
                    f"{file}:{name} missing abstract methods: {', '.join(sorted(missing))}"
                )

    return errors


if __name__ == "__main__":
    problems = find_missing_abstract_methods()
    if problems:
        print("\n".join(problems))
        sys.exit(1)

Cost: +1 second to pre-commit

Implementation Plan

Phase 1: Quick Wins (< 1 day)

  • Add smoke tests (5 tests)
  • Add to CI as new workflow
  • Update pre-commit config

Deliverable: Catch import and startup errors before merge

Phase 2: Static Analysis (< 2 days)

  • Create import validator script
  • Create abstract method checker
  • Add to pre-commit hooks
  • Increase MyPy strictness

Deliverable: Catch type and import errors at commit time

Phase 3: Contract Tests (< 3 days)

  • Add repository contract tests
  • Add service contract tests
  • Add router contract tests
  • Run with unit tests in CI

Deliverable: Validate layer boundaries automatically

Phase 4: Critical Integration (< 5 days)

  • Add 5 critical path integration tests
  • Run only on merge to main
  • Set up result notifications

Deliverable: Validate end-to-end flows work

Success Metrics

Quantitative:

  • Bug escape rate: 0 critical bugs reach main (target: maintain for 3 months)
  • CI time: < 3 minutes for PR checks (smoke tests don't slow us down)
  • Pre-commit time: < 10 seconds (stay fast for developers)
  • Test coverage: 70% → 80% (focused on integration points)

Qualitative:

  • Developers catch issues locally before pushing
  • Failed PRs caught by CI, not manual testing
  • Confidence in refactoring increases

Non-Goals

What we're NOT doing:

  • ❌ Slow down CI (keep PR feedback under 3 minutes)
  • ❌ Add heavy E2E tests (reserve for staging/production)
  • ❌ Require manual testing steps (automate everything)
  • ❌ Over-engineer (keep it simple, focus on high-value checks)

Open Questions

  1. Coverage threshold: Should we enforce coverage % in CI?
  2. Smoke test frequency: Every PR or only on merge?
  3. Contract test strictness: Fail CI or just warn?
  4. Monitoring: Add runtime validation in production?

References

Related Issues: #558, #559, #560, #561 (conversation service PRs)
Related PRs: #585, #586, #587 (today's hotfixes)


Cost-Benefit Analysis

Current Cost: 3 hours fixing bugs + deployment time + user impact
Prevention Cost: 10 days implementation + 10 seconds per commit + 1 minute per PR
ROI: Breaks even after preventing 3-4 similar incidents

Risk: If we don't do this, we'll keep having these issues, eroding confidence in the codebase.

Metadata

Labels: enhancement (New feature or request), testing (Testing and test infrastructure)