Implement Multi-Layer Defense System for Catching Integration Bugs Early
Problem Statement
What Happened
Three critical bugs reached the main branch today (Nov 7, 2025) and broke the application:
- Import Error - `settings_router` imported, but the file was never created (PR #555: feat: Dynamic Configuration System with Hierarchical Precedence)
- Abstract Method - `MilvusStore` missing the `_delete_collection_impl()` implementation
- Type Mismatch - Repository methods returning wrong types (Pydantic schemas instead of DB models)
All three caused 500 errors and made the application unusable until hotfixes were deployed.
Pattern Recognition
All three share a common root cause: Incomplete refactoring caught only at runtime
- Imports added, but modules never created
- Base classes modified, but subclasses not updated
- Return types changed, but callers not updated
- Type system didn't catch mismatches
- No integration tests validated the changes
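The second bug is the canonical instance of this pattern. A minimal reconstruction (base-class name assumed, signatures simplified) shows why nothing fails until runtime:

```python
# Minimal reconstruction of the abstract-method bug (simplified; base-class name assumed)
from abc import ABC, abstractmethod


class VectorStore(ABC):
    @abstractmethod
    def add_documents(self, docs: list[str]) -> None: ...

    @abstractmethod  # added during refactoring...
    def _delete_collection_impl(self, name: str) -> None: ...


class MilvusStore(VectorStore):
    def add_documents(self, docs: list[str]) -> None:
        print(f"indexed {len(docs)} docs")
    # ...but _delete_collection_impl was never implemented here


MilvusStore()  # TypeError raised only here, at instantiation time:
# "Can't instantiate abstract class MilvusStore with abstract method _delete_collection_impl"
```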
Root Causes (Systemic Issues)
1. Missing Static Analysis
- Import hygiene: No validation that imported modules exist
- Type strictness: MyPy not strict enough to catch schema/model confusion
- Abstract validation: No check that all abstract methods are implemented
- Unused imports: Dead code not detected
2. Test Pyramid Imbalance
| Layer       | Current | Ideal |
|-------------|---------|-------|
| E2E         | 5%      | 10%   |
| Integration | 15%     | 25%   |
| Unit        | 80%     | 60%   |
| Smoke       | 0%      | 5%    |
3. No Runtime Validation
- App doesn't validate startup health
- No "fail fast" checks before accepting traffic
- Errors only discovered when endpoint is hit
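A sketch of what such a check could look like in main.py, assuming FastAPI (which the smoke tests below also assume); the probed module is one named elsewhere in this issue:

```python
# Hypothetical fail-fast startup validation in main.py (FastAPI lifespan)
import importlib
from contextlib import asynccontextmanager

from fastapi import FastAPI


def validate_startup() -> None:
    """Cheap probes that raise if the app is not actually usable."""
    # Importing key modules up front surfaces import errors and missing files;
    # instantiating providers here would also catch abstract-method gaps.
    for module in ("rag_solution.services.collection_service",):
        importlib.import_module(module)


@asynccontextmanager
async def lifespan(app: FastAPI):
    validate_startup()  # an exception here aborts startup instead of causing 500s later
    yield


app = FastAPI(lifespan=lifespan)
```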
4. Layer Boundary Gaps
No tests validating contracts between:
- Router ↔ Service
- Service ↔ Repository
- Repository ↔ Database Models
- Service ↔ External Providers
5. Manual Testing Dependency
Relies on developers remembering to:
- Start app locally
- Test affected endpoints
- Check for runtime errors
Proposed Solution: Layered Defense
Philosophy: Fail fast, fail cheap. Catch bugs at the earliest/fastest layer possible.
Layer 1: Pre-Commit Hooks (< 10 seconds)
Goal: Catch obvious errors before commit
New checks to add:
1. Import validation (validate all imports resolve)
2. Abstract method checker (all abstract methods implemented)
3. Unused import detection
4. Type annotation coverage check

Implementation:
```yaml
# .pre-commit-config.yaml
- id: validate-imports
  name: Validate Python imports
  entry: python scripts/validate_imports.py
  language: python
  pass_filenames: false
- id: check-abstract-methods
  name: Check abstract method implementations
  entry: python scripts/check_abstract.py
  language: python
  pass_filenames: false
```

Cost: +5 seconds to pre-commit
Layer 2: Smoke Tests in CI (< 1 minute)
Goal: Validate app actually works
```python
# tests/smoke/test_critical_paths.py
import pytest
from fastapi.testclient import TestClient
from pydantic import BaseModel


@pytest.mark.smoke
def test_app_starts():
    """Validates all imports and basic initialization."""
    # This catches: import errors, missing modules, init failures
    from main import app

    assert app is not None


@pytest.mark.smoke
def test_health_endpoint():
    """Validates basic API functionality."""
    from main import app

    client = TestClient(app)
    response = client.get("/api/health")
    assert response.status_code == 200


@pytest.mark.smoke
def test_collection_service_initializes(db, settings):  # db/settings: shared fixtures
    """Validates vector store providers load."""
    from rag_solution.services.collection_service import CollectionService

    # This catches: abstract method issues, provider problems
    service = CollectionService(db, settings)
    assert service.vector_store is not None


@pytest.mark.smoke
def test_repository_contracts(db, test_input):  # fixtures supply a session and input data
    """Validates repository returns correct types."""
    from rag_solution.repository.conversation_repository import ConversationRepository
    from rag_solution.models.conversation import ConversationMessage

    repo = ConversationRepository(db)
    message = repo.create_message(test_input)
    assert isinstance(message, ConversationMessage)
    assert not isinstance(message, BaseModel)  # Not a Pydantic model
```

CI Integration:
```yaml
# .github/workflows/01-smoke-tests.yml (NEW)
name: 🔥 Smoke Tests
on: [pull_request]
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run smoke tests
        run: |
          poetry install --with test
          pytest -m smoke --maxfail=1 -v
        timeout-minutes: 2
```

Cost: +45 seconds to CI (runs in parallel with other checks)
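The `db`, `settings`, and `test_input` arguments above are pytest fixtures that don't exist yet. One hypothetical shape for them, as a sketch only; every import path and field below is an assumption to be adapted to the real modules:

```python
# tests/smoke/conftest.py (hypothetical sketch; import paths and fields are assumptions)
import pytest


@pytest.fixture
def settings():
    from core.config import get_settings  # assumed settings accessor
    return get_settings()


@pytest.fixture
def db(settings):
    from rag_solution.database import SessionLocal  # assumed session factory
    session = SessionLocal()
    try:
        yield session
    finally:
        session.rollback()  # keep smoke runs side-effect free
        session.close()


@pytest.fixture
def test_input():
    from rag_solution.schemas.conversation_schema import MessageInput  # assumed schema
    return MessageInput(session_id=1, role="user", content="hello")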
Layer 3: Contract Tests (< 30 seconds)
Goal: Validate architectural boundaries
```python
# tests/contracts/test_repository_layer.py
import pytest
from pydantic import BaseModel

from rag_solution.repository.conversation_repository import ConversationRepository


class TestRepositoryContracts:
    """Validates repository layer returns database models, not schemas."""

    @pytest.mark.contract
    def test_all_create_methods_return_db_models(self):
        """All create_* methods must return SQLAlchemy models."""
        for method in get_all_create_methods(ConversationRepository):
            result = method(test_input)
            assert is_sqlalchemy_model(result), f"{method.__name__} returned non-DB model"

    @pytest.mark.contract
    def test_all_get_methods_return_db_models(self):
        """All get_* methods must return SQLAlchemy models."""
        for method in get_all_get_methods(ConversationRepository):
            result = method(test_id)
            assert is_sqlalchemy_model(result), f"{method.__name__} returned non-DB model"
```

```python
# tests/contracts/test_service_layer.py
class TestServiceContracts:
    """Validates service layer returns Pydantic schemas, not DB models."""

    @pytest.mark.contract
    def test_all_service_methods_return_schemas(self):
        """All public service methods must return Pydantic models."""
        for method in get_public_methods(ConversationService):
            result = method(test_input)
            assert isinstance(result, BaseModel), f"{method.__name__} returned non-schema"
```

Cost: +30 seconds to CI (runs with unit tests)
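The `get_all_*`, `get_public_methods`, and `is_sqlalchemy_model` helpers referenced above don't exist yet. A minimal sketch of how they might work via introspection; module path and binding details are assumptions:

```python
# tests/contracts/helpers.py (hypothetical sketch)
import inspect

from sqlalchemy import inspect as sa_inspect
from sqlalchemy.exc import NoInspectionAvailable


def is_sqlalchemy_model(obj) -> bool:
    """True if obj is an instance of a mapped SQLAlchemy class."""
    try:
        return sa_inspect(obj) is not None
    except NoInspectionAvailable:
        return False


def _methods_with_prefix(cls, prefix: str):
    """Yield functions defined on cls whose names start with prefix.

    A real implementation would bind these to an instance built from
    fixtures before calling them, and supply per-method test inputs.
    """
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith(prefix):
            yield member


def get_all_create_methods(repo_cls):
    return _methods_with_prefix(repo_cls, "create_")


def get_all_get_methods(repo_cls):
    return _methods_with_prefix(repo_cls, "get_")


def get_public_methods(service_cls):
    return (m for n, m in inspect.getmembers(service_cls, inspect.isfunction)
            if not n.startswith("_"))
```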
Layer 4: Stricter Static Analysis (< 30 seconds)
Goal: Catch type mismatches before runtime
```toml
# pyproject.toml - Stricter MyPy
[tool.mypy]
strict = true
warn_return_any = true
warn_unused_ignores = true
disallow_untyped_defs = true
disallow_any_unimported = true
no_implicit_optional = true
check_untyped_defs = true

# New plugins
plugins = ["sqlalchemy.ext.mypy.plugin", "pydantic.mypy"]
```

Cost: +15 seconds to lint step
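With `disallow_untyped_defs` forcing annotations and the plugins teaching MyPy about SQLAlchemy and Pydantic, today's third bug class becomes a lint-time error. A minimal illustration; `MessageSchema` and the table name are hypothetical:

```python
# Illustration only - hypothetical names showing what strict MyPy now rejects
from pydantic import BaseModel
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class ConversationMessage(Base):  # SQLAlchemy model (what repositories must return)
    __tablename__ = "conversation_messages"
    id: Mapped[int] = mapped_column(primary_key=True)
    content: Mapped[str]


class MessageSchema(BaseModel):  # Pydantic schema (what services must return)
    id: int
    content: str


def create_message(content: str) -> ConversationMessage:
    # mypy: error: Incompatible return value type
    #       (got "MessageSchema", expected "ConversationMessage")
    return MessageSchema(id=1, content=content)  # the bug pattern from today
```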
Layer 5: Minimal Integration Suite (< 2 minutes)
Goal: Validate critical user flows end-to-end
```python
# tests/integration/test_critical_flows.py
import pytest


@pytest.mark.integration
@pytest.mark.critical
def test_chat_message_flow():
    """Full chat flow: create session -> send message -> get response."""
    # This catches: all integration issues between layers
    session = create_test_session()
    response = send_chat_message(session.id, "test message")
    assert response.status_code == 200
    assert response.content is not None


@pytest.mark.integration
@pytest.mark.critical
def test_collection_create_and_search():
    """Full RAG flow: create collection -> upload doc -> search."""
    collection = create_test_collection()
    upload_document(collection.id, test_doc)
    results = search(collection.id, "test query")
    assert len(results) > 0
```

CI Integration:
```yaml
# Run only on merge to main (not on every PR)
# .github/workflows/05-critical-integration.yml
name: 🎯 Critical Path Integration
on:
  push:
    branches: [main]
```

Cost: +2 minutes (only on merge to main, not on PR)
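The flow helpers (`create_test_session`, `send_chat_message`, etc.) are placeholders. One way they might wrap TestClient, as a sketch only; every endpoint path, status code, and payload field below is an assumption:

```python
# tests/integration/helpers.py (hypothetical sketch; endpoints and payloads are assumptions)
from types import SimpleNamespace

from fastapi.testclient import TestClient

from main import app

client = TestClient(app)


def create_test_session() -> SimpleNamespace:
    """Create a chat session via the API and return an object with .id."""
    resp = client.post("/api/chat/sessions", json={"name": "test session"})
    assert resp.status_code == 201, resp.text
    return SimpleNamespace(**resp.json())


def send_chat_message(session_id, text):
    """Post a message to an existing session; returns the raw response."""
    return client.post(f"/api/chat/sessions/{session_id}/messages", json={"content": text})


def create_test_collection() -> SimpleNamespace:
    resp = client.post("/api/collections", json={"name": "test collection"})
    assert resp.status_code == 201, resp.text
    return SimpleNamespace(**resp.json())


def upload_document(collection_id, doc) -> None:
    resp = client.post(f"/api/collections/{collection_id}/documents", files={"file": doc})
    assert resp.status_code in (200, 201), resp.text


def search(collection_id, query) -> list:
    resp = client.get(f"/api/collections/{collection_id}/search", params={"q": query})
    assert resp.status_code == 200, resp.text
    return resp.json()["results"]
```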
Layer 6: Import Validator Script (< 1 second)
Goal: Detect missing modules instantly
```python
# scripts/validate_imports.py
"""Validates all Python imports without executing code."""
import ast
import importlib.util
import sys
from pathlib import Path


def validate_imports(file_path: Path) -> list[str]:
    """Check if all imports in file are valid."""
    errors = []
    with open(file_path) as f:
        tree = ast.parse(f.read())
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if isinstance(node, ast.ImportFrom):
                if node.level > 0 or node.module is None:
                    continue  # skip relative imports ("from . import x")
                module = node.module
            else:
                module = node.names[0].name
            # Check if the top-level package exists
            spec = importlib.util.find_spec(module.split('.')[0])
            if spec is None:
                errors.append(f"{file_path}:{node.lineno}: Import '{module}' not found")
    return errors


# Run on all Python files
if __name__ == "__main__":
    errors = []
    for file in Path("backend").rglob("*.py"):
        errors.extend(validate_imports(file))
    if errors:
        print("\n".join(errors))
        sys.exit(1)
```

Cost: +1 second to pre-commit
Layer 7: Abstract Method Validator (< 1 second)
Goal: Detect missing abstract method implementations
```python
# scripts/check_abstract.py
"""Validates all abstract methods are implemented."""
import importlib.util
import inspect
import sys
from abc import ABC
from pathlib import Path


def import_module_from_path(file: Path):
    """Load a module directly from its file path."""
    spec = importlib.util.spec_from_file_location(file.stem, file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def find_missing_abstract_methods() -> list[str]:
    """Find classes with unimplemented abstract methods."""
    errors = []
    # Import all modules
    for file in Path("backend/vectordbs").rglob("*.py"):
        module = import_module_from_path(file)
        # Check each class: a non-empty __abstractmethods__ means the class
        # cannot be instantiated (intentional base classes need an allow-list)
        for name, obj in inspect.getmembers(module, inspect.isclass):
            if issubclass(obj, ABC) and getattr(obj, "__abstractmethods__", None):
                errors.append(
                    f"{file}:{name} missing abstract methods: "
                    f"{', '.join(sorted(obj.__abstractmethods__))}"
                )
    return errors


if __name__ == "__main__":
    errors = find_missing_abstract_methods()
    if errors:
        print("\n".join(errors))
        sys.exit(1)
```

Cost: +1 second to pre-commit
Implementation Plan
Phase 1: Quick Wins (< 1 day)
- Add smoke tests (5 tests)
- Add to CI as new workflow
- Update pre-commit config
Deliverable: Catch import and startup errors before merge
Phase 2: Static Analysis (< 2 days)
- Create import validator script
- Create abstract method checker
- Add to pre-commit hooks
- Increase MyPy strictness
Deliverable: Catch type and import errors at commit time
Phase 3: Contract Tests (< 3 days)
- Add repository contract tests
- Add service contract tests
- Add router contract tests
- Run with unit tests in CI
Deliverable: Validate layer boundaries automatically
Phase 4: Critical Integration (< 5 days)
- Add 5 critical path integration tests
- Run only on merge to main
- Set up result notifications
Deliverable: Validate end-to-end flows work
Success Metrics
Quantitative:
- Bug escape rate: 0 critical bugs reach main (target: maintain for 3 months)
- CI time: < 3 minutes for PR checks (smoke tests don't slow us down)
- Pre-commit time: < 10 seconds (stay fast for developers)
- Test coverage: 70% → 80% (focused on integration points)
Qualitative:
- Developers catch issues locally before pushing
- Failed PRs caught by CI, not manual testing
- Confidence in refactoring increases
Non-Goals
What we're NOT doing:
- ❌ Slowing down CI (keep PR feedback under 3 minutes)
- ❌ Adding heavy E2E tests (reserve for staging/production)
- ❌ Requiring manual testing steps (automate everything)
- ❌ Over-engineering (keep it simple, focus on high-value checks)
Open Questions
- Coverage threshold: Should we enforce coverage % in CI?
- Smoke test frequency: Every PR or only on merge?
- Contract test strictness: Fail CI or just warn?
- Monitoring: Add runtime validation in production?
References
Related Issues: #558, #559, #560, #561 (conversation service PRs)
Related PRs: #585, #586, #587 (today's hotfixes)
Cost-Benefit Analysis
Current Cost: 3 hours fixing bugs + deployment time + user impact
Prevention Cost: 10 days implementation + 10 seconds per commit + 1 minute per PR
ROI: Breaks even after preventing 3-4 similar incidents
Risk: If we don't do this, we'll keep having these issues, eroding confidence in the codebase.