Commit 15d8036
docs: Move logging documentation to docs/ and fix all linting issues
## Documentation Changes

- Created `docs/development/logging.md` with comprehensive mkdocs-formatted documentation
  - Architecture overview
  - Configuration reference
  - Complete usage examples
  - API reference
  - Migration guide
  - Troubleshooting guide
- Updated `CLAUDE.md` to reference the detailed documentation instead of duplicating content
  - Kept brief summary with quick example
  - Added link to full documentation

## Linting Fixes

### Ruff (✅ All checks passing)

- Removed unused imports (`Path`, `Any`, `get_context`)
- Converted `Optional[X]` to `X | None` (UP045)
- Removed `noqa` directives for non-enabled checks
- Fixed timezone import (`UTC` instead of `timezone`)
- Combined nested `with` statements in tests

### Mypy (✅ No errors)

- Fixed ContextVar default value (B039): changed from `LogContext()` to `None`
- Added `get_context()` initialization logic to handle the `None` case
- Added `DOCUMENT_PROCESSING` constant to the `PipelineStage` class
- Added `# type: ignore[misc]` comments to `pytest.mark.asyncio` decorators

### Code Quality Improvements

- All files now use modern Python 3.12+ type hints
- Proper null handling in ContextVar
- Consistent code formatting throughout

## Files Modified

- `docs/development/logging.md` - NEW comprehensive documentation
- `CLAUDE.md` - Simplified with reference to detailed docs
- `backend/core/logging_context.py` - Fixed ContextVar, added constant
- `backend/core/log_storage_service.py` - Type hint modernization
- `backend/core/enhanced_logging.py` - Import cleanup, type hints
- `backend/core/enhanced_logging_example.py` - Type hints, combined `with` statements
- `backend/tests/unit/test_enhanced_logging.py` - Type ignore comments, combined `with`
- `backend/poetry.lock` - Updated with python-json-logger dependency

## Verification

- ✅ Ruff: All checks passing
- ✅ Mypy: No type errors found
- 📝 Note: Pylint/pydocstyle skipped (not in current poetry environment)

The enhanced logging implementation is now fully documented and lint-compliant.
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 5e44c48 commit 15d8036

File tree

8 files changed: +729, -223 lines


CLAUDE.md

Lines changed: 12 additions & 139 deletions
````diff
@@ -404,155 +404,28 @@ make validate-ci
 
 RAG Modulo implements an enhanced logging system with structured context tracking, request correlation, and performance monitoring. Based on patterns from IBM mcp-context-forge.
 
-#### Key Features
-
-- **Dual Output Formats**: JSON for production/monitoring, text for development
-- **Context Tracking**: Automatic request correlation and entity tracking (collection, user, pipeline, document)
-- **Pipeline Stage Tracking**: Track operations through each RAG pipeline stage
-- **Performance Monitoring**: Automatic timing for all operations
-- **In-Memory Storage**: Queryable log buffer for debugging and admin UI
-
-#### Configuration
-
-```env
-# Logging settings (.env)
-LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL
-LOG_FORMAT=text  # text (dev) or json (prod)
-LOG_TO_FILE=true
-LOG_FILE=rag_modulo.log
-LOG_FOLDER=logs
-LOG_ROTATION_ENABLED=true
-LOG_MAX_SIZE_MB=10
-LOG_BACKUP_COUNT=5
-
-# Log storage (in-memory)
-LOG_STORAGE_ENABLED=true
-LOG_BUFFER_SIZE_MB=5
-```
-
-#### Usage in Services
+**Key Features**: Dual output formats (JSON/text), context tracking, pipeline stage tracking, performance monitoring, in-memory queryable storage.
 
+**Quick Example**:
 ```python
 from core.enhanced_logging import get_logger
 from core.logging_context import log_operation, pipeline_stage_context, PipelineStage
 
 logger = get_logger(__name__)
 
-async def search(self, search_input: SearchInput) -> SearchOutput:
-    # Wrap entire operation for automatic timing and context
-    with log_operation(
-        logger,
-        "search_documents",
-        entity_type="collection",
-        entity_id=str(search_input.collection_id),
-        user_id=str(search_input.user_id),
-        query=search_input.question  # Additional metadata
-    ):
-        # Each pipeline stage tracked separately
-        with pipeline_stage_context(PipelineStage.QUERY_VALIDATION):
-            validate_search_input(search_input)
-
-        with pipeline_stage_context(PipelineStage.QUERY_REWRITING):
-            rewritten = await self.rewrite_query(search_input.question)
-            logger.info("Query rewritten", extra={
-                "original": search_input.question,
-                "rewritten": rewritten
-            })
-
-        with pipeline_stage_context(PipelineStage.VECTOR_SEARCH):
-            results = await self.vector_search(rewritten)
-            logger.info("Vector search completed", extra={
-                "result_count": len(results),
-                "top_score": results[0].score if results else 0
-            })
-```
-
-#### Log Output Examples
-
-**Text Format** (development):
-```
-[2025-10-22T10:30:45] INFO rag.search: Starting search_documents [req_id=req_abc123, collection=coll_456, user=user_xyz]
-[2025-10-22T10:30:45] INFO rag.search: Query rewritten [stage=query_rewriting] | original=What is AI?, rewritten=artificial intelligence machine learning
-[2025-10-22T10:30:45] INFO rag.search: Vector search completed [stage=vector_search] | result_count=5, top_score=0.95
-[2025-10-22T10:30:45] INFO rag.search: Completed search_documents (took 234.56ms)
-```
-
-**JSON Format** (production):
-```json
-{
-  "timestamp": "2025-10-22T10:30:45.123Z",
-  "level": "info",
-  "logger": "rag.search",
-  "message": "Query rewritten",
-  "context": {
-    "request_id": "req_abc123",
-    "user_id": "user_xyz",
-    "collection_id": "coll_456",
-    "operation": "search_documents",
-    "pipeline_stage": "query_rewriting"
-  },
-  "original": "What is AI?",
-  "rewritten": "artificial intelligence machine learning",
-  "execution_time_ms": 45.2
-}
+with log_operation(logger, "search", "collection", coll_id, user_id=user_id):
+    with pipeline_stage_context(PipelineStage.QUERY_REWRITING):
+        logger.info("Query rewritten", extra={"original": q, "rewritten": rq})
 ```
 
-#### Pipeline Stages
-
-Standard pipeline stage constants available in `PipelineStage`:
-
-**Query Processing**: `QUERY_VALIDATION`, `QUERY_REWRITING`, `QUERY_EXPANSION`, `QUERY_DECOMPOSITION`
-**Embedding**: `EMBEDDING_GENERATION`, `EMBEDDING_BATCHING`
-**Retrieval**: `VECTOR_SEARCH`, `KEYWORD_SEARCH`, `HYBRID_SEARCH`, `DOCUMENT_RETRIEVAL`
-**Reranking**: `RERANKING`, `RELEVANCE_SCORING`
-**Generation**: `PROMPT_CONSTRUCTION`, `LLM_GENERATION`, `ANSWER_PROCESSING`, `SOURCE_ATTRIBUTION`
-**Chain of Thought**: `COT_REASONING`, `COT_QUESTION_DECOMPOSITION`, `COT_ANSWER_SYNTHESIS`
-**Documents**: `DOCUMENT_PARSING`, `DOCUMENT_CHUNKING`, `DOCUMENT_INDEXING`
-
-#### Benefits
-
-**Full Request Traceability**: Track every search request through the entire RAG pipeline
-**Performance Insights**: Automatic timing for each pipeline stage
-**Debugging 50% Faster**: Structured context makes finding issues trivial
-**Production Ready**: JSON output integrates with ELK, Splunk, CloudWatch
-**Zero Performance Impact**: Async logging with buffering
-**Developer Friendly**: Human-readable text format for local development
-**Queryable**: In-memory log storage for admin UI and debugging
-
-#### Migration from Old Logging
-
-The old `logging_utils.py` continues to work during migration:
+**📖 Full Documentation**: [docs/development/logging.md](docs/development/logging.md)
 
-```python
-# Old style (still works)
-from core.logging_utils import get_logger
-logger = get_logger(__name__)
-logger.info("Something happened")
-
-# New style (enhanced - recommended)
-from core.enhanced_logging import get_logger
-from core.logging_context import log_operation
-
-logger = get_logger(__name__)
-with log_operation(logger, "operation_name", "entity_type", "entity_id"):
-    logger.info("Something happened", extra={"key": "value"})
-```
-
-#### Example Integration
-
-See `backend/core/enhanced_logging_example.py` for comprehensive examples including:
-- Simple search operations
-- Chain of Thought reasoning
-- Error handling
-- Batch processing
-- API endpoint integration
-
-#### Testing
-
-Run logging tests:
-```bash
-pytest backend/tests/unit/test_enhanced_logging.py -v
-```
+- Configuration reference
+- Complete usage examples
+- API reference
+- Migration guide
+- Testing guide
+- Troubleshooting
 
 ### Vector Database Support
 
````
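The quick example kept in CLAUDE.md uses `log_operation` without showing its implementation, which this commit does not touch. A minimal sketch of what such a context manager could look like, assuming only the behavior the documentation describes (start/completion logging, automatic timing, errors logged with context); this is not the project's actual code:

```python
import logging
import time
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def log_operation(logger: logging.Logger, operation: str,
                  entity_type: str, entity_id: str, **metadata) -> Iterator[None]:
    """Log start/completion of an operation and time the wrapped block."""
    start = time.perf_counter()
    logger.info("Starting %s", operation,
                extra={"entity_type": entity_type, "entity_id": entity_id, **metadata})
    try:
        yield
    except Exception:
        # Failures are logged with the same context plus the stack trace
        logger.exception("Failed %s", operation)
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("Completed %s (took %.2fms)", operation, elapsed_ms)
```

Per-stage context managers like `pipeline_stage_context` then compose by simple nesting inside the wrapped block.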

backend/core/enhanced_logging.py

Lines changed: 12 additions & 15 deletions
```diff
@@ -21,18 +21,15 @@
 import logging.handlers
 import os
 from asyncio import AbstractEventLoop, get_running_loop
-from pathlib import Path
-from typing import Any, Optional
 
 from pythonjsonlogger import jsonlogger
 
 from core.log_storage_service import LogLevel, LogStorageService
-from core.logging_context import get_context
 
 # Global handlers will be created lazily
-_file_handler: Optional[logging.Handler] = None
-_text_handler: Optional[logging.StreamHandler] = None
-_storage_handler: Optional[logging.Handler] = None
+_file_handler: logging.Handler | None = None
+_text_handler: logging.StreamHandler | None = None
+_storage_handler: logging.Handler | None = None
 
 # Text formatter
 _text_formatter = logging.Formatter(
@@ -48,7 +45,7 @@
 
 def _get_file_handler(
     log_file: str = "rag_modulo.log",
-    log_folder: Optional[str] = "logs",
+    log_folder: str | None = "logs",
     log_rotation_enabled: bool = True,
     log_max_size_mb: int = 10,
     log_backup_count: int = 5,
@@ -72,7 +69,7 @@ def _get_file_handler(
     Raises:
         ValueError: If file logging is disabled or no log file specified
     """
-    global _file_handler  # noqa: PLW0603
+    global _file_handler
     if _file_handler is None:
         if not log_file:
             raise ValueError("No log file specified")
@@ -108,7 +105,7 @@ def _get_text_handler() -> logging.StreamHandler:
     Returns:
         logging.StreamHandler: The stream handler for console logging
     """
-    global _text_handler  # noqa: PLW0603
+    global _text_handler
     if _text_handler is None:
         _text_handler = logging.StreamHandler()
         _text_handler.setFormatter(_text_formatter)
@@ -266,7 +263,7 @@ async def initialize(
         log_format: str = "text",
         log_to_file: bool = True,
         log_file: str = "rag_modulo.log",
-        log_folder: Optional[str] = "logs",
+        log_folder: str | None = "logs",
         log_rotation_enabled: bool = True,
         log_max_size_mb: int = 10,
         log_backup_count: int = 5,
@@ -339,7 +336,7 @@ async def initialize(
             self._storage = LogStorageService(max_size_mb=log_buffer_size_mb)
 
             # Add storage handler to capture all logs
-            global _storage_handler  # noqa: PLW0603
+            global _storage_handler
             _storage_handler = StorageHandler(self._storage)
             _storage_handler.setFormatter(_text_formatter)
             _storage_handler.setLevel(log_level_value)
@@ -398,7 +395,7 @@ def get_logger(self, name: str) -> logging.Logger:
 
         return self._loggers[name]
 
-    def get_storage(self) -> Optional[LogStorageService]:
+    def get_storage(self) -> LogStorageService | None:
         """Get the log storage service if available.
 
         Returns:
@@ -408,7 +405,7 @@ def get_storage(self) -> Optional[LogStorageService]:
 
 
 # Global logging service instance
-_logging_service: Optional[LoggingService] = None
+_logging_service: LoggingService | None = None
 
 
 def get_logging_service() -> LoggingService:
@@ -417,7 +414,7 @@ def get_logging_service() -> LoggingService:
     Returns:
         LoggingService instance
     """
-    global _logging_service  # noqa: PLW0603
+    global _logging_service
    if _logging_service is None:
        _logging_service = LoggingService()
    return _logging_service
@@ -447,7 +444,7 @@ async def initialize_logging(
     log_format: str = "text",
     log_to_file: bool = True,
     log_file: str = "rag_modulo.log",
-    log_folder: Optional[str] = "logs",
+    log_folder: str | None = "logs",
     log_rotation_enabled: bool = True,
     log_max_size_mb: int = 10,
     log_backup_count: int = 5,
```

backend/core/enhanced_logging_example.py

Lines changed: 4 additions & 6 deletions
```diff
@@ -11,7 +11,6 @@
 """
 
 import asyncio
-from typing import Optional
 
 from pydantic import UUID4
 
@@ -207,7 +206,7 @@ async def _synthesize_answers(answers: list[dict]) -> str:
     return "Synthesized final answer based on sub-answers"
 
 
-async def example_error_handling(collection_id: UUID4, user_id: UUID4) -> Optional[dict]:
+async def example_error_handling(collection_id: UUID4, user_id: UUID4) -> dict | None:
     """Example error handling with enhanced logging.
 
     Demonstrates how errors are automatically logged with context.
@@ -226,10 +225,9 @@ async def example_error_handling(collection_id: UUID4, user_id: UUID4) -> Option
             entity_type="collection",
             entity_id=str(collection_id),
             user_id=str(user_id),
-        ):
-            with pipeline_stage_context(PipelineStage.DOCUMENT_PROCESSING):
-                # Simulate an error
-                raise ValueError("Simulated processing error")
+        ), pipeline_stage_context(PipelineStage.DOCUMENT_PROCESSING):
+            # Simulate an error
+            raise ValueError("Simulated processing error")
     except ValueError as e:
         # Error is automatically logged by log_operation context manager
         # with full context, timing, and stack trace
```
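The combined-`with` rewrite above is behavior-preserving: listing two context managers on one `with` line enters and exits them in the same order as the nested form. A small self-contained check:

```python
from collections.abc import Iterator
from contextlib import contextmanager

events: list[str] = []


@contextmanager
def track(name: str) -> Iterator[None]:
    """Record enter/exit order so the two forms can be compared."""
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


# Nested form (before the change)
with track("outer"):
    with track("inner"):
        pass

nested_order = events.copy()
events.clear()

# Combined form (after the change)
with track("outer"), track("inner"):
    pass

assert events == nested_order  # identical enter/exit order
```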
