Skip to content

[refactor] Semantic Function Clustering Analysis - Refactoring Opportunities #331

@github-actions

Description

@github-actions

Executive Summary

Analysis of 44 non-test Go files in internal/ identified 8 high-priority refactoring opportunities across authentication, sanitization, logging, and validation domains. The most significant findings include duplicated sanitization logic (3 implementations), auth header parsing duplication (2 packages), and complexity in the logger package (10 files, 65 unexported helpers).

Estimated Total Effort: 4-6 hours of focused refactoring across 3 phases.


Files Analyzed (30 files)

Package Files Non-Test LOC Key Areas
auth/ 1 ~100 Authentication header parsing
config/ 5 ~800 Configuration validation & parsing
guard/ 4 ~300 Security context & agent ID extraction
launcher/ 1 ~400 Process management & env sanitization
logger/ 10 ~1200 Multi-format logging infrastructure
server/ 6 ~500 HTTP server & middleware
mcp/ 3 ~250 MCP protocol types

🔴 Critical Findings: Duplicate Functions

1. Sanitization Logic Duplication (HIGH PRIORITY)

Problem: 3 different implementations of "first 4 chars + ..." sanitization across packages.

Duplicate Functions:

// internal/auth/header.go:35
func sanitizeForLogging(input string) string {
    if len(input) > 4 {
        return input[:4] + "..."
    } else if len(input) > 0 {
        return "..."
    }
    return ""
}

// internal/launcher/launcher.go:22
func sanitizeEnvForLogging(env map[string]string) map[string]string {
    // ... same 4-char logic for map values
    sanitized[key] = value[:4] + "..."
}

// internal/logger/rpc_logger.go:50 (uses internal/logger/sanitize package)
func truncateAndSanitize(payload string, maxLength int) string {
    sanitized := sanitize.SanitizeString(payload)  // Uses regex patterns
    // ... then truncates
}

Similarity: 80-90% functional overlap (prefix truncation for logging safety)

Impact:

  • Code maintainability: Changes to sanitization logic need 3 updates
  • Inconsistency risk: Different truncation lengths (4 chars vs none)
  • Testing burden: 3 sets of tests for same behavior

Recommendation:

  1. Move to internal/logger/sanitize package
  2. Create unified API:
    // sanitize.TruncateSecret(s string) string          // 4 chars + "..."
    // sanitize.TruncateSecretMap(m map[string]string)   // Apply to all values
  3. Update 3 call sites: auth/header.go, launcher/launcher.go, rpc_logger.go

Effort: 1 hour (30 min refactor + 30 min test updates)


2. Auth Header Parsing Duplication (MEDIUM PRIORITY)

Problem: Auth header parsing logic exists in 2 packages with 70% code overlap.

Duplicate Functions:

// internal/auth/header.go:56
func ParseAuthHeader(authHeader string) (apiKey, agentID string, error error) {
    // Handles: Bearer, Agent, plain API key
    // Returns: apiKey + agentID + error
    // Has logging: log.Printf(...)
}

// internal/guard/context.go:42
func ExtractAgentIDFromAuthHeader(authHeader string) string {
    // Handles: Bearer, Agent, plain
    // Returns: only agentID (no error)
    // No logging
}

Code Overlap: 70% (both parse Bearer/Agent prefixes)

Differences:

  • ParseAuthHeader: Full error handling, logging, returns tuple
  • ExtractAgentIDFromAuthHeader: Silent fallback to "default", no errors

Impact:

  • Bug risk: If auth logic changes, must update both
  • Currently: server/auth.go uses neither (direct header comparison)
  • Inconsistency: Guard package should delegate to auth package

Recommendation:

  1. Make internal/auth the single source of truth
  2. Add convenience method to auth package:
    // auth.ExtractAgentID(header string) string  // Wraps ParseAuthHeader, no error
  3. Update guard/context.go to call auth.ExtractAgentID()
  4. Deprecate guard.ExtractAgentIDFromAuthHeader with comment

Effort: 45 minutes (code + tests + validation)


3. Error Message Extraction Confusion (LOW PRIORITY)

Problem: File naming suggests duplication but files serve different purposes.

Files:

  • internal/logger/error_formatting.go (47 lines) - Single function ExtractErrorMessage
  • internal/logger/global_patterns.go (83 lines) - Global state helpers (6 functions)

Analysis:

  • INCORRECT initial assumption: These are NOT duplicates
  • error_formatting.go: Log line cleanup (removes timestamps/levels)
  • global_patterns.go: Mutex-wrapped global logger init/close helpers
  • Misleading name: global_patterns.go should be renamed to global_helpers.go or global_state.go

Recommendation:

  1. Rename global_patterns.goglobal_helpers.go (better reflects purpose)
  2. No code changes needed

Effort: 5 minutes (file rename + import updates)


🟡 Outlier Functions (Wrong File Placement)

4. Validation Logic in Server Package (MEDIUM PRIORITY)

Finding: No misplaced validation found in server package (good separation).

Verified:

  • server/auth.go: Only contains middleware (correct)
  • server/routed.go, unified.go: Only routing logic
  • All validation delegated to internal/config (correct architecture)

🟠 Complexity Issues

5. Logger Package Fragmentation (MEDIUM PRIORITY)

Problem: 10 files, 65 unexported helpers, unclear responsibilities.

Current Structure:

internal/logger/
├── logger.go              (4 exports) - Debug logger with DEBUG env patterns
├── file_logger.go         (9 exports) - File logging + stdout fallback
├── jsonl_logger.go        (5 exports) - JSONL format for RPC messages
├── markdown_logger.go     (9 exports) - Markdown format with emojis
├── rpc_logger.go          (3 exports + 7 unexported) ⚠️ HIGH COMPLEXITY
├── slog_adapter.go        (8 exports) - slog.Handler adapter
├── common.go              (0 exports, 6 helpers) - Global state management
├── error_formatting.go    (1 export) - Log line cleanup
├── global_patterns.go     (0 exports, 6 helpers) - Global init/close
└── sanitize/sanitize.go   (2 exports) - Secret redaction

Issues:

  1. rpc_logger.go: Mixed concerns (formatting + logging + truncation)

    • 10 functions total (3 exported, 7 unexported helpers)
    • Handles 3 formats: text, markdown, JSONL
    • Should be split into formatting vs logging
  2. Naming confusion: global_patterns.go doesn't contain patterns

Recommendation:

  1. Split rpc_logger.go:

    rpc_logger.go       → Keep logging coordination (LogRPCRequest/Response/Message)
    rpc_formatter.go    → Move formatRPCMessage, formatRPCMessageMarkdown
    rpc_helpers.go      → Move extractEssentialFields, getMapKeys, isEffectivelyEmpty
    
  2. Rename global_patterns.goglobal_helpers.go

  3. Consider merging common.go + global_helpers.go (both manage global state)

Effort: 2 hours (file splits + import updates + test refactoring)


🔵 Scattered Helper Patterns

6. String Formatting Helpers (LOW PRIORITY)

Finding: Formatting functions are well-organized (no scattering).

Locations:

  • config/rules/: Validation error formatting (correct)
  • config/schema_validation.go: JSON schema error formatting (correct)
  • logger/error_formatting.go: Log line cleanup (correct)

No Action Needed: Current organization is appropriate.


7. Sanitization Pattern Analysis

5 Sanitization Functions (analyzed in Finding #1):

Function Location Purpose Lines
SanitizeString logger/sanitize Regex-based secret detection 25
SanitizeJSON logger/sanitize JSON payload sanitization 30
truncateAndSanitize logger/rpc_logger Combined truncation + sanitization 9
sanitizeEnvForLogging launcher/launcher Env var prefix truncation 15
sanitizeForLogging auth/header Auth header prefix truncation 8

Cluster: Last 3 functions implement same "4-char prefix" pattern (see Finding #1).


8. Auth Parsing Pattern Analysis

4 Auth Parsing Functions (analyzed in Finding #2):

Function Location Returns Error Handling
ParseAuthHeader auth/header (apiKey, agentID, error) ✅ Full
ExtractAgentIDFromAuthHeader guard/context agentID ❌ Silent fallback
ValidateAPIKey auth/header error ✅ Full
authMiddleware server/auth N/A (middleware) ✅ HTTP errors

Cluster: Functions 1 & 2 are duplicates (see Finding #2).


📊 Validation Pattern Analysis

Strong Points ✅:

  1. Centralized error types: config/rules.ValidationError used throughout
  2. Clear separation: All config validation in internal/config/
  3. Helper library: config/rules/ provides reusable validators
  4. Structured errors: JSON path + suggestion fields

Validation Functions (10+):

  • config/validation.go: validateMounts, validateServerConfig, validateGatewayConfig
  • config/schema_validation.go: validateJSONSchema, validateStringPatterns, formatSchemaError
  • config/env_validation.go: ValidateExecutionEnvironment, validateContainerID
  • config/rules/rules.go: PortRange, TimeoutPositive, MountFormat, etc.

No Issues Found: Validation is well-organized with single responsibility.


🎯 Recommended Refactoring Phases

Phase 1: Quick Wins (1.5 hours)

  1. Sanitization consolidation (Finding Configure as a Go CLI tool #1) - 1 hour
  2. File rename (global_patterns.go) - 5 minutes
  3. Auth parsing consolidation (Finding Lpcox/initial implementation #2) - 45 minutes

Phase 2: Structural Improvements (2 hours)

  1. Split rpc_logger.go (Finding Updated Dockerfile #5) - 2 hours

Phase 3: Documentation (30 minutes)

  1. Add package-level documentation to clarify:
    • internal/logger responsibilities
    • internal/sanitize usage guidelines
    • internal/auth vs internal/guard boundaries

Total Estimated Effort: 4-6 hours


📈 Metrics Summary

Metric Value
Files Analyzed 30 non-test Go files
Duplicate Functions 5 (sanitization: 3, auth: 2)
Outlier Functions 0 (good separation)
Complex Files 1 (rpc_logger.go - 10 functions)
Unexported Helpers 65 across all packages
High Priority Issues 3 (sanitization, auth, logger split)
Estimated ROI High (improves maintainability, reduces bug risk)

🔧 Implementation Notes

Testing Strategy:

  • All changes must maintain 100% test coverage
  • Use table-driven tests (existing pattern)
  • Run make test-all before completion

Breaking Changes:

  • None proposed (all changes are internal refactoring)
  • Public APIs remain unchanged

Migration Path:

  1. Create new consolidated functions
  2. Deprecate old functions with comments
  3. Update call sites incrementally
  4. Remove deprecated functions in future release

✅ Next Steps

  1. Review this analysis with team for prioritization
  2. Create sub-issues for each phase if approved
  3. Start with Phase 1 (quick wins, 1.5 hours)
  4. Run make agent-finished after each phase to verify

Analysis Date: 2025-01-18
Analyzer: GitHub Copilot CLI (alternative tooling due to Serena MCP server unavailability)
Methodology: Grep-based pattern analysis + manual code review + explore agent assistance

AI generated by Semantic Function Refactoring

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions