Skip to content

[refactor] Semantic Function Clustering Analysis - Code Organization and Refactoring Opportunities #256

@github-actions

Description

@github-actions

Executive Summary

Comprehensive semantic analysis of 43 Go source files (excluding tests) across the MCP Gateway codebase identified 3 high-impact refactoring opportunities focused primarily in the internal/logger package. The analysis revealed deprecated wrapper functions, duplicated patterns across multiple logger implementations, and opportunities to consolidate server creation logic.

Key Findings:

  • 5 sanitization functions with 2 deprecated wrappers that should be removed
  • Repeated Init/Close patterns across 3 logger types (FileLogger, JSONLLogger, MarkdownLogger)
  • Duplicate public APIs for file and markdown logging functions
  • Similar server creation functions in routed and transport modes
  • Well-organized packages: config, auth, difc, and guard packages show excellent organization ✓

Full Analysis Report

Function Inventory

Analyzed Packages

Package Files Primary Purpose Organization Status
internal/auth 1 Authentication header parsing ✓ Excellent
internal/cmd 2 CLI commands (Cobra) ✓ Good
internal/config 4 Configuration parsing/validation ✓ Excellent
internal/config/rules 1 Validation error rules ✓ Excellent
internal/difc 5 DIFC security labels ✓ Good
internal/guard 4 Security guard interface ✓ Good
internal/launcher 1 Backend process management ✓ Good
internal/logger 9 Multiple logging subsystems ⚠️ Needs attention
internal/mcp 2 MCP protocol types/connections ✓ Good
internal/server 6 HTTP server implementations ⚠️ Some overlap
internal/sys 1 System utilities ✓ Good
internal/testutil 4 Test utilities ✓ Good
internal/timeutil 1 Time formatting ✓ Good
internal/tty 2 TTY detection ✓ Good

Total: 43 files, ~300+ functions analyzed


Identified Issues

1. Deprecated Sanitization Wrapper Functions

Issue: Multiple wrapper functions exist for the same sanitization logic, creating maintenance overhead and potential inconsistency.

Current State:

// PRIMARY implementations (internal/logger/sanitize/sanitize.go)
func SanitizeString(message string) string
func SanitizeJSON(payloadBytes []byte) json.RawMessage

// DEPRECATED wrapper #1 (internal/logger/jsonl_logger.go:74-80)
func sanitizePayload(payloadBytes []byte) json.RawMessage {
    return sanitize.SanitizeJSON(payloadBytes)
}

// DEPRECATED wrapper #2 (internal/logger/markdown_logger.go:93-98)
func sanitizeSecrets(message string) string {
    return sanitize.SanitizeString(message)
}

// Helper that uses primary (internal/logger/rpc_logger.go:49-59)
func truncateAndSanitize(payload string, maxLength int) string {
    sanitized := sanitize.SanitizeString(payload)
    // ... truncation logic
}

Occurrences:

  1. internal/logger/sanitize/sanitize.go - Primary implementations (correct location)
  2. internal/logger/jsonl_logger.go:74 - sanitizePayload() wrapper marked deprecated
  3. internal/logger/markdown_logger.go:93 - sanitizeSecrets() wrapper marked deprecated
  4. internal/logger/rpc_logger.go:49 - truncateAndSanitize() helper (legitimate - adds truncation)

Recommendation:

  • ✅ Remove sanitizePayload() from jsonl_logger.go - replace all calls with sanitize.SanitizeJSON()
  • ✅ Remove sanitizeSecrets() from markdown_logger.go - replace all calls with sanitize.SanitizeString()
  • ✅ Keep truncateAndSanitize() - it adds truncation logic, not just a wrapper
  • ✅ Update callers to import and use internal/logger/sanitize directly

Estimated Impact:

  • Reduced code duplication
  • Single source of truth for sanitization
  • Easier to maintain secret patterns
  • Effort: 1-2 hours

2. Repeated Init/Close Pattern Across Logger Types

Issue: Three logger types (FileLogger, JSONLLogger, MarkdownLogger) follow identical initialization and closure patterns, with duplicate global accessor functions.

Pattern Observed:

// Pattern repeated 3x for File, JSONL, Markdown loggers:

// Init function (in each *_logger.go file)
func InitXXXLogger(logDir, fileName string) error {
    globalXXXMu.Lock()
    defer globalXXXMu.Unlock()
    
    if globalXXXLogger != nil {
        globalXXXLogger.Close()
    }
    
    // ... initialization
    globalXXXLogger = xl
    return nil
}

// Close function (in each *_logger.go file)
func CloseXXXLogger() error {
    globalXXXMu.RLock()
    defer globalXXXMu.RUnlock()
    
    if globalXXXLogger != nil {
        return globalXXXLogger.Close()
    }
    return nil
}

// Each logger type has Close() method that calls common helper:
func (xl *XXXLogger) Close() error {
    xl.mu.Lock()
    defer xl.mu.Unlock()
    return closeLogFile(xl.logFile, &xl.mu, "XXX")
}

Occurrences:

  • internal/logger/file_logger.go - InitFileLogger(), CloseGlobalLogger()
  • internal/logger/jsonl_logger.go - InitJSONLLogger(), CloseJSONLLogger()
  • internal/logger/markdown_logger.go - InitMarkdownLogger(), CloseMarkdownLogger()
  • internal/logger/common.go - Shared closeLogFile() helper (good!)

Analysis:
This is acceptable duplication given Go's lack of generics for this pattern prior to Go 1.18, but could be improved with:

  • Interface-based abstraction for logger lifecycle
  • Factory pattern for logger creation
  • Generic initialization logic (now possible with Go 1.18+)

Recommendation:

  • Priority: LOW (current implementation works, not causing bugs)
  • Consider interface abstraction in future refactoring:
    type Logger interface {
        Close() error
        Log(level LogLevel, category, format string, args ...interface{})
    }
  • Estimated Impact: Improved maintainability, reduced boilerplate
  • Effort: 4-6 hours

3. Duplicate Public Logging APIs

Issue: FileLogger and MarkdownLogger expose nearly identical public APIs with different suffixes, leading to confusion about which to use.

Current APIs:

// FileLogger (internal/logger/file_logger.go)
func LogInfo(category, format string, args ...interface{})
func LogWarn(category, format string, args ...interface{})
func LogError(category, format string, args ...interface{})
func LogDebug(category, format string, args ...interface{})

// MarkdownLogger (internal/logger/markdown_logger.go)
func LogInfoMd(category, format string, args ...interface{})
func LogWarnMd(category, format string, args ...interface{})
func LogErrorMd(category, format string, args ...interface{})
func LogDebugMd(category, format string, args ...interface{})

Analysis:
Both expose 4 global convenience functions with identical signatures. The Md suffix is the only differentiator. This creates:

  • API confusion - which logger should I use?
  • Import ambiguity - need to be careful with imports
  • Maintenance overhead - changes must be mirrored across both

Recommendation:

  • Priority: MEDIUM
  • Consider consolidating to a single logger interface with configurable output format
  • Alternative: Use explicit logger instances instead of global functions
  • Document when to use each logger type
  • Estimated Impact: Clearer API, reduced confusion
  • Effort: 2-3 hours (documentation) or 6-8 hours (refactoring)

4. Similar Server Creation Functions

Issue: CreateHTTPServerForRoutedMode() and CreateHTTPServerForMCP() share significant structural similarity in setup code.

Shared Patterns:

// Both functions (routed.go:26 and transport.go:78):

func CreateHTTPServerForXXX(addr string, unifiedServer *UnifiedServer, apiKey string) *http.Server {
    mux := http.NewServeMux()
    
    // OAuth discovery endpoint (identical in both)
    oauthHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        log.Printf("[%s] %s %s - OAuth discovery (not supported)", r.RemoteAddr, r.Method, r.URL.Path)
        http.NotFound(w, r)
    })
    mux.Handle("/mcp/.well-known/oauth-authorization-server", withResponseLogging(oauthHandler))
    
    // ... route-specific logic differs here ...
    
    // Health endpoint handling (similar patterns)
    // Auth middleware setup (similar patterns)
    
    return &http.Server{
        Addr:    addr,
        Handler: mux,
        // ... timeouts ...
    }
}

Differences:

  • Routed mode: Creates /mcp/{serverID} routes for each backend
  • Unified mode: Creates single /mcp endpoint with unified server

Recommendation:

  • Priority: MEDIUM
  • Extract common setup logic into shared helper functions:
    • setupOAuthDiscoveryHandler(mux) - OAuth endpoint setup
    • createBaseHTTPServer(addr, handler) - server creation with timeouts
    • setupHealthEndpoint(mux, unifiedServer) - health check setup
  • Keep mode-specific routing logic in respective files
  • Estimated Impact: Reduced duplication, easier to maintain common server config
  • Effort: 3-4 hours

Well-Organized Code (No Issues Found)

✓ Config Package - Excellent Validation Organization

The internal/config package demonstrates exemplary organization:

internal/config/
├── config.go              # Main config loading
├── validation.go          # Variable expansion & server validation
├── env_validation.go      # Environment/container validation
├── schema_validation.go   # JSON schema validation
└── rules/
    └── rules.go          # Centralized validation error constructors

Validation Functions (11 total):

  • ValidateExecutionEnvironment() - env_validation.go
  • ValidateContainerizedEnvironment() - env_validation.go
  • validateJSONSchema() - schema_validation.go
  • validateStringPatterns() - schema_validation.go
  • validateServerConfig() - validation.go
  • validateGatewayConfig() - validation.go
  • validateMounts() - validation.go
  • expandVariables() - validation.go
  • expandEnvVariables() - validation.go
  • Plus error constructors in rules/ package

Why This Works:

  • Clear separation of concerns
  • Each file has a single, well-defined purpose
  • Centralized error rules prevent duplication
  • Easy to locate validation logic by category

✓ Auth Package - Minimal and Focused

The internal/auth package is perfectly scoped:

// Only 2 functions (internal/auth/header.go)
func ParseAuthHeader(authHeader string) (apiKey string, agentID string, error error)
func ValidateAPIKey(provided, expected string) bool

Why This Works:

  • Single responsibility: authentication header parsing
  • No unnecessary abstractions
  • Used correctly by server/auth.go middleware

✓ DIFC Package - Clear Separation

The internal/difc package demonstrates good organization:

  • agent.go - Agent label management
  • capabilities.go - Capability sets
  • evaluator.go - Flow evaluation logic
  • labels.go - Label types and operations
  • resource.go - Resource labeling

Each file maps to a distinct concept in the DIFC security model.


✓ Guard Package - Interface + Implementations

The internal/guard package follows Go best practices:

  • guard.go - Interface definition
  • noop.go - No-op implementation
  • registry.go - Guard registration and lookup
  • context.go - Context helpers

Clean interface-based design with clear separation.


Detailed Function Clusters

Cluster 1: Validation Functions (11 functions)

Pattern: Functions that validate configuration, environment, or input data

Distribution:

  • internal/config/ - 8 validation functions (config, environment, JSON schema)
  • internal/auth/ - 1 validation function (API key validation)
  • internal/config/rules/ - Error constructor functions

Analysis:Well-organized - validation functions are appropriately distributed by domain. Config validation is in config package, auth validation is in auth package.


Cluster 2: Initialization Functions (9 functions)

Pattern: Functions that initialize subsystems or create instances

Distribution:

  • internal/logger/ - 3 logger init functions (InitFileLogger, InitJSONLLogger, InitMarkdownLogger)
  • internal/server/ - 3 server creation functions (New, NewUnified, CreateHTTPServerFor...)
  • internal/difc/ - Multiple New* constructor functions

Analysis: ⚠️ Logger init functions show repeated pattern (see Issue #2)


Cluster 3: Formatting Functions (15+ functions)

Pattern: Functions that format data for display or serialization

Distribution:

  • internal/logger/ - 8 format functions (formatRPCMessage, formatJSONWithoutFields, etc.)
  • internal/config/ - 3 format functions (formatSchemaError, formatValidationErrorRecursive, formatErrorContext)
  • internal/timeutil/ - 1 format function (FormatDuration)

Analysis:Appropriately specialized - Each format function serves a distinct purpose. Not duplicates.


Cluster 4: Sanitization Functions (5 functions)

Pattern: Functions that remove or redact sensitive data

Distribution:

  • internal/logger/sanitize/ - 2 primary functions (SanitizeString, SanitizeJSON)
  • internal/logger/ - 3 wrapper/helper functions

Analysis: ⚠️ Contains 2 deprecated wrappers (see Issue #1)


Refactoring Recommendations

Priority 1: High Impact, Low Effort

1.1 Remove Deprecated Sanitization Wrappers

Task: Remove sanitizePayload() and sanitizeSecrets() wrapper functions

Steps:

  1. Search for all calls to sanitizePayload() in codebase
  2. Replace with sanitize.SanitizeJSON() (add import if needed)
  3. Remove sanitizePayload() function from jsonl_logger.go
  4. Search for all calls to sanitizeSecrets()
  5. Replace with sanitize.SanitizeString() (add import if needed)
  6. Remove sanitizeSecrets() function from markdown_logger.go
  7. Run tests to verify no regressions

Estimated Effort: 1-2 hours
Benefits:

  • Reduced code size
  • Single source of truth
  • Eliminates confusion about which function to use

Implementation Checklist:

  • Find and replace sanitizePayload() calls
  • Find and replace sanitizeSecrets() calls
  • Remove deprecated functions
  • Update imports
  • Run full test suite
  • Update documentation if needed

Priority 2: Medium Impact, Medium Effort

2.1 Extract Common Server Creation Patterns

Task: Extract shared HTTP server setup logic into helper functions

Steps:

  1. Create helper function: setupOAuthDiscoveryHandler(mux *http.ServeMux)
  2. Create helper function: setupHealthEndpointWithAuth(mux *http.ServeMux, handler http.HandlerFunc, apiKey string)
  3. Create helper function: newHTTPServerWithTimeouts(addr string, handler http.Handler) *http.Server
  4. Refactor CreateHTTPServerForRoutedMode() to use helpers
  5. Refactor CreateHTTPServerForMCP() to use helpers
  6. Run integration tests

Estimated Effort: 3-4 hours
Benefits:

  • Reduced duplication
  • Easier to change common server configuration
  • Improved testability (helpers can be unit tested)

Implementation Checklist:

  • Design helper function signatures
  • Extract OAuth handler setup
  • Extract server creation logic
  • Refactor routed mode server creation
  • Refactor unified mode server creation
  • Run integration tests
  • Update documentation

2.2 Consolidate Logger Public APIs

Task: Improve clarity of logger public APIs

Option A (Documentation): Document when to use each logger

  • Add package-level documentation explaining FileLogger vs MarkdownLogger
  • Add usage examples to README
  • Effort: 2 hours

Option B (Refactoring): Create unified logger interface

  • Define Logger interface with Init/Close/Log methods
  • Use explicit logger instances instead of global functions
  • Effort: 6-8 hours

Recommendation: Start with Option A (documentation), consider Option B for future work


Priority 3: Long-Term Improvements

3.1 Logger Interface Abstraction

Task: Introduce interface-based abstraction for logger lifecycle

Benefits:

  • Reduced boilerplate across logger types
  • Improved testability (can mock logger interface)
  • Easier to add new logger types

Considerations:

  • Breaking change to existing logger API
  • Requires careful migration plan
  • May require Go 1.18+ generics for best implementation

Recommendation: Consider for major version upgrade (v2.0)

Estimated Effort: 8-12 hours


Implementation Checklist

Phase 1: Quick Wins (Week 1)

  • Review findings and prioritize based on team capacity
  • Create detailed refactoring plan for Priority 1 items
  • Remove deprecated sanitizePayload() wrapper function
  • Remove deprecated sanitizeSecrets() wrapper function
  • Update all callsites to use sanitize.SanitizeString/JSON() directly
  • Run full test suite to verify no regressions
  • Update any relevant documentation

Phase 2: Medium-Term Improvements (Weeks 2-3)

  • Extract common OAuth handler setup into helper function
  • Extract common HTTP server creation into helper function
  • Refactor CreateHTTPServerForRoutedMode() to use helpers
  • Refactor CreateHTTPServerForMCP() to use helpers
  • Add package-level documentation for logger usage
  • Run integration tests

Phase 3: Long-Term Considerations (Future)

  • Evaluate need for logger interface abstraction
  • Consider consolidating logger public APIs
  • Plan for potential v2.0 API improvements
  • Document architectural decisions

Analysis Metadata

  • Total Go Files Analyzed: 43 (excluding test files)
  • Total Functions Cataloged: ~300+
  • Function Clusters Identified: 4 major clusters (validation, initialization, formatting, sanitization)
  • Outliers Found: 0 (all functions appropriately placed)
  • Deprecated Functions Found: 2 (sanitizePayload, sanitizeSecrets)
  • Duplicate Patterns Detected: 3 (sanitization wrappers, init/close, server creation)
  • Well-Organized Packages: 5 (config, auth, difc, guard, testutil)
  • Detection Method: Manual semantic analysis + naming pattern analysis
  • Analysis Date: 2026-01-14
  • Analyzer: Semantic Function Clustering Agent
  • Repository: githubnext/gh-aw-mcpg
  • Branch: main
  • Workflow Run: §20995989604

Recommended Next Steps

  1. Immediate Action (Week 1): Remove deprecated sanitization wrapper functions
  2. Short-Term (Weeks 2-3): Extract common server creation patterns
  3. Long-Term: Document logger usage patterns and consider interface abstraction

The analysis reveals that while the codebase is generally well-organized, the internal/logger package would benefit most from targeted refactoring. The config, auth, difc, and guard packages demonstrate excellent organization and should be used as models for future development.

AI generated by Semantic Function Refactoring

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions