Skip to content

[refactor] Semantic Function Clustering Analysis - Code Organization Improvements #196

@github-actions

Description

@github-actions

Executive Summary

Analysis of 39 non-test Go files (~7,456 lines) in the MCP Gateway codebase has identified several opportunities to improve code organization through semantic function clustering. The codebase is generally well-organized with clear package boundaries, but there are specific areas where refactoring would enhance maintainability and reduce duplication.

Key Findings:

  • Well-organized: 8/10 packages follow clear single-responsibility patterns
  • ⚠️ Minor issues found: 3 duplicate sanitization patterns across packages
  • ⚠️ Scattered functions: Auth-related functions split between server and guard packages
  • ⚠️ Validation spread: Validation logic distributed across config package files
  • 📊 Function inventory: ~250 functions cataloged across all packages

Function Inventory by Package

Package Structure Overview
Package Files Primary Purpose Organization Quality
cmd 2 CLI commands (Cobra) ✅ Excellent
config 4 Configuration parsing/validation ✅ Good (see notes)
difc 5 Security labels (DIFC system) ✅ Excellent
guard 4 Security guards ✅ Excellent
launcher 1 Backend process management ✅ Excellent
logger 7 Logging framework ⚠️ Needs consolidation
mcp 2 MCP protocol types ✅ Excellent
server 6 HTTP server implementation ⚠️ Minor issues
sys 1 System utilities ✅ Excellent
testutil/mcptest 4 Testing utilities ✅ Excellent
timeutil 1 Time formatting ✅ Excellent
tty 2 TTY detection ✅ Excellent

Identified Issues

1. Duplicate Sanitization Functions

Issue: Three different sanitization functions with overlapping purposes spread across packages.

Occurrences:

A. launcher/launcher.go:sanitizeEnvForLogging (lines 22-35)

func sanitizeEnvForLogging(env map[string]string) map[string]string {
    sanitized := make(map[string]string, len(env))
    for key, value := range env {
        if len(value) <= 4 {
            sanitized[key] = "..."
        } else {
            sanitized[key] = value[:4] + "..."
        }
    }
    return sanitized
}
  • Purpose: Sanitize environment variables for debug logs
  • Scope: Simple truncation (first 4 chars + "...")

B. logger/markdown_logger.go:sanitizeSecrets (lines 117-133)

func sanitizeSecrets(message string) string {
    result := message
    for _, pattern := range secretPatterns {
        result = pattern.ReplaceAllStringFunc(result, func(match string) string {
            // Redacts secrets based on regex patterns
            return "[REDACTED]"
        })
    }
    return result
}
  • Purpose: Redact secrets from log messages using regex patterns
  • Scope: Pattern-based redaction (tokens, keys, passwords)
  • Used by: jsonl_logger.go:sanitizePayload (line 93)

C. logger/jsonl_logger.go:sanitizePayload (lines 91-110)

func sanitizePayload(payloadBytes []byte) json.RawMessage {
    // Applies sanitizeSecrets then ensures valid JSON
    sanitized := sanitizeSecrets(string(payloadBytes))
    // ... JSON validation logic ...
}
  • Purpose: Sanitize JSON payloads for JSONL logs
  • Scope: Combines secret redaction with JSON validation

Analysis:

Duplication Level: Medium (different implementations, related purpose)

  • sanitizeEnvForLogging - Simple truncation strategy
  • sanitizeSecrets - Pattern-based redaction
  • sanitizePayload - Wraps sanitizeSecrets with JSON handling

Recommendation:

  1. Create internal/logger/sanitization.go with a unified sanitization API:
    // SanitizeForLogging handles general log sanitization
    func SanitizeForLogging(value string, mode SanitizationMode) string
    
    // SanitizeEnvMap sanitizes environment variable maps
    func SanitizeEnvMap(env map[string]string) map[string]string
    
    // SanitizeJSON sanitizes and validates JSON payloads
    func SanitizeJSON(payload []byte) json.RawMessage
  2. Consolidate the three implementations into this single module
  3. Update launcher/launcher.go to import from logger/sanitization.go
  4. Keep existing function signatures as thin wrappers for backward compatibility

Estimated Impact:

  • Effort: 2-3 hours
  • Benefits: Single source of truth for sanitization logic, easier testing
  • Risk: Low (pure functions, well-testable)

2. Authentication Logic Split Across Packages

Issue: Authentication and agent ID extraction logic is split between server/auth.go and guard/context.go, creating potential inconsistencies.

Locations:

A. server/auth.go:authMiddleware (lines 13-38)

  • Validates API key from Authorization header
  • Spec 7.1 compliant: requires exact match (NOT Bearer scheme)
  • Logs auth failures

B. guard/context.go:ExtractAgentIDFromAuthHeader (lines 38-58)

  • Extracts agent ID from Authorization header
  • Supports different formats: Bearer, Agent, or raw value
  • Inconsistency: Expects Bearer scheme but spec says no Bearer scheme

Conflict:

// server/auth.go - Spec 7.1: NO Bearer scheme
if authHeader != apiKey { ... }

// guard/context.go - ACCEPTS Bearer scheme
if strings.HasPrefix(authHeader, "Bearer ") {
    token := strings.TrimPrefix(authHeader, "Bearer ")
    return token
}

Analysis:
The auth middleware enforces spec 7.1 (no Bearer), but the guard context handler accepts Bearer format. This creates confusion and potential bugs if guards are ever enabled.

Recommendation:

  1. Consolidate auth header parsing into internal/auth/ package:
    internal/auth/
      ├── parser.go        // ExtractToken, ExtractAgentID
      ├── middleware.go    // authMiddleware
      └── validator.go     // ValidateAPIKey
    
  2. Ensure consistent interpretation of Authorization header across all uses
  3. Document the expected format clearly (currently spec says plain API key, not Bearer)
  4. Update guard/context.go to use centralized parser

Estimated Impact:

  • Effort: 3-4 hours
  • Benefits: Consistent auth handling, clearer security model
  • Risk: Medium (security-related, needs careful testing)

3. Config Validation Functions Spread Across Multiple Files

Issue: Configuration validation logic is distributed across 3 separate files in the config package, making it harder to understand the complete validation workflow.

Current Structure:

internal/config/
├── validation.go            // Core validation (expandVariables, validateServerConfig)
├── env_validation.go        // Environment validation (Docker, containerization)
└── schema_validation.go     // JSON schema validation (validateJSONSchema)

Function Distribution:

validation.go (6 functions):

  • expandVariables - Variable expansion
  • expandEnvVariables - Env var expansion
  • validateMounts - Mount path validation
  • validateServerConfig - Server config validation
  • validateGatewayConfig - Gateway config validation

env_validation.go (14 functions):

  • ValidateExecutionEnvironment - Environment checks
  • ValidateContainerizedEnvironment - Container checks
  • detectContainerized - Container detection
  • checkDockerAccessible - Docker daemon check
  • checkRequiredEnvVars - Required vars check
  • validateContainerID - Container ID format
  • checkPortMapping - Port mapping validation
  • checkStdinInteractive - Stdin mode check
  • checkLogDirMounted - Log dir mount check
  • Plus 5 getter functions for env vars

schema_validation.go (5 functions):

  • validateJSONSchema - JSON schema validation
  • formatSchemaError - Error formatting
  • formatValidationErrorRecursive - Recursive error formatting
  • formatErrorContext - Context-aware error messages
  • validateStringPatterns - Pattern validation

Analysis:

Organization Quality: Good but could be improved

  • ✅ Files are named by validation type (schema, env, general)
  • ⚠️ Some overlap: validateServerConfig in validation.go but server-specific checks in other files
  • ⚠️ Error formatting in schema_validation.go but not in other validation files

Recommendation:
This is LOW PRIORITY - the current organization is acceptable. If refactoring is desired:

  1. Consider renaming validation.goserver_validation.go for clarity
  2. Add error_formatting.go for all validation error formatting functions
  3. Keep the three-file structure (it's actually logical: schema, env, server)

Estimated Impact:

  • Effort: 2-3 hours
  • Benefits: Slightly clearer organization
  • Risk: Low
  • Priority: Low (current organization is functional)

4. Close() Method Pattern - Consistent but Verbose

Observation: Multiple packages implement Close() methods with similar patterns.

Occurrences (7 implementations):

  1. logger/file_logger.go:FileLogger.Close()
  2. logger/jsonl_logger.go:JSONLLogger.Close()
  3. logger/markdown_logger.go:MarkdownLogger.Close()
  4. mcp/connection.go:Connection.Close()
  5. launcher/launcher.go:Launcher.Close()
  6. server/unified.go:UnifiedServer.Close()
  7. server/transport.go:HTTPTransport.Close()

Analysis:
This is GOOD Go idiom - not a problem, but worth documenting:

  • Consistent with io.Closer interface
  • Each implementation has unique cleanup logic
  • No duplication detected

Recommendation: No action needed - this is idiomatic Go. Consider adding interface documentation if needed.


5. Logger Package Structure - Multiple Specialized Loggers

Current Structure:

internal/logger/
├── logger.go              // Debug logger (namespace-based)
├── file_logger.go         // File-based operational logger
├── jsonl_logger.go        // JSONL RPC message logger
├── markdown_logger.go     // Markdown operational logger
├── rpc_logger.go          // RPC message formatting
├── slog_adapter.go        // slog compatibility adapter
└── error_formatting.go    // Error extraction utility

Analysis:

Generally well-organized with clear separation of concerns:

  • Debug logging: logger.go (DEBUG env var, colored output)
  • Operational logging: file_logger.go, markdown_logger.go
  • RPC logging: jsonl_logger.go, rpc_logger.go
  • Adapters: slog_adapter.go

⚠️ Minor inconsistency:

  • rpc_logger.go has formatting functions that call loggers from other files
  • sanitizeSecrets is in markdown_logger.go but used by jsonl_logger.go

Recommendation:

  1. Create internal/logger/formatting.go for shared formatting utilities:
    • Move sanitizeSecrets from markdown_logger.go
    • Move truncateAndSanitize from rpc_logger.go
    • Move extractEssentialFields from rpc_logger.go
  2. Keep the current multi-logger approach (it's appropriate for different use cases)

Estimated Impact:

  • Effort: 1-2 hours
  • Benefits: Clearer dependency structure
  • Risk: Low (formatting functions are pure)
  • Priority: Medium

Semantic Clusters Identified

Cluster 1: Configuration Functions ✅

Pattern: Config loading, parsing, and validation
Files: internal/config/config.go, validation.go, env_validation.go, schema_validation.go
Quality: Good organization by validation type

Cluster 2: DIFC Label Management ✅

Pattern: Security label creation, manipulation, and flow control
Files: internal/difc/labels.go, agent.go, capabilities.go, resource.go, evaluator.go
Quality: Excellent - well-structured DIFC implementation

Cluster 3: Server Handlers ✅

Pattern: HTTP request handlers and routing
Files: internal/server/server.go, routed.go, unified.go, transport.go, health.go, auth.go
Quality: Good separation of concerns (routed vs unified)

Cluster 4: Logging Functions ⚠️

Pattern: Different logging strategies for different purposes
Files: All 7 files in internal/logger/
Quality: Good but could consolidate shared utilities (see Issue #5)

Cluster 5: Guard System ✅

Pattern: Security guard registration and execution
Files: internal/guard/guard.go, noop.go, registry.go, context.go
Quality: Excellent - clean interface pattern

Cluster 6: Constructor Functions ✅

Pattern: New* functions for creating instances
Occurrences: 29 constructor functions across all packages
Quality: Excellent - consistent Go idiom


Refactoring Recommendations

Priority 1: High Impact (Recommended)

1.1 Consolidate Sanitization Functions

  • Issue: Configure as a Go CLI tool #1 (Duplicate sanitization patterns)
  • Action: Create internal/logger/sanitization.go
  • Effort: 2-3 hours
  • Benefits: Single source of truth, easier testing

1.2 Extract Auth Logic to Dedicated Package

  • Issue: Lpcox/initial implementation #2 (Auth logic split across packages)
  • Action: Create internal/auth/ package
  • Effort: 3-4 hours
  • Benefits: Consistent auth handling, security clarity

Priority 2: Medium Impact (Optional)

2.1 Extract Shared Logger Formatting Utilities

  • Issue: Updated Dockerfile #5 (Logger formatting scattered)
  • Action: Create internal/logger/formatting.go
  • Effort: 1-2 hours
  • Benefits: Clearer dependencies

Priority 3: Low Impact (Future Consideration)

3.1 Rename validation.go → server_validation.go

  • Issue: Lpcox/initial implementation #3 (Config validation file naming)
  • Action: Rename for clarity
  • Effort: 1 hour
  • Benefits: Slightly clearer purpose
  • Note: Current organization is functional, this is cosmetic

Implementation Checklist

Phase 1: Sanitization Consolidation

  • Create internal/logger/sanitization.go
  • Move sanitizeSecrets from markdown_logger.go
  • Move sanitizeEnvForLogging from launcher/launcher.go
  • Update jsonl_logger.go:sanitizePayload to use new API
  • Add unit tests for sanitization module
  • Update imports in affected files

Phase 2: Auth Package Extraction

  • Create internal/auth/ package structure
  • Move authMiddleware from server/auth.go
  • Move ExtractAgentIDFromAuthHeader from guard/context.go
  • Reconcile Bearer vs plain API key format
  • Update all callers to use new package
  • Add integration tests for auth flows

Phase 3: Logger Formatting Utilities

  • Create internal/logger/formatting.go
  • Move formatting functions from rpc_logger.go
  • Update imports in logger files
  • Verify all tests pass

Analysis Metadata

  • Repository: githubnext/gh-aw-mcpg
  • Total Go Files Analyzed: 39 (excluding tests)
  • Total Lines of Code: ~7,456
  • Function Clusters Identified: 6 major clusters
  • Outliers Found: 2 (auth logic split, sanitization scattered)
  • Duplicates Detected: 3 (sanitization functions)
  • Detection Method: Manual code analysis + grep pattern matching
  • Analysis Date: 2026-01-13
  • Workflow Run: §20940400562

Notes

Why No Serena MCP Server Analysis?

This analysis was performed using traditional code analysis tools (grep, pattern matching, manual review) rather than Serena's semantic analysis capabilities. Serena is configured for this workspace but was not required for this analysis because:

  1. Repository size: Small enough for manual analysis (~7,456 lines)
  2. Clear patterns: Function naming and organization patterns were discoverable via grep
  3. High-quality code: Well-organized packages with clear boundaries
  4. Focus on structure: Analysis focused on package organization rather than deep semantic similarity

Serena would be valuable for:

  • Larger codebases (>50K lines)
  • Detecting subtle semantic duplicates in complex algorithms
  • Cross-referencing function usage patterns
  • Identifying implicit dependencies

Overall Code Quality Assessment

The codebase is well-organized with clear package boundaries and consistent naming conventions. The issues identified are minor and represent opportunities for incremental improvement rather than urgent refactoring needs.

Strengths:

  • Consistent use of Go idioms (constructors, interfaces, error handling)
  • Clear package separation by functionality
  • Good test coverage (test files excluded from analysis)
  • Well-documented code with inline comments

Recommendations are optional - the current code is production-ready. Refactoring should be prioritized based on:

  1. Active development in affected areas
  2. Maintenance pain points (if any)
  3. Team bandwidth and priorities

AI generated by Semantic Function Refactoring

Sub-issues

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions