BUG: Critical Scalability Flaw: In-Memory Session Storage Blocks Horizontal Scaling #254

@KumarADITHYA123

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Description: The authentication verification service currently relies on a global in-memory Python dictionary (_verification_sessions) to store active user sessions. This stateful architecture creates a critical blocker for production deployments that require horizontal scaling.

Location: backend/app/services/auth/verification.py (line 11)

Technical Analysis: The application uses a module-level global variable for session management:

    # session_id -> (discord_id, expiry_time)
    _verification_sessions: Dict[str, Tuple[str, datetime]] = {}

In a production environment that runs a WSGI/ASGI server with multiple workers (e.g., gunicorn -w 4), or a container orchestration system (Kubernetes/Docker Swarm) with multiple replicas, this dictionary is not shared: each worker process or replica holds its own independent copy.

The Failure Scenario:

1. User A initiates a verification flow and is handled by Worker 1, which stores the session in its local memory.
2. User A completes the OAuth flow and is redirected back to the callback endpoint.
3. The load balancer routes the callback request to Worker 2.
4. Worker 2 checks its own local memory, finds no record of the session, and rejects the request with "Session not found".
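
The scenario above can be reproduced in miniature. The following is only a simulation, not code from the repository: the Worker class, its method names, the session ID, and the 5-minute expiry are all made up for illustration. Each Worker instance stands in for one server process with its own private copy of the dictionary.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


class Worker:
    """Stands in for one server process (hypothetical, for illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Mirrors the module-level _verification_sessions: one copy per process.
        self.sessions: Dict[str, Tuple[str, datetime]] = {}

    def start_verification(self, session_id: str, discord_id: str) -> None:
        # Assumed 5-minute lifetime; the real expiry is not stated in this issue.
        expiry = datetime.now(timezone.utc) + timedelta(minutes=5)
        self.sessions[session_id] = (discord_id, expiry)

    def handle_callback(self, session_id: str) -> Optional[str]:
        record = self.sessions.get(session_id)
        if record is None:
            return None  # the service would answer "Session not found"
        discord_id, expiry = record
        return discord_id if datetime.now(timezone.utc) < expiry else None


worker_1, worker_2 = Worker("worker-1"), Worker("worker-2")

# Step 1: Worker 1 handles the start of the flow and stores the session locally.
worker_1.start_verification("abc123", "discord-user-a")

# Steps 3-4: the callback succeeds only if it lands on the same worker.
same_worker = worker_1.handle_callback("abc123")
other_worker = worker_2.handle_callback("abc123")
print(same_worker, other_worker)
```

With real worker processes the effect is the same, because each process imports the module separately and gets its own empty dictionary.
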
Impact:

  • Severity: Critical
  • Scalability: The application is forced to run as a single instance.
  • Reliability: Verification flows will fail intermittently in any multi-worker environment.

Proposed Solution: Refactor the session management to use a distributed cache store such as Redis.

  • Replace _verification_sessions with a Redis client wrapper.
  • Implement key-value storage with TTL (Time To Live) to handle expiry automatically (replacing the manual _cleanup_expired_sessions loop).
  • Ensure the Redis connection is initialized via a centralized dependency (app.core.redis).
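
A minimal sketch of what such a wrapper might look like is shown here. It is not the repository's code: the class name VerificationSessionStore, the methods create_session and consume_session, the key prefix, and the 300-second TTL are all assumptions. The client is expected to expose set(name, value, ex=...), get(name) and delete(*names) as redis-py's redis.Redis does, and is passed in so that it can come from a shared dependency such as app.core.redis.

```python
import secrets
from typing import Optional, Protocol, Union


class KeyValueClient(Protocol):
    """The subset of redis-py's redis.Redis methods this sketch relies on."""

    def set(self, name: str, value: str, ex: Optional[int] = None): ...
    def get(self, name: str) -> Union[bytes, str, None]: ...
    def delete(self, *names: str): ...


class VerificationSessionStore:
    """Hypothetical replacement for the module-level _verification_sessions dict."""

    def __init__(
        self,
        client: KeyValueClient,
        ttl_seconds: int = 300,  # assumed lifetime, not taken from the codebase
        prefix: str = "verification_session:",  # assumed key namespace
    ) -> None:
        self._client = client
        self._ttl = ttl_seconds
        self._prefix = prefix

    def create_session(self, discord_id: str) -> str:
        session_id = secrets.token_urlsafe(32)
        # ex=<seconds> lets Redis expire the key itself, so no cleanup loop is needed.
        self._client.set(self._prefix + session_id, discord_id, ex=self._ttl)
        return session_id

    def consume_session(self, session_id: str) -> Optional[str]:
        key = self._prefix + session_id
        raw = self._client.get(key)
        if raw is None:
            return None  # unknown or already expired
        self._client.delete(key)  # sessions are treated as single-use here
        # redis-py returns bytes unless the client uses decode_responses=True.
        return raw.decode() if isinstance(raw, bytes) else raw
```

Because every worker talks to the same Redis instance, a session created by one worker can be consumed by another. Note that the get followed by delete is not atomic; if strict single-use semantics matter, Redis 6.2+ offers GETDEL, which could replace the two calls.
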
Action Plan: I will open a PR to implement the Redis-based solution described above.

Related Issues:

  • PR #174 (Rate Limiting via Redis): establishes a pattern but does not address this session storage flaw.
  • Issue #15: Closed without resolution.

Record

  • I want to work on this issue
