Skip to content

Conversation

@icecrasher321
Copy link
Collaborator

@icecrasher321 icecrasher321 commented Dec 10, 2025

Summary

Cleanup access pattern across codebase for redis client.

  1. Should not fallback to db for transient failures -- creates non-deterministic edge cases that are hard to reason about.
  2. Acquire/Release Lock -- process B should not accidentally delete lock owned by process A. requestId as identifier was in place but not used.
  3. Remove dead code.

Type of Change

  • Bug fix
  • Other: Reliability

Testing

Tested with @waleedlatif1

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link

vercel bot commented Dec 10, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
docs Skipped Skipped Dec 10, 2025 8:28pm

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 10, 2025

Greptile Overview

Greptile Summary

This PR implements a comprehensive cleanup of Redis access patterns across the codebase to eliminate non-deterministic behavior and fix distributed locking issues. The main changes include:

  1. Storage Method Abstraction: Introduced a new storage abstraction layer (apps/sim/lib/core/storage/storage.ts) that determines storage method (Redis vs database) once at startup and caches the decision, preventing fallback behavior during transient failures.

  2. Distributed Lock Ownership Fix: Fixed critical race conditions in polling endpoints where Process B could accidentally release locks owned by Process A by implementing proper ownership verification using requestId as unique identifiers.

  3. Elimination of Fallback Behavior: Removed try-catch fallback patterns that would silently switch from Redis to database/in-memory storage during transient failures, replacing them with fail-fast error handling.

  4. Code Cleanup: Removed dead code including hacky global cache access patterns and replaced them with dedicated storage implementations.

The changes ensure predictable system behavior by making storage decisions at configuration time rather than runtime, improving reliability in distributed environments.

Important Files Changed

Filename Score Overview
apps/sim/lib/core/storage/storage.ts 5/5 New storage abstraction layer that determines Redis vs database once at startup, eliminating fallback behavior
apps/sim/lib/core/config/redis.ts 4/5 Major refactor removing in-memory fallback cache, adding Lua script for safe lock release with ownership verification
apps/sim/lib/core/idempotency/service.ts 5/5 Refactored to use storage abstraction, removed try-catch fallback behavior for deterministic storage selection
apps/sim/lib/knowledge/documents/queue.ts 4/5 Updated to use storage method abstraction, eliminated runtime fallback switching for predictable behavior
apps/sim/lib/knowledge/documents/service.ts 5/5 Migrated from direct Redis client access to storage abstraction, simplified queue initialization logic
apps/sim/app/api/chat/[identifier]/otp/route.ts 5/5 Replaced hacky global cache access with dedicated Map-based OTP store, uses storage abstraction
apps/sim/app/api/webhooks/poll/gmail/route.ts 5/5 Fixed distributed lock ownership by using requestId as lock value and verifying ownership on release
apps/sim/app/api/webhooks/poll/outlook/route.ts 5/5 Implemented lock ownership verification to prevent accidental lock deletion by concurrent processes
apps/sim/app/api/webhooks/poll/rss/route.ts 5/5 Added lock ownership tracking and verification using requestId to ensure safe lock release
apps/sim/app/api/notifications/poll/route.ts 5/5 Fixed lock release pattern with ownership verification and proper tracking of lock acquisition
apps/sim/lib/core/rate-limiter/storage/factory.ts 5/5 Updated to use storage method abstraction, added strict error handling when Redis configured but unavailable
apps/sim/lib/core/rate-limiter/storage/index.ts 5/5 Added getAdapterType export for external access to storage adapter type information
apps/sim/lib/core/storage/index.ts 5/5 Simple barrel export file providing clean API for new storage abstraction utilities

Confidence score: 4/5

  • This PR significantly improves system reliability by eliminating non-deterministic behavior and fixing critical distributed locking issues
  • Score reflects the complexity of changes across multiple core files and potential impact on system behavior, though changes follow consistent patterns
  • Pay close attention to apps/sim/lib/core/config/redis.ts and apps/sim/lib/knowledge/documents/queue.ts for their extensive refactoring and error handling changes

Sequence Diagram

sequenceDiagram
    participant User
    participant NotificationPollAPI as "GET /api/notifications/poll"
    participant CronAuth as "verifyCronAuth"
    participant Redis as "Redis Lock Service"
    participant InactivityPoller as "pollInactivityAlerts"

    User->>NotificationPollAPI: "GET /api/notifications/poll"
    NotificationPollAPI->>CronAuth: "verifyCronAuth(request)"
    alt Auth Failed
        CronAuth-->>NotificationPollAPI: "Return auth error"
        NotificationPollAPI-->>User: "Return error response"
    end
    
    NotificationPollAPI->>Redis: "acquireLock(LOCK_KEY, requestId, 120)"
    alt Lock Already Held
        Redis-->>NotificationPollAPI: "false (lock not acquired)"
        NotificationPollAPI-->>User: "Return 202 - Polling already in progress"
    else Lock Acquired
        Redis-->>NotificationPollAPI: "true (lock acquired)"
        NotificationPollAPI->>InactivityPoller: "pollInactivityAlerts()"
        InactivityPoller-->>NotificationPollAPI: "Return polling results"
        NotificationPollAPI-->>User: "Return 200 - Polling completed"
    end
    
    NotificationPollAPI->>Redis: "releaseLock(LOCK_KEY, requestId)"
    Redis-->>NotificationPollAPI: "Lock released"
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@waleedlatif1
Copy link
Collaborator

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@waleedlatif1 waleedlatif1 merged commit 1cfe229 into staging Dec 10, 2025
9 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/cleanup-redis-access-pattern branch December 10, 2025 20:37
waleedlatif1 added a commit that referenced this pull request Dec 10, 2025
* feat(folders): add the ability to create a folder within a folder in popover (#2287)

* fix(agent): filter out empty params to ensure LLM can set tool params at runtime (#2288)

* fix(mcp): added backfill effect to add missing descriptions for mcp tools (#2290)

* fix(redis): cleanup access pattern across callsites (#2289)

* fix(redis): cleanup access pattern across callsites

* swap redis command to be non blocking

* improvement(log-details): polling, trace spans (#2292)

---------

Co-authored-by: Vikhyath Mondreti <vikhyathvikku@gmail.com>
Co-authored-by: Emir Karabeg <78010029+emir-karabeg@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants