Conversation

@Mat001 Mat001 commented Sep 23, 2025

Summary

  • Add Redis Streams functionality to Agent, as requested by CapitalOne.
  • Required because the current fire-and-forget Redis pub/sub no longer works for the customer.
  • The customer needs to batch decision notifications and must not lose notifications in failure scenarios such as an Agent crash.

Issues

@Mat001 Mat001 self-assigned this Sep 23, 2025
- Add RedisStreams implementation with retry logic and error handling
- Support configurable batch processing and connection resilience
- Add comprehensive unit tests for functionality and error scenarios
- Update configuration to support both "redis" and "redis-streams" options
- Maintain backwards compatibility with existing Redis pub/sub
- Add configuration parameters for tuning Redis Streams behavior
@Mat001 Mat001 force-pushed the redis-streams-notifications branch from 4aa9496 to 61e5e7e Compare September 26, 2025 18:03
Standardize password field handling for all Redis instances (pub/sub, UPS, ODP)
to support multiple field names and environment variable fallback:
- auth_token (recommended - avoids security scanner alerts)
- redis_secret (alternative)
- password (legacy support)
- Environment variable fallback (REDIS_PASSWORD, REDIS_UPS_PASSWORD, REDIS_ODP_PASSWORD)

Changes:
- Add pkg/utils/redisauth package with GetPassword() utility
- Update Redis Streams tests to use Redis 6 in CI (required for Streams support)
- Fix DNS lookup error assertion in invalid host test
- Update config.yaml with security-friendly password field examples
- Document Redis 5.0+ requirement in redis-streams.md
- Fix indentation in test assertions
- Align map literal values in tests
- Sort imports alphabetically per Go convention
This commit addresses two critical issues found during integration testing and adjusts test timeouts:

1. Race condition in Subscribe() method:
   - Subscribe() was returning immediately while consumer group creation
     happened asynchronously in a goroutine
   - This caused NOGROUP errors when messages were published before the
     consumer group was fully initialized
   - Fixed by adding a synchronization channel (ready) that waits for
     consumer group creation to complete before returning

2. Consumer group creation using wrong client:
   - createConsumerGroupWithRetry() was ignoring the passed client parameter
   - It created temporary clients via executeWithRetry() which were immediately closed
   - Fixed to use the persistent client from Subscribe() goroutine

3. Test timeout adjustments:
   - Both Subscribe tests now wait 6 seconds (longer than the 5s flush interval)
   - Previously they waited for exactly the flush interval or less, which caused flakiness
   - Tests now pass reliably (10 of 10 runs)

All Redis Streams tests now pass consistently.
@Mat001 Mat001 marked this pull request as ready for review October 3, 2025 19:32
@pvcraven pvcraven requested a review from Copilot October 3, 2025 19:41
@Copilot Copilot AI left a comment

Pull Request Overview

This pull request adds Redis Streams functionality to the agent per CapitalOne requirements, addressing a limitation of the current fire-and-forget Redis pub/sub approach, where notifications could be lost during agent crashes. The implementation provides persistent, guaranteed message delivery with acknowledgment and automatic recovery.

  • Introduces Redis Streams as an alternative to Redis pub/sub for reliable notification delivery
  • Implements batching, retry logic, and connection resilience features
  • Updates authentication configuration to support multiple field names and environment variable fallback

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/syncer/pubsub/redis_streams.go | Core Redis Streams implementation with batching, retry logic, and error handling |
| pkg/syncer/pubsub/redis_streams_test.go | Unit tests for Redis Streams functionality |
| pkg/syncer/pubsub/redis_streams_error_test.go | Error handling and retry logic tests |
| pkg/syncer/pubsub.go | Integration with existing pubsub system and configuration parsing |
| pkg/syncer/pubsub_test.go | Updated tests for new pubsub configurations |
| pkg/utils/redisauth/password.go | Utility for flexible Redis password configuration |
| pkg/utils/redisauth/password_test.go | Tests for Redis authentication utility |
| plugins/userprofileservice/services/redis_ups.go | Updated to use new authentication utility |
| plugins/odpcache/services/redis_cache.go | Updated to use new authentication utility |
| docs/redis-streams.md | Comprehensive documentation for Redis Streams feature |
| config.yaml | Updated configuration with Redis Streams settings |
| .github/workflows/agent.yml | CI updates to use Redis 6 for testing |

Mat001 added 2 commits October 3, 2025 14:08
1. Fix consumer name collision risk (Line R268):
   - Changed from time.Now().UnixNano() to hostname+pid+timestamp
   - Reduces collision risk in concurrent scenarios
   - Format: consumer-{hostname}-{pid}-{timestamp}
   - Each Agent process gets a unique, deterministic consumer name

2. Fix resource leak in executeWithRetry (Line R339):
   - Wrap operation with defer client.Close() in anonymous function
   - Ensures Redis client is always closed, even on panic
   - Prevents connection pool exhaustion in failure scenarios
   - Critical for production stability

3. Race condition (Line R103):
   - Already addressed in commit aa90732 with ready channel synchronization
1. Fix YAML/JSON database type assertion (Line R139):
   - Numeric values decoded from YAML/JSON config can arrive as float64, not int
   - Added type switch to handle both int and float64
   - Prevents failure on valid config like "database: 0"
   - Applied to both getPubSubRedis and getPubSubRedisStreams

2. Fix typo: SycnFeatureFlag → SyncFeatureFlag
   - Corrected spelling throughout codebase
   - Fixed in pubsub.go, pubsub_test.go, and syncer.go

Note on Line R113 comment:
   - Reviewer suggested PubSubRedisStreams instead of PubSubRedis
   - Current code is correct - both redis and redis-streams implementations
     intentionally share the same config section "pubsub.redis"
   - This allows common connection settings while supporting implementation-specific
     parameters (batch_size, flush_interval, etc.)
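
The type switch described in item 1 might look roughly like this. It is a sketch only: `databaseIndex` is a hypothetical name, and truncating a non-integral float is an assumption of this example, not something the PR states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// databaseIndex converts a decoded config value to an int.
// It accepts both int and float64, since decoders differ in which they produce.
func databaseIndex(v interface{}) (int, error) {
	switch n := v.(type) {
	case int:
		return n, nil
	case float64:
		return int(n), nil // assumption: fractional parts are simply truncated
	default:
		return 0, fmt.Errorf("database must be a number, got %T", v)
	}
}

func main() {
	var conf map[string]interface{}
	if err := json.Unmarshal([]byte(`{"database": 0}`), &conf); err != nil {
		panic(err)
	}
	// encoding/json stores JSON numbers as float64 in interface{} values.
	fmt.Printf("decoded type: %T\n", conf["database"]) // prints "decoded type: float64"

	db, err := databaseIndex(conf["database"])
	fmt.Println(db, err)
}
```

A bare `v.(int)` assertion would reject the value decoded in `main`, which is the failure mode the commit describes for `database: 0`.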
@Mat001 Mat001 requested a review from pvcraven October 3, 2025 21:28
Coverage improvements for recent PR review fixes:

1. pkg/syncer/pubsub_test.go:
   - Test database type conversion (int vs float64 from YAML/JSON)
   - Covers the type switch added in getPubSubRedis/getPubSubRedisStreams
   - Tests valid types (int, float64) and invalid types (string, nil)

2. plugins/odpcache/services/redis_cache_test.go:
   - Test RedisCache.UnmarshalJSON method
   - Verifies password field priority: auth_token > redis_secret > password
   - Tests empty password handling and invalid JSON

3. plugins/userprofileservice/services/redis_ups_test.go:
   - Test RedisUserProfileService.UnmarshalJSON method
   - Verifies password field priority: auth_token > redis_secret > password
   - Tests empty password handling and invalid JSON

These tests cover the previously uncovered code paths from the flexible
Redis password configuration implementation.
@Mat001 Mat001 requested a review from Copilot October 3, 2025 21:45
@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.


Mat001 added 4 commits October 5, 2025 23:04
Coverage improvements for redis-streams error paths:
- Test config not found
- Test config not valid (not a map)
- Test host not found
- Test host not valid (not a string)
- Test database not found
- Test database as float64 (valid YAML/JSON case)
- Test database invalid type
- Test datafile with unsupported pubsub type

This addresses uncovered lines 66, 121-122, 126-127, 131-132, 135-136,
143-144, 150-151, 152-153 in pkg/syncer/pubsub.go
Address reviewer feedback on race condition that was only partially fixed:

Problem:
- If context is cancelled while goroutine is initializing, the main
  function returns via ctx.Done() case but goroutine continues running
- Goroutine could block trying to send to ready channel if no receiver
- This creates a goroutine leak

Solution:
- Wrap both ready channel sends in select statements
- Check ctx.Done() before sending to ready channel
- If main function already returned, goroutine exits immediately
- Prevents goroutine from blocking on channel send

This ensures proper cleanup when Subscribe() caller cancels the context
during the initialization phase.
Address comprehensive goroutine lifecycle management issue:

Problem:
The previous fix with a buffered channel only prevented blocking, but didn't
prevent goroutine leaks. When the main function returns with an error or
due to context cancellation BEFORE the goroutine enters its main loop, the
goroutine would continue running indefinitely until the original context
expires (could be much later).

Complete Solution:
1. Added stop channel (chan struct{}) to signal goroutine termination
2. Goroutine checks stop channel during initialization (both error and success paths)
3. Goroutine checks stop channel as first case in main select loop
4. Main function closes stop channel when returning with error or ctx.Done()
5. Main function does NOT close stop on success - goroutine continues normally

This ensures:
- No goroutine leaks when Subscribe returns early with error
- No goroutine leaks when context is cancelled during initialization
- Clean resource cleanup via defer statements
- Goroutine runs normally when initialization succeeds

Verified by running the race detector 3 consecutive times.
@pvcraven pvcraven left a comment

Looks good now. Thanks for the conversation and updates.

Mat001 added 2 commits October 6, 2025 13:19
Add beta release notes for Redis Streams feature including:
- Redis Streams persistent notification delivery
- Flexible Redis password configuration

PR: #444
@Mat001 Mat001 changed the title [FSSDK-] Add redis-streams [FSSDK-] Add redis-streams **DO NOT MERGE - IT'S PRE-RELASE FROM FEATURE BRANCH** Oct 6, 2025
@Mat001 Mat001 changed the title [FSSDK-] Add redis-streams **DO NOT MERGE - IT'S PRE-RELASE FROM FEATURE BRANCH** [FSSDK-11923] Add redis-streams **DO NOT MERGE - IT'S PRE-RELASE FROM FEATURE BRANCH** Oct 6, 2025
- Upgrade go-sdk from v2.0.0 to v2.1.1 (latest master)
- Replace cmab.Config with client.CmabConfig
- Remove RetryConfig parsing (now handled internally by go-sdk)
- Simplify CMAB configuration to use stable public API