-
Notifications
You must be signed in to change notification settings - Fork 1
Resilient event watcher #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ensures the event watcher recovers from exceptions and restarts, preventing service interruption. Adds logging for watcher errors to improve debugging and monitoring.
WalkthroughThe pull request implements significant changes to the event watching mechanism in the Changes
Sequence Diagram(s)sequenceDiagram
participant W as EventWatcher
participant C as K8s Client
participant L as Logger
loop While _isRunning and not cancelled
W->>C: ListAsync<T>(LabelSelector, cancellationToken)
alt Successful response
C-->>W: Event data
W->>W: Process event via OnEvent(event)
else Exception encountered
W->>W: Catch exception
W->>L: WatcherError(exception.Message)
W->>W: Pause for 5 seconds before retry
end
end
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (1)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/K8sOperator.NET/Extensions/LoggingExtensions.cs (1)
184-189: Consider using a more appropriate log level for watcher errors.The logging call is at the
Informationlevel, but it logs an “Error” message. It may be more conventional and consistent to log watcher errors asLogLevel.Errorto better reflect the severity of the issue.Example refactor:
[LoggerMessage( - EventId = 28, - Level = LogLevel.Information, - Message = "Watcher Error {message}" -)] + EventId = 28, + Level = LogLevel.Error, + Message = "Watcher Error {message}" +)] public static partial void WatcherError(this ILogger logger, string message);src/K8sOperator.NET/EventWatcher.cs (2)
56-70: Log the actual exception in the catch block for improved debugging.Catching
Exceptionwithout logging the exception details can hamper root cause analysis. Logging only a generic message makes it difficult to troubleshoot. Consider incorporating at leastex.Messageor the exception stack trace in the log for clarity.Example refactor:
catch (Exception ex) { - Logger.WatcherError("Error in watcher loop restarting..."); + Logger.WatcherError($"Error in watcher loop restarting: {ex}"); await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken); }
242-246: Log additional exception details, if appropriate.Calling
Logger.WatcherError(exception.Message);hides stack trace information that may be useful when debugging. If it is acceptable to log more details, consider logging either the entire exception, or at leastStackTrace.Example refactor:
private void OnError(Exception exception) { if (_isRunning) { - Logger.WatcherError(exception.Message); + Logger.WatcherError($"Exception: {exception.Message}\n{exception.StackTrace}"); } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/K8sOperator.NET/EventWatcher.cs(3 hunks)src/K8sOperator.NET/Extensions/LoggingExtensions.cs(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze
- GitHub Check: Build
🔇 Additional comments (1)
src/K8sOperator.NET/EventWatcher.cs (1)
90-90: No functional changes detected.This line appears to be a closing brace without any modifications to logic. No concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/K8sOperator.NET/EventWatcher.cs (2)
58-66: Consider adding a break condition for extreme failure scenariosWhile the error handling is comprehensive, there's no limit to how many times the watcher will retry after encountering errors. In extreme scenarios, this could lead to continuous retries and logging without making progress.
Consider implementing a failure counter or backoff mechanism to prevent infinite retries:
while (_isRunning && !_cancellationToken.IsCancellationRequested) { try { var response = Client.ListAsync<T>(LabelSelector, cancellationToken); await foreach (var (type, item) in response.WatchAsync<T, object>(OnError, cancellationToken)) { OnEvent(type, item); } + // Reset any failure counter here if implemented }
255-255: Consider keeping stack trace information for debuggingWhile logging just the exception message simplifies the logs, it loses valuable stack trace information that could be helpful for debugging complex issues.
Consider adding an option to log the full exception details for deeper debugging scenarios:
-Logger.WatcherError(exception.Message); +Logger.WatcherError($"{exception.Message} - See logs for details."); +Logger.LogDebug(exception, "Watcher error details");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/K8sOperator.NET/EventWatcher.cs(3 hunks)test/K8sOperator.NET.Tests/EventWatcherTests.cs(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
test/K8sOperator.NET.Tests/EventWatcherTests.cs (2)
test/K8sOperator.NET.Tests/Logging/TestOutputLoggingExtensions.cs (1)
ILoggerFactory(55-72)test/K8sOperator.NET.Tests/Logging/TestOutputLoggerProvider.cs (1)
ILogger(59-59)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze
🔇 Additional comments (10)
test/K8sOperator.NET.Tests/EventWatcherTests.cs (6)
55-55: Good refactoring: Adding a centralized cancellation token sourceAdding a class-level
CancellationTokenSourceis a good practice that reduces code duplication across test methods.
62-63: Good practice: Using a 2-second timeout for testsSetting a 2-second timeout for the cancellation token is appropriate for test methods to prevent them from hanging indefinitely if something goes wrong.
76-77: Improved consistency: Using the centralized tokenReplacing inline token creation with the class-level token source improves consistency and maintainability.
95-95: Consistent token usage across test methodsGood consistency by using the same token source pattern across all test methods.
117-117: Consistent token usage across test methodsGood consistency by using the same token source pattern across all test methods.
139-139: Consistent token usage across test methodsGood consistency by using the same token source pattern across all test methods.
src/K8sOperator.NET/EventWatcher.cs (4)
56-81: Enhanced resilience with continuous processing loop and error handlingThe implementation of a continuous loop with comprehensive exception handling significantly improves the resilience of the event watcher. The while loop that checks both
_isRunningand cancellation status ensures that the watcher can restart after recoverable errors without manual intervention.The specific exception handling strategies are well chosen:
TaskCanceledException: Appropriately logs without restarting (expected cancellation)OperationCanceledException: Logs and restarts after a delay (transient issues)HttpOperationException: Logs the HTTP error details and restarts after a delay (network issues)The 5-second delay before retry helps prevent resource exhaustion during error conditions.
67-70: Appropriate handling of expected cancellationGood approach to handling
TaskCanceledExceptionby simply logging without attempting to restart, as this typically indicates an intentional cancellation.
71-75: Good recovery strategy for operation cancellationsAppropriate handling of
OperationCanceledExceptionwith a retry delay to handle transient issues.
76-80: Detailed error logging for HTTP exceptionsGood practice to include the HTTP response content in the error message for better diagnostics.
Ensures the event watcher recovers from exceptions and restarts, preventing service interruption.
Adds logging for watcher errors to improve debugging and monitoring.
Summary by CodeRabbit
New Features
Bug Fixes