Error logging to allow knowledge of what the failure was before retrying #365
Conversation
```ruby
retry_intervals = worker.class.get_shoryuken_options['retry_intervals']

if retry_intervals.nil? || !handle_failure(sqs_msg, started_at, retry_intervals)
  # Re-raise the exception if the job is not going to be exponential backoff retried.
  # This allows custom middleware (like exception notifiers) to be aware of the unhandled failure.
  raise
end

# since we didn't raise, lets log the backtrace for debugging purposes.
logger.debug { ex.backtrace * "\n " }
```
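For context, `retry_intervals` is read from the worker's `shoryuken_options`, so only workers that opt in get the exponential backoff behaviour. A minimal sketch of such a worker; the class name, queue name, intervals, and `process` helper below are illustrative, not part of this PR:

```ruby
class OrderProcessingWorker
  include Shoryuken::Worker

  # Illustrative values: retry roughly after 1 minute, 10 minutes, then 1 hour.
  shoryuken_options queue: 'orders',
                    auto_delete: true,
                    retry_intervals: [60, 600, 3600]

  def perform(sqs_msg, body)
    # Any exception raised here reaches the middleware's rescue block shown above.
    process(body) # hypothetical application-level helper
  end
end
```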
@rwadstein WDYT about:

`logger.debug ex.backtrace.join("\n") unless ex.backtrace.nil?`

Just in case `backtrace` is nil?
Is it possible to get into the `rescue` block without an error class being raised? I don't mind putting in the check, but just curious if the scenario is legitimate or not.
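For reference, Ruby only populates `backtrace` when an exception is actually raised, so inside a plain `rescue` it is normally present; the nil case arises for exception objects that were built but never raised. A quick sketch, not part of the PR:

```ruby
# An exception that was constructed but never raised has no backtrace.
ex = RuntimeError.new('boom')
ex.backtrace            #=> nil

# Once it has been raised and rescued, the backtrace is populated.
begin
  raise ex
rescue => rescued
  rescued.backtrace.first #=> e.g. "example.rb:8:in `<main>'"
end
```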
```diff
@@ -12,14 +12,19 @@ def call(worker, queue, sqs_msg, body)

   started_at = Time.now
   yield
-rescue
+rescue => ex
+  logger.info { "Message #{sqs_msg.message_id} will attempt retry due to error: #{ex.message}" }
```
@rwadstein I'm wondering if that should be below line 24, since 23 can raise up the error, WDYT?

Should this also be `debug`?
good point. I'll make that change. I had that thought with the debug statement below, but not the informational one. No need to see the same error twice in the logs.
I think debug already gets potentially noisy as it is. I think the error is very useful. Maybe `warn`?
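Putting those two suggestions together (log after the re-raise decision so the error is reported only once, and at `warn` rather than `info`), the middleware's rescue could end up looking roughly like this; a sketch only, not the code that was actually merged:

```ruby
def call(worker, queue, sqs_msg, body)
  started_at = Time.now
  yield
rescue => ex
  retry_intervals = worker.class.get_shoryuken_options['retry_intervals']

  if retry_intervals.nil? || !handle_failure(sqs_msg, started_at, retry_intervals)
    # Not auto-retrying: re-raise so exception notifiers and other middleware see the failure.
    raise
  end

  # Only reached when the message will be retried, so the error is reported exactly once.
  logger.warn { "Message #{sqs_msg.message_id} will attempt retry due to error: #{ex.message}" }
  logger.debug { ex.backtrace.join("\n") } unless ex.backtrace.nil?
end
```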
@rwadstein this makes sense. But for such cases, I also recommend configuring a DL queue; we don't retry your messages indefinitely.
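For anyone following along, a dead-letter queue is configured on the SQS side via a redrive policy. A minimal sketch using `aws-sdk-sqs`; the queue names and the max receive count are placeholders:

```ruby
require 'aws-sdk-sqs'
require 'json'

sqs = Aws::SQS::Client.new

# Placeholder queue names for illustration.
work_queue_url = sqs.get_queue_url(queue_name: 'orders').queue_url
dlq_url        = sqs.get_queue_url(queue_name: 'orders-dlq').queue_url

dlq_arn = sqs.get_queue_attributes(
  queue_url: dlq_url,
  attribute_names: ['QueueArn']
).attributes['QueueArn']

# After 5 failed receives, SQS moves the message to the dead-letter queue.
sqs.set_queue_attributes(
  queue_url: work_queue_url,
  attributes: {
    'RedrivePolicy' => {
      deadLetterTargetArn: dlq_arn,
      maxReceiveCount: '5'
    }.to_json
  }
)
```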
@phstc I do agree with setting up a DLQ 100%. We have timeouts that can potentially go on for days before a message reaches the DLQ, which is why this is useful.
@rwadstein thinking out loud... maybe it should always raise up, no matter if it is auto-retrying or not. I'm suggesting raising up because if it failed, it should raise up the error. WDYT?
I definitely agree, and that was my initial thought, but I didn't go down that route because of the breaking changes and significant refactoring. Some of our other workers based on other async tech use that concept.
@rwadstein cool, I will think on that for a major release. Just merged your fix into master! 🍻 Thanks
Hi @rwadstein, I've just released a new version, 3.0.7, including your changes.
@phstc perfect. Thank you. And not a moment too soon: I am in the process of releasing a new version of our shoryuken-based worker Docker image, and getting that update would really help. We already got hit by some errors like that and had to wait over a day to find out what the error was.
In the event of relatively long timeouts between retries of message processing, it would be useful to know about such errors early. That would allow any external dependency failures to be identified and remedied before the next retry.