Error logging to allow knowledge of what the failure was before retrying #365

Merged: 2 commits merged into ruby-shoryuken:master from the retry_warning branch on May 16, 2017

Conversation

rwadstein (Contributor):

In the event of relatively long timeouts between retries of message processing, it would be useful to have preemptive knowledge of such errors. This will allow any external dependency failures to be known about and remedied before the next retry.

retry_intervals = worker.class.get_shoryuken_options['retry_intervals']

if retry_intervals.nil? || !handle_failure(sqs_msg, started_at, retry_intervals)
  # Re-raise the exception if the job is not going to be retried with exponential backoff.
  # This allows custom middleware (like exception notifiers) to be aware of the unhandled failure.
  raise
end

# Since we didn't raise, let's log the backtrace for debugging purposes.
logger.debug { ex.backtrace * "\n " }
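For context (not part of the diff), retry_intervals comes from the worker's shoryuken_options. A minimal sketch of a worker opting into exponential backoff retries; the queue name and intervals below are made-up values:

class MyWorker
  include Shoryuken::Worker

  shoryuken_options queue: 'my_queue',
                    auto_delete: true,
                    retry_intervals: [300, 1_800, 7_200] # seconds to wait before each retry

  def perform(sqs_msg, body)
    # work that may hit a flaky external dependency
  end
end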
phstc (Collaborator):

@rwadstein WDYT about:

logger.debug ex.backtrace.join("\n") unless ex.backtrace.nil?

Just in case backtrace is nil?

rwadstein (Contributor, author):

Is it possible to get into the rescue block without an error class being raised? I don't mind putting in the check, but just curious if the scenario is legitimate or not.
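For reference (an aside, not from the thread): Exception#backtrace is nil until the exception has actually been raised (or set_backtrace has been called), so inside a rescue block it is normally populated. A tiny illustration:

ex = RuntimeError.new('boom')
ex.backtrace        # => nil, the exception was instantiated but never raised

begin
  raise ex
rescue => rescued
  rescued.backtrace # => populated by raise with the current call stack
end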

@@ -12,14 +12,19 @@ def call(worker, queue, sqs_msg, body)

   started_at = Time.now
   yield
-rescue
+rescue => ex
+  logger.info { "Message #{sqs_msg.message_id} will attempt retry due to error: #{ex.message}" }
phstc (Collaborator):

@rwadstein I'm wondering if that should go below line 24, since line 23 can raise the error up. WDYT?

Should this also be debug?

rwadstein (Contributor, author):

Good point, I'll make that change. I had that thought with the debug statement below, but not the informational one; no need to see the same error twice in the logs.

I think debug already gets potentially noisy as it is, and the error here is very useful. Maybe warn?

phstc (Collaborator) commented on May 15, 2017:

In the event of relatively long timeouts between retries of message processing, it would be useful to have preemptive knowledge of such errors.

@rwadstein this makes sense. But for such cases, I also recommend configuring a dead-letter queue (DLQ); we don't retry your messages indefinitely.
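A minimal sketch, not part of this PR, of attaching a dead-letter queue to a work queue via an SQS redrive policy with the aws-sdk-sqs gem; the queue URL, DLQ ARN, and maxReceiveCount are placeholders:

require 'aws-sdk-sqs'
require 'json'

sqs = Aws::SQS::Client.new

redrive_policy = {
  deadLetterTargetArn: 'arn:aws:sqs:us-east-1:123456789012:my_queue_dlq',
  maxReceiveCount: '5' # after 5 failed receives, SQS moves the message to the DLQ
}.to_json

sqs.set_queue_attributes(
  queue_url: 'https://sqs.us-east-1.amazonaws.com/123456789012/my_queue',
  attributes: { 'RedrivePolicy' => redrive_policy }
)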

rwadstein (Contributor, author):

@phstc I agree 100% with setting up a DLQ. We have timeouts that can potentially go on for days before a message reaches the DLQ, which is why this is useful.

phstc (Collaborator) commented on May 16, 2017:

@rwadstein Thinking out loud... maybe it should always re-raise, whether the message is being auto-retried or not. We could ship the warn as a minor release, and the re-raise as a major release with breaking changes.

I'm suggesting re-raising because if processing failed, the error should propagate; retrying or not is a separate concern. For example, if someone uses a middleware to track errors externally, it would be cool to see these errors there, but since the exponential backoff middleware doesn't re-raise, the error won't get tracked unless the middleware order is configured properly.

WDYT?
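To illustrate the ordering point with a sketch (the tracker class and notify call are hypothetical): an error-tracking server middleware only sees the exception if the middleware nested inside it lets the error propagate, so a tracker that ends up wrapping the exponential backoff middleware never fires for errors the backoff middleware swallows.

class ErrorTrackerMiddleware
  def call(worker, queue, sqs_msg, body)
    yield
  rescue => ex
    MyErrorTracker.notify(ex) # hypothetical external tracking call
    raise
  end
end

Shoryuken.configure_server do |config|
  config.server_middleware do |chain|
    # If this middleware wraps the exponential backoff middleware in the chain,
    # it never sees the exceptions that the backoff middleware swallows.
    chain.add ErrorTrackerMiddleware
  end
end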

rwadstein (Contributor, author) commented on May 16, 2017:

I definitely agree, and that was my initial thought, but I didn't go down that route because of the breaking changes and significant refactoring. Some of our other workers based on other async tech use that concept.

phstc merged commit 0bb7454 into ruby-shoryuken:master on May 16, 2017.
phstc (Collaborator) commented on May 16, 2017:

@rwadstein cool, I will think on that for a major release.

Just merged your fix into master! 🍻

Thanks

phstc (Collaborator) commented on May 18, 2017:

Hi @rwadstein

I've just released a new version 3.0.7 including your changes.

rwadstein (Contributor, author):
@phstc Perfect, thank you. And not a moment too soon: I am in the process of releasing a new version of our Shoryuken-based worker Docker image, and getting that update would really help. We already got hit by some errors like that and had to wait over a day to find out what the error was.

rwadstein deleted the retry_warning branch on May 18, 2017, 14:51.