Error logging to allow knowledge of what the failure was before retrying #365

Merged: 2 commits merged into ruby-shoryuken:master from the retry_warning branch on May 16, 2017

Conversation

rwadstein (Contributor):

In the event of relatively long timeouts between retries of message processing, it would be useful to have preemptive knowledge of such errors. This will allow any external dependency failures to be known about and remedied before the next retry.

retry_intervals = worker.class.get_shoryuken_options['retry_intervals']

if retry_intervals.nil? || !handle_failure(sqs_msg, started_at, retry_intervals)
  # Re-raise the exception if the job is not going to be retried with exponential backoff.
  # This allows custom middleware (like exception notifiers) to be aware of the unhandled failure.
  raise
end

# Since we didn't raise, let's log the backtrace for debugging purposes.
logger.debug { ex.backtrace * "\n " }
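For context (not part of the diff), retry_intervals comes from the worker's shoryuken_options. A minimal sketch of a worker opting into exponential backoff retries; the queue name and intervals below are made-up values:

class MyWorker
  include Shoryuken::Worker

  shoryuken_options queue: 'my_queue',
                    auto_delete: true,
                    retry_intervals: [300, 1_800, 7_200] # seconds to wait before each retry

  def perform(sqs_msg, body)
    # work that may hit a flaky external dependency
  end
end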
phstc (Collaborator):

@rwadstein WDYT about:

logger.debug ex.backtrace.join("\n") unless ex.backtrace.nil?

Just in case backtrace is nil?

rwadstein (Contributor, author):

Is it possible to get into the rescue block without an error class being raised? I don't mind putting in the check, but just curious if the scenario is legitimate or not.
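For reference (an aside, not from the thread): Exception#backtrace is nil until the exception has actually been raised (or set_backtrace has been called), so inside a rescue block it is normally populated. A tiny illustration:

ex = RuntimeError.new('boom')
ex.backtrace        # => nil, the exception was instantiated but never raised

begin
  raise ex
rescue => rescued
  rescued.backtrace # => populated by raise with the current call stack
end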

@@ -12,14 +12,19 @@ def call(worker, queue, sqs_msg, body)

   started_at = Time.now
   yield
-rescue
+rescue => ex
+  logger.info { "Message #{sqs_msg.message_id} will attempt retry due to error: #{ex.message}" }
phstc (Collaborator):

@rwadstein I'm wondering if that should go below line 24, since line 23 can raise the error up. WDYT?

Should this also be debug?

rwadstein (Contributor, author):

Good point, I'll make that change. I had that thought with the debug statement below, but not the informational one; no need to see the same error twice in the logs.

I think debug already gets potentially noisy as it is, and the error here is very useful. Maybe warn?

phstc (Collaborator) commented on May 15, 2017:

In the event of relatively long timeouts between retries of message processing, it would be useful to have preemptive knowledge of such errors.

@rwadstein this makes sense. But for such cases, I also recommend configuring a dead-letter queue (DLQ); we don't retry your messages indefinitely.
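A minimal sketch, not part of this PR, of attaching a dead-letter queue to a work queue via an SQS redrive policy with the aws-sdk-sqs gem; the queue URL, DLQ ARN, and maxReceiveCount are placeholders:

require 'aws-sdk-sqs'
require 'json'

sqs = Aws::SQS::Client.new

redrive_policy = {
  deadLetterTargetArn: 'arn:aws:sqs:us-east-1:123456789012:my_queue_dlq',
  maxReceiveCount: '5' # after 5 failed receives, SQS moves the message to the DLQ
}.to_json

sqs.set_queue_attributes(
  queue_url: 'https://sqs.us-east-1.amazonaws.com/123456789012/my_queue',
  attributes: { 'RedrivePolicy' => redrive_policy }
)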

rwadstein (Contributor, author):

@phstc I agree 100% with setting up a DLQ. We have timeouts that can potentially go on for days before a message reaches the DLQ, which is why this is useful.

phstc (Collaborator) commented on May 16, 2017:

@rwadstein Thinking out loud... maybe it should always re-raise, whether the message is being auto-retried or not. We could ship the warn as a minor release, and the re-raise as a major release with breaking changes.

I'm suggesting re-raising because if processing failed, the error should propagate; retrying or not is a separate concern. For example, if someone uses a middleware to track errors externally, it would be cool to see these errors there, but since the exponential backoff middleware doesn't re-raise, the error won't get tracked unless the middleware order is configured properly.

WDYT?
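To illustrate the ordering point with a sketch (the tracker class and notify call are hypothetical): an error-tracking server middleware only sees the exception if the middleware nested inside it lets the error propagate, so a tracker that ends up wrapping the exponential backoff middleware never fires for errors the backoff middleware swallows.

class ErrorTrackerMiddleware
  def call(worker, queue, sqs_msg, body)
    yield
  rescue => ex
    MyErrorTracker.notify(ex) # hypothetical external tracking call
    raise
  end
end

Shoryuken.configure_server do |config|
  config.server_middleware do |chain|
    # If this middleware wraps the exponential backoff middleware in the chain,
    # it never sees the exceptions that the backoff middleware swallows.
    chain.add ErrorTrackerMiddleware
  end
end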

rwadstein (Contributor, author) commented on May 16, 2017:

I definitely agree, and that was my initial thought, but I didn't go down that route because of the breaking changes and significant refactoring. Some of our other workers based on other async tech use that concept.

phstc merged commit 0bb7454 into ruby-shoryuken:master on May 16, 2017.
phstc (Collaborator) commented on May 16, 2017:

@rwadstein cool, I will think on that for a major release.

Just merged your fix into master! 🍻

Thanks

phstc (Collaborator) commented on May 18, 2017:

Hi @rwadstein

I've just released a new version 3.0.7 including your changes.

rwadstein (Contributor, author):
@phstc Perfect, thank you. And not a moment too soon: I am in the process of releasing a new version of our Shoryuken-based worker Docker image, and getting that update would really help. We already got hit by some errors like that and had to wait over a day to find out what the error was.

rwadstein deleted the retry_warning branch on May 18, 2017, 14:51.