Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Active-)Job errors not send (sentry-resque) #1754

Closed
Flixt opened this issue Mar 4, 2022 · 8 comments
Closed

(Active-)Job errors not send (sentry-resque) #1754

Flixt opened this issue Mar 4, 2022 · 8 comments

Comments

@Flixt
Copy link

Flixt commented Mar 4, 2022

Issue Description

Errors that happen within Jobs are not sent to Sentry when using sentry-rails and sentry-resque

The problem was encountered after upgrading our app to Rails 7 and Ruby 3.0.2

We upgraded:

sentry-ruby from '~> 4.6.5', to '~> 5.1.1'
sentry-rails from '~> 4.6.5', to '~> 5.1.1'

And we switched from our own fork of resque-sentry to sentry-resque', '~> 5.1.1'
We noticed after a while, errors from whithin our jobs are no longer reported to Sentry.

I tried to hunt down the bug and here is what I found so far. Some facts beforehand:

Reproduction Steps

I forked the repository and changed the Rails 7.0 example app of sentry-rails in a way to make it possible to reproduce the issue described above.
The diff of the fork can be found here (I did not open a PR as this does not contain a fix):

master...flowdgmbh:no-resque-events-example

  • Clone the fork
  • cd sentry-ruby/sentry-rails/examples/rails-7.0
  • bundle install
  • Change the dsn in sentry-rails/examples/rails-7.0/config/initializers/sentry.rb
  • RAILS_ENV=development COUNT=1 QUEUE=* bundle exec rake resque:workers
  • In another terminal schedule a job with bundle exec rails runner 'ErrorJob.perform_later'

Expected Behavior

The error is reported to sentry

Actual Behavior

The following happens for us when an error occours in the job (in the context of the separate Reqsue worker):

Result: No error was sent to Sentry

Ruby Version

3.1.0

SDK Version

5.1.1

Integration and Its Version

No response

Sentry Config

No response

@Flixt
Copy link
Author

Flixt commented Mar 4, 2022

When I patch the Sentry::Rails::ErrorSubscriber like this in config/initializers/sentry.rb

require 'sentry/rails/error_subscriber'

# Monkey patch for https://github.com/getsentry/sentry-ruby/issues/1754
module Sentry
  module Rails
    class ErrorSubscriber
      def report(error, handled:, severity:, context:)
        # Skip errors from Jobs
        return if context.is_a?(Hash) && context.key?(:job)

        Sentry::Rails.capture_exception(error, level: severity, contexts: {'rails.error' => context}, tags: {handled: handled})
      end
    end
  end
end

Everything works fine, as the sentry-resque integration gets to handle an uncaptured error and reports it to Sentry.

@st0012
Copy link
Collaborator

st0012 commented Mar 14, 2022

@Flixt I'm sorry for the trouble. And thanks for the amazingly detailed report.

I think there are 2 major causes of it:

  1. The double capturing after introducing ErrorSubscriber for Rails 7.0.
  2. ErrorSubscriber should still send the event through the SDK background worker, even without background: false hint.

There's an on going issue (#1731) for 1), which makes me think if we should have that support opt-out by default. Do you think the contextual data is rich enough for events sent through the ErrorSubscriber?

And I'll also investigate 2) because after #1617, all events should still be sent with/without background: false hint.

@st0012
Copy link
Collaborator

st0012 commented Mar 14, 2022

@Flixt Ok, I think 2) is a Resque issue. Currently, its workers exit with exit!, which skips all at_exit callbacks. This has caused many issues and the latest report is resque/resque#1664.

I'm not sure if the SDK can do anything about it without potentially mess things up further. But if you set RUN_AT_EXIT_HOOKS=1, you should be able to get rid of the workaround. Here's the result from your example:

❯ RUN_AT_EXIT_HOOKS=1 RAILS_ENV=development COUNT=1 QUEUE=* bundle exec rake resque:workers
Calling `DidYouMean::SPELL_CHECKERS.merge!(error_name => spell_checker)' has been deprecated. Please call `DidYouMean.correct_error(error_name, spell_checker)' instead.
Calling `DidYouMean::SPELL_CHECKERS.merge!(error_name => spell_checker)' has been deprecated. Please call `DidYouMean.correct_error(error_name, spell_checker)' instead.
DEBUGGER: Attaching after process 89342 fork to child process 89422
4080 --- Start handling by integration? true
4080 --- Sentry::Rails: {:level=>:error, :contexts=>{"rails.error"=>{:job=>#<ErrorJob:0x00000001117aa5a0 @arguments=[], @job_id="59884352-cf81-4514-832a-c5a245c4fbf0", @queue_name="default", @priority=nil, @executions=1, @exception_executions={}, @timezone="UTC", @provider_job_id=nil, @serialized_arguments=nil, @locale="en", @enqueued_at="2022-03-14T12:16:00Z">}}, :tags=>{:handled=>false}, :hint=>{:integration=>"rails"}}
4080 --- Mark even as captured
4080 --- Handle by Sentry::Resque::SentryReporter
4080 --- Sentry::Resque: {:hint=>{:background=>false, :integration=>"resque"}}
16360 --- Sending HTTP request
16380 --- Sending HTTP request
16380 --- HTTP response code is 200
16360 --- HTTP response code is 200

cc @sl0thentr0py

@Flixt
Copy link
Author

Flixt commented Mar 15, 2022

@st0012 thank you for your feedback.

  1. I compared the two errors (one captured by the ErrorSubscriber and one captured by the resque integration) and could barely see a difference in the contextual data (although the errors were the made up errors from the example application).

  2. Good catch, thank you. Starting resque with RUN_AT_EXIT_HOOKS=1 leads to both errors being reported and
    the behavior probably being the same as described in Duplicate reporting for Sidekiq errors #1731

Which makes me think if we should have that support opt-out by default

I think users should actively opt in to let the ErrorSubscriber capture their exceptions. At least for us it was a surprise
and took us some time to figure out what was going on. Probably Rails is more to blame here because the new error reporting interface is not as prominent as other changes on the upgrade path).
If we had known that Sentry auto opts in to the Rails ErrorSubscriber we probably would have removed sentry-resque all together. Ok fair, this would have led to no errors from resque being reported to sentry anymore. I don't know about other apps but at least in our app (which really has a good test coverage) we do not test/mock error reporting to sentry as we don't test functionality of gems in our app specific tests.
To make a long story short: By using Rails new errors subscriber (currently) things can break in certain circumstances. And I think for some users (like us) they break in silently.

@st0012
Copy link
Collaborator

st0012 commented Mar 18, 2022

@Flixt Thanks for all the feedback.

I will make the error subscriber registration an opt-in thing, as it's been causing too many issues for users. I'm sorry about that.

And speaking of testing, do you think if the SDK packs with test helpers (so you don't actually send the events), you'll be more keen to write error reporting tests? As described here: #1680

The idea is that, error reporting is already essential for most of the apps. And having such an important component uncovered by tests could be risky, from losing contextual data to incorrectly using an interface.

@st0012
Copy link
Collaborator

st0012 commented Mar 18, 2022

FYI, I'll change the error reporter integration in 5.3.0. More details: rails/rails#43625 (comment)

@Flixt
Copy link
Author

Flixt commented Mar 21, 2022

I will make the error subscriber registration an opt-in thing, as it's been causing too many issues for users. I'm sorry about that.

No worries! Glad we figured that out :)

And speaking of testing, do you think if the SDK packs with test helpers (so you don't actually send the events), you'll be more keen to write error reporting tests? As described here: #1680
The idea is that, error reporting is already essential for most of the apps. And having such an important component uncovered by tests could be risky, from losing contextual data to incorrectly using an interface

Not having to mock the requests myself would be a big benefit that would definitely make me more willing to ensure things get reported in the future.

Feel free to change the state of this issue as you want since I consider it "fixed". At least for us the monkey patch will work until 5.3.0 is released. Then we will probably just opt out of the resque integration and use the Rails ErrorReporter.

@st0012
Copy link
Collaborator

st0012 commented Mar 21, 2022

Close via #1771

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants