Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to discard failed jobs which crashed with ActiveJob::DeserializationError #770

Closed
nickcampbell18 opened this issue Dec 12, 2022 · 2 comments · Fixed by #771
Closed

Comments

@nickcampbell18
Copy link
Contributor

Hi team, not 100% sure whether this issue is caused by Rails or GoodJob, but here goes!

Steps to reproduce

  1. Enqueue a job using a Global ID, then delete the record it pointed to. The job cannot run now because it would raise an ActiveJob::DeserializationError when a GoodJob executor picks it up
  2. User visits the GoodJob dashboard, finds the job and clicks "Discard" (this could also happen if there was a failed execution first, but it's irrelevant to trigger this problem)

Expected behaviour

User can discard the job, even though the arguments cannot deserialize

Actual behaviour

The GoodJob web UI crashes with an exception, user gets a HTTP 500 error page. It looks like calling execution.active_job is what triggers Rails to attempt to deserialize the arguments, and crash. Here's the relevant parts of the backtrace:

lib/active_record/relation/finder_methods.rb:381 raise_record_not_found_exception!
...
lib/active_record/relation/finder_methods.rb:69 find
lib/active_record/querying.rb:22 find
lib/active_record/core.rb:271 find
...
lib/global_id/locator.rb:17 locate
lib/active_job/arguments.rb:148 deserialize_global_id
...
lib/active_job/arguments.rb:43 deserialize
lib/active_job/core.rb:190 deserialize_arguments
lib/active_job/core.rb:180 deserialize_arguments_if_needed
app/models/good_job/execution.rb:321 block in active_job
app/models/good_job/execution.rb:320 active_job
app/models/good_job/job.rb:209 block in discard_job
app/models/good_job/lockable.rb:309 with_advisory_lock
app/models/good_job/job.rb:207 discard_job
app/controllers/good_job/jobs_controller.rb:64 discard

I'm aware this is possibly a niche issue. Rails generates the commented-out code discard_on ActiveJob::DeserializationError in ApplicationRecord, to encourage you to consider this by default, but in my application we had never enabled it globally and I would guess others also haven't, so I think there's good cause for fixing it.

I could see an argument for proposing a fix to Rails itself, maybe for a way to load the job without deserializing the arguments (e.g. for instrumentation purposes, as we use it here). However, from reading the code, it looks like we currently generate a synthetic error (DiscardJobError) and then mark that as the error of the job. Maybe we could catch the deserialization error specifically, and mark it as the ancestor of the discard error so the history is preserved? I don't know whether there are other core exceptions expected simply by loading the job/arguments (I haven't seen others before) so it feels a little weird to have this tight dependency on a specific exception... but otherwise I can't think of an elegant fix to this issue.

I'd be happy to submit a patch if there's a good way to go here 😄

Thank you maintainers! Love this gem ❤️

@bensheldon
Copy link
Owner

@nickcampbell18 thank you for the bug report. This is a really interesting/exciting find and I appreciate you tracing it down through GoodJob! 🙌🏻

I think that GoodJob should cover this situation. GoodJob's Dashboard UI tries to perform the various actions through ActiveJob (or simulate it as closely as possible). In this case that the ActiveJob object is invalid (or at least un-deserealizable), I think GoodJob should recognize/rescue from that situation and be able to successfully perform the discard action.

From quickly looking into the code, it seems like the ActiveJob object is only deserialized to trigger instrumentation within GoodJob::Job#discard_job:

if active_job.respond_to?(:instrument)
active_job.send :instrument, :discard, error: job_error, &update_execution

And even looking at the construction of the ActiveJob object, it seems like it's maybe doing too much all the time by always deserializing arguments:

def active_job
ActiveJob::Base.deserialize(active_job_data).tap do |aj|
aj.send(:deserialize_arguments_if_needed)
end
end

The only two places that GoodJob seems to try to deserialize an ActiveJob object are for retry_job and discard_job. Retry seems like it would require arguments being present to successfully reenqueue the job, whereas discard wouldn't.

I'm thinking that maybe the #active_job could be updated to something like

def active_job(safe_deserialize: false)
  ActiveJob::Base.deserialize(active_job_data).tap do |aj|
    aj.send(:deserialize_arguments_if_needed)
  rescue ActiveJob::DeserializationError
    raise unless safe_deserialize
  end
end

and discard_job could set that parameter to true.

Would you want to try making and testing a change like that (I'm also open to other ideas too)?

@nickcampbell18
Copy link
Contributor Author

Love it. Will send a patch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging a pull request may close this issue.

2 participants