Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Couldn't find GoodJob::BatchRecord" error #1387

Closed
jesseduffield opened this issue Jul 4, 2024 · 12 comments
Closed

"Couldn't find GoodJob::BatchRecord" error #1387

jesseduffield opened this issue Jul 4, 2024 · 12 comments

Comments

@jesseduffield
Copy link

I'm getting this error on some of my batch callback jobs:

{"_aj_globalid"=>"gid://subble-api/GoodJob::Batch/e4b54d15-a2a4-4778-bf3c-5ce1de3f366f"},
{"event"=>{"value"=>"finish",
"_aj_serialized"=>"ActiveJob::Serializers::SymbolSerializer"},
"_aj_symbol_keys"=>["event"]}

Error: ActiveJob::DeserializationError: Error while trying to deserialize arguments: Couldn't find GoodJob::BatchRecord with 'id'=e4b54d15-a2a4-4778-bf3c-5ce1de3f366f

/usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/core.rb:253:in `find'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/batch.rb:67:in `find'   /usr/local/bundle/ruby/3.3.0/gems/globalid-1.2.1/lib/global_id/locator.rb:162:in `locate'   /usr/local/bundle/ruby/3.3.0/gems/globalid-1.2.1/lib/global_id/locator.rb:210:in `block in locate'   /usr/local/bundle/ruby/3.3.0/gems/globalid-1.2.1/lib/global_id/locator.rb:222:in `unscoped'   /usr/local/bundle/ruby/3.3.0/gems/globalid-1.2.1/lib/global_id/locator.rb:210:in `locate'   /usr/local/bundle/ruby/3.3.0/gems/globalid-1.2.1/lib/global_id/locator.rb:31:in `locate'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/arguments.rb:145:in `deserialize_global_id'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/arguments.rb:129:in `deserialize_argument'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/arguments.rb:43:in `block in deserialize'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/arguments.rb:43:in `map'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/arguments.rb:43:in `deserialize'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/core.rb:210:in `deserialize_arguments'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/core.rb:200:in `deserialize_arguments_if_needed'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/execution.rb:50:in `perform_now'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/instrumentation.rb:26:in `block in perform_now'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/railties/job_runtime.rb:13:in `block in instrument'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/instrumentation.rb:40:in `block in instrument'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/notifications.rb:206:in `block in instrument'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/notifications/instrumenter.rb:58:in `instrument'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/notifications.rb:206:in `instrument'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/instrumentation.rb:39:in `instrument'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/railties/job_runtime.rb:11:in `instrument'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/instrumentation.rb:26:in `perform_now'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/logging.rb:18:in `block in perform_now'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/tagged_logging.rb:135:in `block in tagged'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/tagged_logging.rb:39:in `tagged'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/tagged_logging.rb:135:in `tagged'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/broadcast_logger.rb:240:in `method_missing'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/logging.rb:25:in `tag_logger'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/logging.rb:18:in `perform_now'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/execution.rb:30:in `block in execute'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/callbacks.rb:121:in `block in run_callbacks'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/railtie.rb:67:in `block (4 levels) in <class:Railtie>'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/reloader.rb:77:in `block in wrap'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/execution_wrapper.rb:88:in `wrap'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/reloader.rb:74:in `wrap'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/railtie.rb:66:in `block (3 levels) in <class:Railtie>'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/callbacks.rb:130:in `instance_exec'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/callbacks.rb:130:in `block in run_callbacks'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/callbacks.rb:141:in `run_callbacks'   /usr/local/bundle/ruby/3.3.0/gems/activejob-7.1.3.4/lib/active_job/execution.rb:28:in `execute'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/execution.rb:414:in `block (3 levels) in perform'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/notifications.rb:206:in `block in instrument'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/notifications/instrumenter.rb:58:in `instrument'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/notifications.rb:206:in `instrument'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/execution.rb:413:in `block (2 levels) in perform'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/lib/good_job/current_thread.rb:113:in `within'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/execution.rb:376:in `block in perform'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/callbacks.rb:121:in `block in run_callbacks'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/batch.rb:78:in `within_thread'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/execution.rb:576:in `reset_batch_values'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/callbacks.rb:130:in `block in run_callbacks'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/callbacks.rb:141:in `run_callbacks'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/execution.rb:371:in `perform'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/execution.rb:273:in `block in perform_with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/concerns/good_job/advisory_lockable.rb:172:in `block in with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation.rb:929:in `_scoping'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation.rb:467:in `scoping'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/scoping/default.rb:51:in `unscoped'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/concerns/good_job/advisory_lockable.rb:172:in `with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation/delegation.rb:79:in `block in with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation.rb:929:in `_scoping'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation.rb:467:in `scoping'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation/delegation.rb:79:in `with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/app/models/good_job/execution.rb:269:in `perform_with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation/delegation.rb:79:in `block in perform_with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation.rb:929:in `_scoping'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation.rb:467:in `scoping'   /usr/local/bundle/ruby/3.3.0/gems/activerecord-7.1.3.4/lib/active_record/relation/delegation.rb:79:in `perform_with_advisory_lock'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/lib/good_job/job_performer.rb:33:in `next'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/lib/good_job/scheduler.rb:276:in `block (2 levels) in create_task'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/reloader.rb:77:in `block in wrap'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/execution_wrapper.rb:92:in `wrap'   /usr/local/bundle/ruby/3.3.0/gems/activesupport-7.1.3.4/lib/active_support/reloader.rb:74:in `wrap'   /usr/local/bundle/ruby/3.3.0/gems/good_job-3.28.0/lib/good_job/scheduler.rb:275:in `block in create_task'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:24:in `block in execute'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `block in synchronize'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:22:in `execute'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/ivar.rb:170:in `safe_execute'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/scheduled_task.rb:298:in `process_task'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/timer_set.rb:166:in `block (2 levels) in process_tasks'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:359:in `run_task'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:350:in `block (3 levels) in create_worker'   <internal:kernel>:187:in `loop'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:341:in `block (2 levels) in create_worker'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `catch'   /usr/local/bundle/ruby/3.3.0/gems/concurrent-ruby-1.3.1/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `block in create_worker'``

This happened when I was having some DB outage issues, not sure if it's related.

@bensheldon
Copy link
Owner

That's strange! That looks like a Batch callback job tried to run, and it was unable to load the associated Batch record out of the database.

Are you still experiencing the error / did that job ever successfully complete?

It's possible this was related to your DB outage. Normally though the cause of that exception would look something like (example):

ActiveJob::DeserializationError: Error while trying to deserialize arguments: PG::ConnectionBad: PQsocket() can't get socket descriptor`

@jesseduffield
Copy link
Author

I'm no longer experiencing the error, so the issue may indeed have been specific to the outage. I'm happy to close this and report back if it happens again.

@jesseduffield
Copy link
Author

@bensheldon I've just had this error happen again: this time without any outage or DB issues. I'm wondering if the cause could be due to a job in the batch retrying for a while before finally succeeding? Perhaps the batch is cleaned up in the meantime, so that when the retrying job eventually succeeds, the batch is gone?

@bensheldon
Copy link
Owner

Hmm. That's definitely possible if the batch callback job doesn't successfully complete before the batch is deleted. The default time to preserve jobs (the same setting is used for batches) is 2 weeks.

@jesseduffield
Copy link
Author

Ah, I think I see the problem now: I've got the following set:

config.good_job.cleanup_preserved_jobs_before_seconds_ago = 10.minutes

So a batch is considered complete once its constituent jobs finish, but not before any callback jobs finish. That means if a callback takes longer than 10 minutes to run (either due to being queued for a while or retrying) then the batch will have been deleted before the callback job can run, and then when the callback job runs, it errors because it can't find the batch.

I propose that batches should not be deleted until their callback jobs have completed, or if that's too complicated, that batches should have their own preservation config. What do you think about that @bensheldon ?

@bensheldon
Copy link
Owner

I propose that batches should not be deleted until their callback jobs have completed, or if that's too complicated, that batches should have their own preservation config. What do you think about that @bensheldon ?

I lean towards adding additional checks to ensure that callback jobs have completed. I'm trying to remember why that wasn't part of the design for batches, because I'm pretty sure I ignored them intentionally. Though maybe that was simply because I didn't want to run the callbacks in sequence. Or maybe I just was waiting for someone to notice and implement it just-in-time 😄

  • I think it should be achievable to add a new state/column to Batches which is like callback_jobs_finished_at.
  • Ensure that the value is reset when Batch#enqueue is called
  • Extend BatchRecord#_continue_discard_or_finish job or make something similar (called every time a job finishes that will check the Job#batch_callback_id column and if it is present, take a lock on the Batch and count up all the callback jobs and check if they're finished.
  • Update the deletion logic to check the BatchRecord#callback_jobs_finished_at column instead of the finished_at column when deciding to delete batch records.

@jesseduffield
Copy link
Author

That sounds reasonable to me

@bensheldon
Copy link
Owner

bensheldon commented Jul 31, 2024

@jesseduffield do you want to attempt it or should I add to me todo list? 😉

@jesseduffield
Copy link
Author

I'm afraid I won't have time to work on this any time soon so feel free to chuck it in the todo list :)

bensheldon added a commit that referenced this issue Aug 1, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
bensheldon added a commit that referenced this issue Aug 1, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
bensheldon added a commit that referenced this issue Aug 2, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
bensheldon added a commit that referenced this issue Aug 2, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
bensheldon added a commit that referenced this issue Aug 2, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
@bensheldon
Copy link
Owner

bensheldon commented Aug 2, 2024

fyi, having almost finished implementing this, I think I'm instead going to redefine finished_at to mean the callbacks have fully finished, and instead add a new column that is callbacks_at. Nope, not doing this, see next comment.

@bensheldon
Copy link
Owner

I think I'm instead going to redefine finished_at to mean the callbacks have fully finished, and instead add a new column that is callbacks_at.

Nope, I don't want to redefine that because it could effect callback jobs that are asking "am I running in a batch of jobs, or as a callback to a batch of jobs. I want to maintain the semantics that a batch's discarded/succeeded/finished states refer to the batch of jobs , of which the callback jobs are not inclusive. Callbacks happen after the batch, not part of the batch. Fhew, took some thinking about that.

bensheldon added a commit that referenced this issue Aug 8, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
bensheldon added a commit that referenced this issue Aug 9, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
@bensheldon
Copy link
Owner

Nope nope, the problem of not redefining finished_at is that it makes it difficult to switch over the behavior because it's unclear if batches are ready for deletion because their callbacks haven't finished, or because they pre-existed the data migration 🤔

bensheldon added a commit that referenced this issue Aug 29, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
bensheldon added a commit that referenced this issue Sep 14, 2024
Connects to #1387

Add logic to delete batches only after their callback jobs have completed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

2 participants