
Return to using executor.wrap around Scheduler execution task #99

Merged
merged 3 commits into from
Aug 27, 2020

Conversation

bensheldon
Owner

@bensheldon bensheldon commented Aug 26, 2020

@sj26 I'm seeing some behavior indicating that #97 is losing advisory locks. Would you be able to take a look?

It looks similar to what's described here: rails/rails#26956 (comment)

Off the top of my head, the most potentially-surprising outcome I'd anticipate is that executor.wrap inside with_connection would cause the connection to be returned early

...at least, that's my hypothesis given that the Advisory Locks are connection-based, implying that the connection taking the advisory lock is a different connection than the one trying to release it (the WARNING: you don't own a lock of type ExclusiveLock).
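The suspected failure mode can be simulated in plain Ruby, with no Rails involved. Here a toy pool stands in for ActiveRecord's connection pool, and `release_all` stands in for what the executor's completion hooks do; the names are illustrative only, not GoodJob's actual code:

```ruby
# Toy connection pool: one "checked out" connection per caller, released
# en masse when the executor completes (mimicking the suspected behavior
# of executor.wrap finishing inside with_connection).
class FakePool
  def initialize
    @available = [:conn_a, :conn_b]
    @checked_out = nil
  end

  # Returns the currently checked-out connection, checking one out if needed.
  def current
    @checked_out ||= @available.shift
  end

  # What the executor's completion does: return the held connection to the pool.
  def release_all
    @available.push(@checked_out) if @checked_out
    @checked_out = nil
  end
end

pool = FakePool.new
lock_conn = pool.current     # this "connection" takes the advisory lock
pool.release_all             # executor.wrap completes mid-way and releases it
unlock_conn = pool.current   # the later unlock checks out a connection again

# The unlock lands on a different connection than the lock:
puts lock_conn == unlock_conn  # => false
```

With Postgres session-level advisory locks, that mismatch is exactly what produces `WARNING: you don't own a lock of type ExclusiveLock`.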

$ bin/rspec spec/integration/scheduler_spec.rb

Randomized with seed 533
.WARNING:  you don't own a lock of type ExclusiveLock
WARNING:  you don't own a lock of type ExclusiveLock
WARNING:  you don't own a lock of type ExclusiveLock
WARNING:  you don't own a lock of type ExclusiveLock
F

Failures:

  1) Schedule Integration when there are a large number of jobs pops items off of the queue and runs them
     Failure/Error: expect(RUN_JOBS.size).to eq number_of_jobs

       expected: 500
            got: 501

       (compared using ==)
     # ./spec/integration/scheduler_spec.rb:80:in `block (3 levels) in <top (required)>'
     # ./spec/support/reset_rails_queue_adapter.rb:4:in `block (2 levels) in <top (required)>'
     # ./spec/support/database_cleaner.rb:13:in `block (3 levels) in <top (required)>'
     # ./spec/support/database_cleaner.rb:12:in `block (2 levels) in <top (required)>'

Finished in 7.81 seconds (files took 3.96 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./spec/integration/scheduler_spec.rb:52 # Schedule Integration when there are a large number of jobs pops items off of the queue and runs them

Randomized with seed 533

@sj26
Contributor

sj26 commented Aug 26, 2020

Ah, hmm, perhaps the executor within activejob is checking-in the connection too early, now. Hmm. Perhaps we need to restore the executor around the perform block only, before it takes the advisory lock.

I'll take a look.

@bensheldon
Owner Author

bensheldon commented Aug 26, 2020

@sj26 I found something interesting/unfortunate:

  • It seems like using with_connection doesn't appropriately retain the same connection, leading to a situation where the advisory lock is owned by a different connection than the one that tries to unlock it.
  • Even when I go back to the old executor.wrap, there is occasionally a record returned from the query that isn't advisory-locked at all. This seems to happen consistently when the scheduler is running hot (1,000 jobs), so FYI there may be some intersecting issues here.

For the latter issue of the query not reliably returning an advisory locked record, I can patch an additional guard clause. It's not ideal with an extra database query, but it's good until I can dig more into that.
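The shape of that guard can be sketched in miniature. Here stand-in `Job` records stub out `owns_advisory_lock?`, which in the real gem would re-query the database to verify the current session holds the lock:

```ruby
# Stand-in job records; `locked` stubs whether this session owns the
# advisory lock (the real check would hit the database).
Job = Struct.new(:id, :locked) do
  def owns_advisory_lock?
    locked
  end
end

jobs = [Job.new(1, true), Job.new(2, false), Job.new(3, true)]

performed = []
jobs.each do |good_job|
  # The extra guard clause: stop rather than perform a job whose
  # advisory lock this session does not verifiably own.
  break unless good_job&.owns_advisory_lock?

  performed << good_job.id
end

puts performed.inspect  # => [1]
```

The cost is one extra database query per fetched record, which is the trade-off described above.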

I do feel confident that returning the executor.wrap addresses the former issue of connection retention.

I'm very fascinated by all of this and appreciate your efforts 🙌

@bensheldon bensheldon changed the title Demonstrate loss of advisory lock with with_connection Return to using executor.wrap around Scheduler execution task Aug 26, 2020
@sj26
Contributor

sj26 commented Aug 26, 2020

Sorry, I ran out of time to look at this yesterday. But I think what you're proposing here is correct.

ActiveRecord installs query cache handling into the executor:

https://github.com/rails/rails/blob/7101489293ece071013417aeed2da3aa1b0927c9/activerecord/lib/active_record/railtie.rb#L244

It releases connections when the executor completes, regardless of whether the connection was already held before the executor starts:

https://github.com/rails/rails/blob/7101489293ece071013417aeed2da3aa1b0927c9/activerecord/lib/active_record/query_cache.rb#L38-L46

So by the time the ActiveJob has returned we have lost the connection which took the advisory lock.

Double-wrapping the executor is harmless: the second wrap (done within ActiveJob) will simply yield:

https://github.com/rails/rails/blob/7101489293ece071013417aeed2da3aa1b0927c9/activesupport/lib/active_support/execution_wrapper.rb#L84
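That reentrancy can be demonstrated with a minimal wrapper modeled loosely on ActiveSupport::ExecutionWrapper (names here are illustrative, not the Rails implementation):

```ruby
# Minimal reentrant execution wrapper: if the current thread is already
# inside a wrap, the nested call just yields; completion hooks (such as
# releasing database connections) run only once, when the outer wrap ends.
class MiniExecutor
  @active = {}
  @completions = 0

  class << self
    attr_reader :completions

    def wrap
      return yield if @active[Thread.current] # nested: passthrough

      @active[Thread.current] = true
      begin
        yield
      ensure
        @active.delete(Thread.current)
        @completions += 1 # stand-in for the completion hooks
      end
    end
  end
end

MiniExecutor.wrap do     # outer wrap, e.g. around the Scheduler task
  MiniExecutor.wrap do   # inner wrap, e.g. inside ActiveJob: a no-op
    # perform the job
  end
end

puts MiniExecutor.completions  # => 1 (hooks ran once, at the very end)
```

So the connection is released only after the outer wrap completes, by which point the advisory lock has already been released on the same connection.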

I would push the executor wrap into the perform method rather than putting it in the future, for clarity about what is being wrapped and why, but this should fix the issue. 👍

This shouldn't reintroduce the deadlock because these perform methods only run for the duration of a job. The issue was the notifier being run in an executor which never returns. As long as we don't re-wrap the notifier I think we should be good. 🙌

@bensheldon
Owner Author

@sj26 I appreciate all of your help. Especially digging into what's happening here and all the improvements 🎉

I would push the executor wrap into the perform method rather than putting it in the future, for clarity about what is being wrapped and why,

I have it in the Scheduler because it felt closer to the responsibility of the thread to set up a proper execution environment, rather than the Performer or Job-level code to be aware that it was running inside of a thread. That said, I'll admit it's perhaps un-YAGNI thinking.

I think it's becoming clear that the querying/locking implementation may need to change, and that might drive my thinking and the code further.

@bensheldon bensheldon merged commit 1218e8f into main Aug 27, 2020
@bensheldon bensheldon deleted the db_connection_leak branch August 27, 2020 03:22
@sj26
Contributor

sj26 commented Aug 27, 2020

Nice, I tried this out locally with my async puma setup and code reload worked without deadlocks. 👍

@gadimbaylisahil
Contributor

break unless good_job&.owns_advisory_lock?

This change broke my implementation in an application with multi-tenancy enabled via Postgres schemas, using the https://github.com/rails-on-services/apartment gem.

I will also try to look into why the advisory lock isn't owned, or at least create an example application with the same setup.

Labels
bug Something isn't working