Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jobs are not processed in multi schema setup. Apartment + GoodJob ( post 1.1.2 ) #117

Closed
gadimbaylisahil opened this issue Aug 31, 2020 · 13 comments

Comments

@gadimbaylisahil
Copy link
Contributor

gadimbaylisahil commented Aug 31, 2020

Hi, I have an application with multi tenant architecture using PostgreSQL Schemas with the help of Apartment gem.

Prior to 1.1.2(without Listen/Notify), Good Job processed this setup just fine.

After 1.1.2, queried jobs does not own advisory_lock. job.owns_advisory_lock returns false.

I have limited knowledge in this area but tried to debug it for a few days without luck. However, I see that there is TODO
comment in Job.rb which is interesting.

# TODO: Determine why some records are fetched without an advisory lock at all

Can you please provide info on what is happening here or are you still in dark clouds as well regarding this?

I have created an example application here with exact same setup.

Any direction is appreciated, I can debug more 👍

@bensheldon
Copy link
Owner

@gadimbaylisahil Yikes! 😰 Thanks for flagging this! I've been curious about multi-database setups. And thank you for all your contributions too 🙏

To point you in the directions I'm thinking:

  • The underlying SQL query that does the advisory locking has a problem. I added the guard clause you saw because I was seeing occasionally, when under heavy load, jobs would be fetched by the query that were not locked. It was fairly rare which is why I felt ok adding a guard clause and flagging it for later:
    unfinished.priority_ordered.only_scheduled.limit(1).with_advisory_lock do |good_jobs|
    good_job = good_jobs.first
    # TODO: Determine why some records are fetched without an advisory lock at all
    break unless good_job&.owns_advisory_lock?
  • Database connections are being lost/swapped/leaked. I spent a lot of time with that in Return to using executor.wrap around Scheduler execution task #99
  • GoodJob doesn't currently work with the Apartment Gem and the guard clause has made that obvious.

I'll dig into this too. Thank you for the example application.

@bensheldon
Copy link
Owner

Just to keep you in the loop with my debugging. I'm seeing a SQL error when running this:

irb(main):008:0> GoodJob::Job.all.advisory_locked
    GoodJob::Job Load (2.1ms) SELECT "public"."good_jobs".* FROM "public"."good_jobs" LEFT JOIN pg_locks ON pg_locks.locktype = 'advisory'
  AND pg_locks.objsubid = 1
  AND pg_locks.classid = ('x'||substr(md5('public.good_jobs' || "public.good_jobs"."id"::text), 1, 16))::bit(32)::int
  AND pg_locks.objid = (('x'||substr(md5('public.good_jobs' || "public.good_jobs"."id"::text), 1, 16))::bit(64) << 32)::bit(32)::int WHERE "pg_locks"."locktype" IS NOT NULL LIMIT $1
Traceback (most recent call last):
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR:  missing FROM-clause entry for table "public.good_jobs")
LINE 3: ....classid = ('x'||substr(md5('public.good_jobs' || "public.go...

@bensheldon
Copy link
Owner

I think the issue so far is that this line is too naive when constructing valid sql from the table name and fails when the table name contains a schema:

scope :joins_advisory_locks, (lambda do
join_sql = <<~SQL
LEFT JOIN pg_locks ON pg_locks.locktype = 'advisory'
AND pg_locks.objsubid = 1
AND pg_locks.classid = ('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(32)::int
AND pg_locks.objid = (('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(64) << 32)::bit(32)::int
SQL
joins(sanitize_sql_for_conditions([join_sql, { table_name: table_name }]))
end)

e.g. this line "#{table_name}"."#{primary_key}"::text should process the table_name more.

@gadimbaylisahil
Copy link
Contributor Author

gadimbaylisahil commented Aug 31, 2020

I think the issue so far is that this line is too naive when constructing valid sql from the table name and fails when the table name contains a schema:

scope :joins_advisory_locks, (lambda do
join_sql = <<~SQL
LEFT JOIN pg_locks ON pg_locks.locktype = 'advisory'
AND pg_locks.objsubid = 1
AND pg_locks.classid = ('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(32)::int
AND pg_locks.objid = (('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(64) << 32)::bit(32)::int
SQL
joins(sanitize_sql_for_conditions([join_sql, { table_name: table_name }]))
end)

e.g. this line "#{table_name}"."#{primary_key}"::text should process the table_name more.

Nice find! Definitely one step closer.

By saying should process more what do you exactly mean? You mean that it should process it better?

@bensheldon
Copy link
Owner

By saying should process more what do you exactly mean? You mean that it should process it better?

Yes 😄

Currently it assumes that table_name is a simple string (e.g. good_jobs) whereas it looks like under Apartment it's public.good_jobs. It could be split on the period and then wrapped with quotes and reconstructed. But I also wonder if there is a more ActiveRecord/Arel way to do this.

@gadimbaylisahil
Copy link
Contributor Author

gadimbaylisahil commented Aug 31, 2020

Awesome, I will try to work on it today/tomorrow

On a side note:

In multi tenancy setup with Apartment it's advised to keep the jobs table only queried from the main 'public' schema. So regardless of to which schema we are checked it, it's always queried from and saved to public.

Config option in apartment.rb

  # Add any models that you do not want to be multi-tenanted, but remain in the global (public) namespace.
  # A typical example would be a Customer or Tenant model that stores each Tenant's information.
  #
  config.excluded_models = %w{ GoodJob::Job }

This eases processing of jobs as alternative is to checking out to a different schema and querying jobs. As jobs contain tenant name in their serialized parameters, during processing it's switched to that tenant by monkey patching ActiveJob.

class ActiveJob::Base
  class << self
    def execute(job_data)
      Apartment::Tenant.switch(job_data['tenant']) do
        super
      end
    end
  end

  def serialize
    super.merge('tenant' => Apartment::Tenant.current)
  end
end

This works fine with good_job ^.

I am wondering if I should apply the same to PgLocks table and keep excluded_models as well. However, it may not be related at all.

@bensheldon
Copy link
Owner

From staring at this more, it looks like the GoodJob::PgLock model is completely unused in the implementation. The only reference to pg_locks is directly in the sql of

scope :joins_advisory_locks, (lambda do
join_sql = <<~SQL
LEFT JOIN pg_locks ON pg_locks.locktype = 'advisory'
AND pg_locks.objsubid = 1
AND pg_locks.classid = ('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(32)::int
AND pg_locks.objid = (('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(64) << 32)::bit(32)::int
SQL
joins(sanitize_sql_for_conditions([join_sql, { table_name: table_name }]))
end)

...so the entire pg_locks.rb file could be deleted; I think I was only using it for debugging.

I want to get you unblocked as expediently as possible, so don't let this block: looking at the SQL, I think this also may be incompatible with Rails' built-in multidatabase support.

@gadimbaylisahil
Copy link
Contributor Author

@bensheldon After preprocessing the table_name to get rid of .public my jobs are processing just fine now. I can open a PR and we can discuss there if we can go with an alternative way rather than split('.').last

I haven't used the multiple database feature of Rails, I am guessing issue will be related to having different connections? But it seems rather separate topic than the one in this ticket.

@gadimbaylisahil
Copy link
Contributor Author

gadimbaylisahil commented Aug 31, 2020

From staring at this more, it looks like the GoodJob::PgLock model is completely unused in the implementation. The only reference to pg_locks is directly in the sql of

scope :joins_advisory_locks, (lambda do
join_sql = <<~SQL
LEFT JOIN pg_locks ON pg_locks.locktype = 'advisory'
AND pg_locks.objsubid = 1
AND pg_locks.classid = ('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(32)::int
AND pg_locks.objid = (('x'||substr(md5(:table_name || "#{table_name}"."#{primary_key}"::text), 1, 16))::bit(64) << 32)::bit(32)::int
SQL
joins(sanitize_sql_for_conditions([join_sql, { table_name: table_name }]))
end)

...so the entire pg_locks.rb file could be deleted; I think I was only using it for debugging.

I want to get you unblocked as expediently as possible, so don't let this block: looking at the SQL, I think this also may be incompatible with Rails' built-in multidatabase support.

Here is the quick lockable.rb that made it work for me

https://gist.github.com/gadimbaylisahil/f94f6e8a4d363b00c8266c93fe7a3634

Would I face any side-effects with this you reckon?

@bensheldon
Copy link
Owner

I went searching through ActiveRecord and I think it would be better to use the adapter quoting methods: https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/Quoting.html

@bensheldon
Copy link
Owner

Oops, sorry I missed your earlier comment. Yes. Please open a PR 🎉

@bensheldon
Copy link
Owner

@gadimbaylisahil please let me know if the v1.2.4 fixes your problem. Thank you for the PR! 🎉

@gadimbaylisahil
Copy link
Contributor Author

@gadimbaylisahil please let me know if the v1.2.4 fixes your problem. Thank you for the PR! tada

Amazing! All works now! Thanks for directions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants