PgLock state inspection is not isolated to current database #431

bensheldon · 2021-10-21T14:22:28Z

Discovered that in #430 that when GoodJob explicitly looks at the lock state of a job by joining on pg_locks, it does not scope down to the current database, but rather looks across the cluster for that particular lock key. GoodJob does this on the Dashboard, to filter on which jobs are running. This does not affect the executing of jobs, because that uses Postgres locking functions.

GoodJob locks against a job's UUID, so it should be globally unique. The scenarios I imagine this biting someone is:

Copying production data into a staging/inspection replica (that is still on the same cluster as Production) and running a replica job and having that state become visible in the production system if someone inspects the production copy of that job.

...which seems somewhat farfetched and non-material.

I am documenting this here just in case someone believes they experience a side effect of this, but I don't think it warrants action, and maybe it's actually a feature e.g. if someone is sharding the good_jobs table maybe inspecting global lock state is a good thing.

The text was updated successfully, but these errors were encountered:

reczy · 2021-10-21T15:15:24Z

GoodJob locks against a job's UUID, so it should be globally unique.

Just wanted to chime in about the chance of collisions here for documentation's sake for others.

Since the advisory lock keys are constrained to 64 bits of information (via 2 32-bit columns), we don't get the full benefit of a UUID's 122 bits of randomness (for UUID version 4). Given the birthday problem, we would expect a 1% chance of collision of lock keys after 610 million UUIDs and a 50% collision of lock keys after 5.1 billion UUIDs (assuming no issues with UUID generation or md5 hashing).

I'm not sure if this has any practical impact here... I just find the birthday problem academically fascinating!

bensheldon · 2021-10-21T15:41:12Z

@reczy love it! Thank you for sharing that. I appreciate you doing the math! And I am also curious about any randomness loss in the UUID -> MD5 -> BIGINT pipeline.

In practice, given how GoodJob uses advisory locks, the operation (e.g. running jobs) impact of a collision is that that the two colliding jobs could not be worked at the same time. I don't believe it would impact the most important (to me) contract of a job not being executed twice.

reczy · 2021-10-21T16:37:13Z

And I am also curious about any randomness loss in the UUID -> MD5 -> BIGINT pipeline.

Going from MD5 -> BIGINT is just taking 16 hexadecimal characters and converting from base 16 to base 10 so no loss there. But as for UUID -> MD5, I'm not sure. MD5 has been cryptographically broken for a while now, but I'm not sure what implications that has for its specific purpose in GoodJob.

If you did want to address, I think two potential alternatives are:

~~Skip the hashing and just take the first 8 and last 8 characters of the UUID and convert to base 10 (this avoids hyphens and characters that are UUID version identifiers).~~ Forgot about custom concurrency locks
Do what you do now but use sha256 instead of md5

~~I think I like 1 because it eliminates hashing altogether and probably makes some of the lockable sql easier to read.~~ Might need to think about how to migrate safely.

In practice, given how GoodJob uses advisory locks, the operation (e.g. running jobs) impact of a collision is that that the two colliding jobs could not be worked at the same time. I don't believe it would impact the most important (to me) contract of a job not being executed twice.

Agreed.

bensheldon · 2021-10-22T15:26:10Z

@reczy that is clever! Unfortunately the concurrency-locks rely on an arbitrary string as the lock key.

I also keep wanting to extract GoodJob::Lockable into its own gem because when I looked at various ActiveRecord advisory-lock gems, there wasn't one that had the flexibility that I wanted (e.g. being able to fetch-and-lock records in a single operation). But I also keep adding new functionality to Lockable and it's convenient not to have to release across two gems 😁

bensheldon closed this as completed Oct 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PgLock state inspection is not isolated to current database #431

PgLock state inspection is not isolated to current database #431

bensheldon commented Oct 21, 2021

reczy commented Oct 21, 2021

bensheldon commented Oct 21, 2021

reczy commented Oct 21, 2021 •

edited

Loading

bensheldon commented Oct 22, 2021 •

edited

Loading

PgLock state inspection is not isolated to current database #431

PgLock state inspection is not isolated to current database #431

Comments

bensheldon commented Oct 21, 2021

reczy commented Oct 21, 2021

bensheldon commented Oct 21, 2021

reczy commented Oct 21, 2021 • edited Loading

bensheldon commented Oct 22, 2021 • edited Loading

reczy commented Oct 21, 2021 •

edited

Loading

bensheldon commented Oct 22, 2021 •

edited

Loading