Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PgLock state inspection is not isolated to current database #431

Closed
bensheldon opened this issue Oct 21, 2021 · 4 comments
Closed

PgLock state inspection is not isolated to current database #431

bensheldon opened this issue Oct 21, 2021 · 4 comments

Comments

@bensheldon
Copy link
Owner

Discovered that in #430 that when GoodJob explicitly looks at the lock state of a job by joining on pg_locks, it does not scope down to the current database, but rather looks across the cluster for that particular lock key. GoodJob does this on the Dashboard, to filter on which jobs are running. This does not affect the executing of jobs, because that uses Postgres locking functions.

GoodJob locks against a job's UUID, so it should be globally unique. The scenarios I imagine this biting someone is:

  • Copying production data into a staging/inspection replica (that is still on the same cluster as Production) and running a replica job and having that state become visible in the production system if someone inspects the production copy of that job.

...which seems somewhat farfetched and non-material.

I am documenting this here just in case someone believes they experience a side effect of this, but I don't think it warrants action, and maybe it's actually a feature e.g. if someone is sharding the good_jobs table maybe inspecting global lock state is a good thing.

@reczy
Copy link
Contributor

reczy commented Oct 21, 2021

GoodJob locks against a job's UUID, so it should be globally unique.

Just wanted to chime in about the chance of collisions here for documentation's sake for others.

Since the advisory lock keys are constrained to 64 bits of information (via 2 32-bit columns), we don't get the full benefit of a UUID's 122 bits of randomness (for UUID version 4). Given the birthday problem, we would expect a 1% chance of collision of lock keys after 610 million UUIDs and a 50% collision of lock keys after 5.1 billion UUIDs (assuming no issues with UUID generation or md5 hashing).

I'm not sure if this has any practical impact here... I just find the birthday problem academically fascinating!

@bensheldon
Copy link
Owner Author

@reczy love it! Thank you for sharing that. I appreciate you doing the math! And I am also curious about any randomness loss in the UUID -> MD5 -> BIGINT pipeline.

In practice, given how GoodJob uses advisory locks, the operation (e.g. running jobs) impact of a collision is that that the two colliding jobs could not be worked at the same time. I don't believe it would impact the most important (to me) contract of a job not being executed twice.

@reczy
Copy link
Contributor

reczy commented Oct 21, 2021

And I am also curious about any randomness loss in the UUID -> MD5 -> BIGINT pipeline.

Going from MD5 -> BIGINT is just taking 16 hexadecimal characters and converting from base 16 to base 10 so no loss there. But as for UUID -> MD5, I'm not sure. MD5 has been cryptographically broken for a while now, but I'm not sure what implications that has for its specific purpose in GoodJob.

If you did want to address, I think two potential alternatives are:

  1. Skip the hashing and just take the first 8 and last 8 characters of the UUID and convert to base 10 (this avoids hyphens and characters that are UUID version identifiers). Forgot about custom concurrency locks
  2. Do what you do now but use sha256 instead of md5

I think I like 1 because it eliminates hashing altogether and probably makes some of the lockable sql easier to read. Might need to think about how to migrate safely.

In practice, given how GoodJob uses advisory locks, the operation (e.g. running jobs) impact of a collision is that that the two colliding jobs could not be worked at the same time. I don't believe it would impact the most important (to me) contract of a job not being executed twice.

Agreed.

@bensheldon
Copy link
Owner Author

bensheldon commented Oct 22, 2021

@reczy that is clever! Unfortunately the concurrency-locks rely on an arbitrary string as the lock key.

I also keep wanting to extract GoodJob::Lockable into its own gem because when I looked at various ActiveRecord advisory-lock gems, there wasn't one that had the flexibility that I wanted (e.g. being able to fetch-and-lock records in a single operation). But I also keep adding new functionality to Lockable and it's convenient not to have to release across two gems 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants