Resolver parallelization #225
Codecov Report
@@            Coverage Diff            @@
##           master     #225    +/-   ##
==========================================
+ Coverage    82.1%   82.25%   +0.14%
==========================================
  Files          34       35       +1
  Lines        3130     3200      +70
==========================================
+ Hits         2570     2632      +62
- Misses        560      568       +8
Repo resolution locking doesn't seem correct.
koschei/locks.py
advisory lock.

:param: db Database session
:param: namespace Integer namespace key. Should use one of the constants
Formatting seems odd - this line describes two params, namespace and key, right?
No, the key should have its own line, but I forgot that. Will fix.
    self.db, LOCK_REPO_RESOLVER, collection.id,
    block=False, transaction=True,
)
self.process_repo(collection)
You are using a transaction-bound lock, but process_repo() commits multiple transactions.
Good point. Even though other resolver processes won't pick up the collection to process the same repo again (the last repo_id is stored in the first transaction), one of them could pick up a new repo generated in the meantime, and the two processes would stomp on each other. Will fix.
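The problem raised here can be illustrated with PostgreSQL's two-integer-key advisory lock functions. A transaction-level lock (`pg_try_advisory_xact_lock`) is released automatically when the current transaction ends, so it would not survive the first commit inside `process_repo()`; a session-level lock (`pg_try_advisory_lock`) stays held until `pg_advisory_unlock` is called or the session ends. The sketch below is a minimal, hypothetical helper, not the actual API of `koschei/locks.py`: it assumes a DB-API 2.0 connection (for example psycopg2) that remains the same PostgreSQL session for the whole block, and the names `advisory_lock` and `LockNotAcquired` are illustrative only.

```python
from contextlib import contextmanager


class LockNotAcquired(Exception):
    """Raised when a non-blocking attempt finds the lock already held (hypothetical)."""


@contextmanager
def advisory_lock(conn, namespace, key, block=True):
    """
    Hold a session-level PostgreSQL advisory lock for the duration of the block.

    Unlike pg_advisory_xact_lock, the lock is not released by COMMIT, so the
    body may commit several transactions while still holding it.

    :param conn: DB-API connection; must stay the same PostgreSQL session
    :param namespace: integer namespace (first key of the two-key form)
    :param key: integer key within the namespace, e.g. a collection ID
    :param block: wait for the lock if True, otherwise fail immediately
    """
    cur = conn.cursor()
    if block:
        cur.execute("SELECT pg_advisory_lock(%s, %s)", (namespace, key))
    else:
        cur.execute("SELECT pg_try_advisory_lock(%s, %s)", (namespace, key))
        if not cur.fetchone()[0]:
            raise LockNotAcquired((namespace, key))
    try:
        yield
    except BaseException:
        # An error may have left the transaction aborted, in which case
        # PostgreSQL would reject the unlock statement; roll back first.
        conn.rollback()
        raise
    finally:
        # Session-level locks are not released by COMMIT; release explicitly.
        cur.execute("SELECT pg_advisory_unlock(%s, %s)", (namespace, key))
```

With a helper like this, the hunk above could take the lock as `with advisory_lock(conn, LOCK_REPO_RESOLVER, collection.id, block=False): ...` (where `conn` is assumed to be the session's underlying connection) and keep it across every commit in `process_repo()`.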
Changes done.
Looks good now. Thanks.
SQLA returns connections to the connection pool after a commit or rollback. That makes session-bound locks tricky to use, since subsequent statements may not get the same connection (and thus not the same session in the PostgreSQL sense). This keeps a connection in KoscheiBackendSession across commits.
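Why the connection has to be kept can be shown with a toy, in-memory model (neither SQLAlchemy nor PostgreSQL): an advisory lock belongs to one backend session, i.e. one pooled connection, and a session object that hands its connection back to the pool on every commit may run the unlock on a different backend. `Connection`, `Pool`, `PooledSession`, `PinnedSession` and `lock_commit_unlock` are all hypothetical names; the pinned variant only mirrors the idea described for `KoscheiBackendSession`, not its implementation.

```python
from collections import deque


class Connection:
    """Stand-in for one PostgreSQL backend; advisory locks live here."""

    def __init__(self, pid):
        self.pid = pid
        self.locks = set()


class Pool:
    """FIFO pool: a returned connection goes to the back of the queue."""

    def __init__(self, size):
        self.idle = deque(Connection(pid) for pid in range(size))

    def checkout(self):
        return self.idle.popleft()

    def checkin(self, conn):
        self.idle.append(conn)


class PooledSession:
    """Gives its connection back to the pool on every commit."""

    def __init__(self, pool):
        self.pool = pool
        self.conn = None

    def connection(self):
        if self.conn is None:
            self.conn = self.pool.checkout()
        return self.conn

    def commit(self):
        if self.conn is not None:
            self.pool.checkin(self.conn)
            self.conn = None


class PinnedSession(PooledSession):
    """Checks out one connection and keeps it across commits."""

    def commit(self):
        # A real session would issue COMMIT on self.conn here, but keep it.
        pass


def lock_commit_unlock(session, key):
    """Take a lock, commit, then try to unlock; True if the unlock hit the holder."""
    session.connection().locks.add(key)  # plays the role of pg_advisory_lock
    session.commit()                     # e.g. one of several commits in process_repo()
    conn = session.connection()          # may now be a different backend
    if key in conn.locks:
        conn.locks.discard(key)          # plays the role of pg_advisory_unlock
        return True
    return False                         # lock is left behind on the old backend
```

In this model the pooled variant leaves the lock stranded on backend 0 while running the unlock on backend 1, whereas the pinned variant unlocks on the backend that took the lock.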
Ensure that only one collection is being processed at a time.
One resolver process locks a single repo ID, so different processes can resolve different repo IDs in parallel. More granular parallelism would likely be counterproductive, as repo setup is expensive.
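The per-repo-ID granularity can be sketched with an in-process stand-in for `pg_try_advisory_lock`: each process tries a non-blocking lock keyed by `(namespace, repo_id)` and skips any ID another process already holds. Everything here is hypothetical (`AdvisoryLockTable`, `claim_repo`, and the value of `LOCK_REPO_RESOLVER`); it models the locking rule only, not how koschei actually selects work.

```python
import threading

LOCK_REPO_RESOLVER = 1  # hypothetical namespace constant


class AdvisoryLockTable:
    """In-process stand-in for PostgreSQL's (namespace, key) advisory lock table."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}

    def try_lock(self, owner, namespace, key):
        # Non-blocking, like pg_try_advisory_lock. Re-locking by the same owner
        # succeeds; PostgreSQL also counts re-acquisitions, which this toy does not.
        with self._mutex:
            current = self._owners.setdefault((namespace, key), owner)
            return current == owner

    def unlock(self, owner, namespace, key):
        # Only the holder can release, mirroring pg_advisory_unlock returning false.
        with self._mutex:
            if self._owners.get((namespace, key)) == owner:
                del self._owners[(namespace, key)]
                return True
            return False


def claim_repo(locks, owner, pending_repo_ids):
    """Return the first pending repo ID this process could lock, or None."""
    for repo_id in pending_repo_ids:
        if locks.try_lock(owner, LOCK_REPO_RESOLVER, repo_id):
            return repo_id
    return None
```

Locking at repo-ID level keeps the expensive repo setup to one process per repo ID while still letting other processes work on other repo IDs.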
88ce898 to 1ea3e5d
Use postgres advisory locking for mutual exclusion between multiple resolver processes: