Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolver parallelization #225

Merged
merged 6 commits into from
Feb 13, 2018
Merged

Resolver parallelization #225

merged 6 commits into from
Feb 13, 2018

Conversation

msimacek
Copy link
Contributor

Use postgres advisory locking for mutual exclusion between multiple resolver processes:

  • Repo resolver locks are per-collection - i.e. one process can resolve rawhide, another one can resolve f26 at the same time
  • Build resolver locks are per-repo_id - i.e. it processes a batch of builds with the same repo_id. The repo setup (sack) is slower than build resolution. So I want to avoid duplicating the repo setup step across multiple processes. In the general case, where there are usually 5-20 builds per repo, this works well. The disadvantage is that after mass tagging, the batch can be quite big, but I'm not sure if it's worth optimizing for something that happens so infrequently.

@codecov
Copy link

codecov bot commented Feb 13, 2018

Codecov Report

Merging #225 into master will increase coverage by 0.14%.
The diff coverage is 84.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #225      +/-   ##
==========================================
+ Coverage    82.1%   82.25%   +0.14%     
==========================================
  Files          34       35       +1     
  Lines        3130     3200      +70     
==========================================
+ Hits         2570     2632      +62     
- Misses        560      568       +8
Impacted Files Coverage Δ
koschei/backend/service.py 83.33% <ø> (-0.28%) ⬇️
koschei/backend/services/polling.py 54.28% <ø> (+2.93%) ⬆️
koschei/session.py 100% <100%> (ø) ⬆️
koschei/backend/__init__.py 80.74% <100%> (+0.74%) ⬆️
koschei/backend/services/build_resolver.py 81.17% <60%> (-1.33%) ⬇️
koschei/backend/services/repo_resolver.py 90.38% <71.42%> (-0.95%) ⬇️
koschei/db.py 91.13% <77.77%> (-0.53%) ⬇️
koschei/backend/services/resolver.py 96.77% <77.77%> (-1.52%) ⬇️
koschei/locks.py 91.89% <91.89%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2bebca3...1ea3e5d. Read the comment docs.

Copy link
Member

@mizdebsk mizdebsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repo resolution locking doesn't seem correct.

koschei/locks.py Outdated
advisory lock.

:param: db Database session
:param: namespace Integer namespace key. Should use one of the constants
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting seems odd - this line describes two params, namespace and key, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the key should have it's own line, but I forgot that. Will fix

self.db, LOCK_REPO_RESOLVER, collection.id,
block=False, transaction=True,
)
self.process_repo(collection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are using trancaction-bound lock, but process_repo() commits multiple transactions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Even though other resolver processes won't pick the collection for processing of the same repo (the last repo_id is stored in the first transaction), it could pick a new one generated in the meantime and stomp on each other. Will fix.

@msimacek
Copy link
Contributor Author

Changes done

Copy link
Member

@mizdebsk mizdebsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. Thanks.

SQLA returns connections to the connection pool after a commit/rollback.
That makes usage of session-bound locks tricky, as the following
statements may not get the same connection (and thus the same session in
postgres sense). This keeps a connection in KoscheiBackendSession across
commits.
Ensure that only one collection is being processed at a time.
One resolver process locks single repo ID. Different processes can
resolve different repo IDs. More granular parallelism would likely be
counterproductive as the repo setup is expensive.
@msimacek msimacek merged commit 079eba6 into master Feb 13, 2018
@msimacek msimacek deleted the parallel-resolver branch February 13, 2018 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants