[#1058] Serially deploy modules that share a cachedir #1059
Conversation
When we raised the default threadpool for deploying modules above 1, we began to see issues when caching a remote locally if that remote/cache was shared by multiple modules within the Puppetfile. To resolve this, when we create the module we group it with any other module that may share its cachedir. We then pull modules by cachedir off of the queue and serially process those.
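The grouping described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual r10k code: the `PuppetModule` struct, the cachedir paths, and the worker count are all invented for the example. Modules that share a cachedir are enqueued as one group, workers pop whole groups off the queue, and members of a group are processed serially, so parallelism only happens across distinct caches.

```ruby
PuppetModule = Struct.new(:name, :cachedir)

modules = [
  PuppetModule.new('apache',     '/cache/git-apache'),
  PuppetModule.new('apache_env', '/cache/git-apache'), # shares the apache cache
  PuppetModule.new('nginx',      '/cache/git-nginx'),
]

# Group modules that share a cachedir, then enqueue each group as one unit.
queue = Queue.new
modules.group_by(&:cachedir).each_value { |group| queue << group }

deployed = Queue.new
workers = Array.new(2) do
  Thread.new do
    loop do
      # Non-blocking pop; ThreadError means the queue is drained.
      group = (queue.pop(true) rescue break)
      # Everything in a group is handled serially by this one thread,
      # so no two threads ever touch the same cachedir.
      group.each { |mod| deployed << mod.name }
    end
  end
end
workers.each(&:join)
```

Only the group boundary is serialized; two groups with different cachedirs can still be deployed concurrently by different workers.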
Force-pushed from a58c621 to ffe111c
This looks like it'll do the trick, while maintaining as much parallelism as is safe.
I heard plans to add an acceptance test, which I think we should do. Was that going to be added to this PR? Or should we add this in, use the existing test that failed because of this as-is to verify it, and add the new one later (in the interest of the release timeline)?
mod2 = spy('module2')
mod3 = spy('module3')
mod4 = spy('module4')
mod5 = spy('module4')
is this intentionally still 'module4'? or should it be 5?
I think it doesn't really matter, since calling `spy` will still create unique objects regardless of that string. But I'll update it.
queue = subject.modules_queue(visitor)
expect(queue.length).to be 4
queue_array = 4.times.map { queue.pop }
Can this be `length.times.map` instead of hard-coding 4?
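The suggestion amounts to deriving the pop count from the queue itself rather than repeating the literal. A minimal illustration in plain Ruby (not the actual spec code; the queue contents are invented):

```ruby
queue = Queue.new
4.times { |i| queue << "module#{i + 1}" }

# Drain the queue using its own length instead of a hard-coded count,
# so the test doesn't break when the fixture size changes.
queue_array = queue.length.times.map { queue.pop }
```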
If we're in a rush, I'm okay using the existing test, but I do think we should ultimately have one that's explicitly testing this. I'll leave it up to @justinstoller to decide whether it can be done quickly enough to be included here.
I was under the mistaken assumption that serializing access to caches had been resolved as part of the parallel modules pull request. Given that this is still an issue, we may want to consider dropping the default
How about this? #1060
Ugh, it looks like the
After updating the default pool_size, a race condition was found when initializing our local git cache. See puppetlabs#1058 for the issue in detail. When attempting a fix, we realized that there were multiple copies of the code to iterate over modules to manage. See puppetlabs#1059 for a first attempt at this problem and discussion. In order to meet our current timelines, the default will need to be reverted until we can fix this issue properly. Hence this commit and a55536c. This commit was necessary because the CHANGELOG has been refactored since the commit referencing the default was added.
@@ -180,6 +189,19 @@ def accept(visitor)

private

def module_vcs_cachedir(mod)
As this method deals with properties of modules, we can push this detail down into the modules themselves instead of probing objects for methods and acting from there.
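One way to read this suggestion (the class names below are hypothetical, not r10k's actual classes): instead of the Puppetfile probing each module object with `respond_to?` and branching, every module class answers `cachedir` itself, with module types that have no shared git cache returning a sentinel value.

```ruby
# A module type with no shared git cache answers with a sentinel.
class ForgeModule
  def cachedir
    :none
  end
end

# A git-backed module derives its cachedir from its remote, so the
# caller never needs to know which kind of module it is holding.
class GitModule
  def initialize(remote)
    @remote = remote
  end

  def cachedir
    "/cache/#{@remote.gsub(/[^A-Za-z0-9]/, '-')}"
  end
end

mods = [ForgeModule.new, GitModule.new('git@github.com:org/repo')]
dirs = mods.map(&:cachedir)
```

With this shape, grouping by cachedir is a plain `group_by(&:cachedir)` over a heterogeneous module list, and the type-checking logic disappears from the Puppetfile.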
@@ -212,15 +234,22 @@ def concurrent_accept(visitor, pool_size)
def modules_queue(visitor)
  Queue.new.tap do |queue|
    visitor.visit(:puppetfile, self) do
      modules.each { |mod| queue << mod }
      modules_by_cachedir = modules_by_vcs_cachedir.clone
This might have some interesting interactions with https://github.com/puppetlabs/r10k/blob/master/lib/r10k/environment/with_modules.rb#L55-L64.
I would feel safer if this was addressed before the PR is merged. Just to avoid any potential unforeseen issues.
I may be missing something, but I'm not sure how this would matter in `WithModules`. That class doesn't have a `concurrent_accept` mode so far as I can tell, so its modules would be processed first, always in serial, and then the Puppetfile would be processed and this logic would handle the potential conflict.
I can't find any other places that respect `pool_size` besides what's in the Puppetfile here. What did y'all think was potentially problematic?
I don't really have anything additional to add that @adrienthebo and @mwaggett haven't already pointed out. I would like to see the potential interaction with `WithModules` solved before merging, though.
Yeah, sorry, I didn't mean to ghost, y'all. When we thought we could make this in before a deadline we gave it a go, and when it became apparent we weren't going to make it for that deadline it dropped way down our priority list.
I don't appear to be able to push to this branch. Is that something @justinstoller can change, or do I need to open a new PR to pick this up?
I pushed this code to this branch so anyone can take it over: https://github.com/puppetlabs/r10k/tree/GH-1058_dont-concurrently-modify-same-cache
Superseded by #1081
Please add all notable changes to the "Unreleased" section of the CHANGELOG.