Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1058] Serially deploy modules that share a cachedir #1059

Conversation

justinstoller
Copy link
Member

When we raised the default threadpool for deploying modules above 1, we
began to see issues when caching a remote locally if that remote/cache
was shared by multiple modules within the Puppetfile.

To resolve this, when we create the module we group it with any other
module that may share its cachedir. We then pull modules by cachedir off
of the queue and serially process those.

Please add all notable changes to the "Unreleased" section of the CHANGELOG.

@justinstoller justinstoller requested a review from a team May 1, 2020 21:49
When we raised the default threadpool for deploying modules above 1, we
began to see issues when caching a remote locally if that remote/cache
was shared by multiple modules within the Puppetfile.

To resolve this, when we create the module we group it with any other
module that may share its cachedir. We then pull modules by cachedir off
of the queue and serially process those.
Copy link
Collaborator

@Magisus Magisus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it'll do the trick, while maintaining as much parallelism as is safe.

@Magisus
Copy link
Collaborator

Magisus commented May 5, 2020

I heard plans to add an acceptance test, which I think we should do. Was that going to be added to this PR? Or should we add this in, use the existing test that failed because of this as-is to verify it, and add the new one later (in the interest of the release timeline)?

mod2 = spy('module2')
mod3 = spy('module3')
mod4 = spy('module4')
mod5 = spy('module4')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intentionally still 'module4'? or should it be 5?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't really matter, since calling spy will still create unique objects regardless of that string. But I'll update it.


queue = subject.modules_queue(visitor)
expect(queue.length).to be 4
queue_array = 4.times.map { queue.pop }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be length.times.map instead of hard-coding 4?

@mwaggett
Copy link
Contributor

mwaggett commented May 5, 2020

If we're in a rush, I'm okay using the existing test, but I do think we should ultimately have one that's explicitly testing this. I'll leave it up to @justinstoller to decide whether it can be done quick enough to be included here.

@adrienthebo
Copy link
Contributor

adrienthebo commented May 5, 2020

I was under the mistaken assumption that serializing access to caches had been resolved as part of the parallel modules pull request. Given that this is still an issue we may want to consider dropping the default pool_size back to 1 while we verify that parallelization is safe. Part of this concern is that we have the WithModules environment that has another entrypoint into interacting with modules that this pull request might not account for.

@justinstoller
Copy link
Member Author

How about this? #1060

@justinstoller
Copy link
Member Author

Ugh, it looks like the WithModules implementation maybe just copied and pasted code from the Puppetfile implementation? We probably should refactor out that shared code.

justinstoller added a commit to justinstoller/r10k that referenced this pull request May 6, 2020
After updating the default pool_size, a race condition was
found when initializing our local git cache. See
puppetlabs#1058 for the issue in detail.

When attempting a fix we realized that there were multiple copies of the
code to iterate over modules to manage. See
puppetlabs#1059 for a first attempt at this
problem and discussion.

In order to meet our current timelines, the default will need to reverted
until we can fix this issue properly. Hence this commit and a55536c.

This commit was necessary because the CHANGELOG has been refactored since
the commit referencing the default was added.
@@ -180,6 +189,19 @@ def accept(visitor)

private

def module_vcs_cachedir(mod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method deals with properties of modules, we can push this detail down into the modules themselves instead of probing objects for methods and acting from there.

@@ -212,15 +234,22 @@ def concurrent_accept(visitor, pool_size)
def modules_queue(visitor)
Queue.new.tap do |queue|
visitor.visit(:puppetfile, self) do
modules.each { |mod| queue << mod }
modules_by_cachedir = modules_by_vcs_cachedir.clone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would feel safer if this was addressed before the PR is merged. Just to avoid any potential unforeseen issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but I'm not sure how this would matter in WithModules. That class doesn't have a concurrent_accept mode so far as I can tell, so its modules would be processed first, always in serial, and then the Puppetfile would be processed and this logic would handle the potential conflict.

I can't find any other other places that respect pool_size besides what's in the Puppetfile here. What did y'all think was potentially problematic?

Copy link

@dhollinger dhollinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have anything additional to add that @adrienthebo and @mwaggett haven't already pointed out. I would like to see the potential interaction with WithModules to be solved before merging though

@justinstoller
Copy link
Member Author

Yeah, sorry, I didn't mean to ghost, y'all. When we thought we could make this in before a deadline we gave it a go, and when it became apparent we weren't going to make it for that deadline it dropped way down our priority list.

@Magisus
Copy link
Collaborator

Magisus commented Jul 23, 2020

I don't appear to be able to push to this branch. Is that something @justinstoller can change, or do I need to open a new PR to pick this up?

@justinstoller
Copy link
Member Author

I pushed this code to this branch so anyone can take it over: https://github.com/puppetlabs/r10k/tree/GH-1058_dont-concurrently-modify-same-cache

@Magisus
Copy link
Collaborator

Magisus commented Jul 29, 2020

Superseded by #1081

@Magisus Magisus closed this Jul 29, 2020
@justinstoller justinstoller deleted the GH-1058_dont-concurrently-modify-same-cache branch April 27, 2021 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants