Git cache breaks with pool_size > 1 if managing the same git source twice #1058
@adrienthebo, @dhollinger, @reidmv, @bastelfreak, @binford2k, @npwalker: I'm super curious to see if any of you have concerns about how often the situation described in this issue might be encountered.
Hi,
While it might currently work, I'm not sure it makes a lot of sense. I'd be fine failing if we can detect that two modules use the same git source. If it's easy enough, we could downgrade to a pool_size of 1 instead of failing. Or, depending on the effort compared to the above, we could try to make the caching code safe from two threads both trying to cache the same git source. Or, if we can detect the same git source twice, we could just dedupe so we never try to download the same git source more than once. Lots of options, but given the low likelihood of someone trying to do this, I'd be fine failing and telling the user to set pool_size to 1.
I don't see the benefit of duplicate git sources, but it also seems relatively easy to detect. At that point you could either dedupe or just downgrade pool_size to 1 (with a warning).
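A minimal sketch of the detect-and-downgrade option discussed above, in Ruby. The `remote` accessor, the logger interface, and the surrounding settings handling are assumptions for illustration, not r10k's actual API:

```ruby
# Illustration only: if any two Puppetfile entries share a git remote,
# fall back to serial processing rather than risk a cache race.
def effective_pool_size(modules, requested_pool_size, logger)
  remotes = modules.map { |mod| mod.remote }.compact  # 'remote' accessor is assumed
  if remotes.uniq.length < remotes.length
    logger.warn("Duplicate git sources detected; downgrading pool_size from #{requested_pool_size} to 1")
    1
  else
    requested_pool_size
  end
end
```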
We're going to look into a real fix this afternoon, because if it's relatively straightforward, the best option is obviously just to fix it.
When we raised the default threadpool size for deploying modules above 1, we began to see issues when caching a remote locally if that remote/cache was shared by multiple modules within the Puppetfile. To resolve this, when we create a module we group it with any other module that may share its cachedir. We then pull module groups off of the queue by cachedir and process each group serially.
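A rough sketch of the grouping approach described in that commit message, assuming each module object exposes `cachedir` and `sync` methods (names chosen for illustration, not taken from the r10k codebase): modules that share a cache directory are queued as one unit, so a given cache is only ever written by a single thread, while unrelated caches still deploy in parallel.

```ruby
# Illustration only: group modules by cache directory and process each group
# serially inside a worker, with groups distributed across a thread pool.
def deploy_modules(modules, pool_size)
  queue = Queue.new
  modules.group_by { |mod| mod.cachedir }.each_value { |group| queue << group }

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        group = begin
          queue.pop(true)   # non-blocking pop
        rescue ThreadError
          break             # queue drained, worker exits
        end
        group.each { |mod| mod.sync }  # serial within a shared cachedir
      end
    end
  end
  workers.each(&:join)
end
```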
I've opened #1059 with an approach to fixing this instead of warning. For those of you I pinged who review code, I'd welcome any comments on the approach and unit tests. I still need to update an acceptance test and the changelog, but I think it's functionally correct.
After updating the default pool_size, a race condition was found when initializing our local git cache. See puppetlabs#1058 for the issue in detail. When attempting a fix we realized that there were multiple copies of the code that iterates over modules to manage. See puppetlabs#1059 for a first attempt at this problem and discussion. In order to meet our current timelines, the default will need to be reverted until we can fix this issue properly. Hence this commit and a55536c. This commit was necessary because the CHANGELOG has been refactored since the commit referencing the default was added.
This commit adds a beaker acceptance test verifying that it is possible to use the same remote for several non-module objects. This case previously broke when the `pool_size` feature was introduced, because there was a race condition when creating the git cache in shellgit mode if the same remote was used in two objects being set up concurrently.
[#1058] Avoid concurrent cache updates when downloading modules on multiple threads
This issue has been marked stale because it has had no activity for 60 days. The Puppet Team is actively prioritizing existing bugs and new features; if this issue is still important to you, please comment and we will add it to our backlog to complete. Otherwise, it will be closed in 7 days.
A bug recently surfaced in our beaker acceptance tests that seems valid, but might not be a common workflow. It appeared after the merge of #1044, which updated the default pool size to 4.
If you are managing two objects (in this case non-module objects) backed by the same git repo, and are using a `pool_size` of more than 1, the caching mechanism for git behaves poorly on the initial deploy.

Puppetfile:

Error output at `--verbose=debug2` from `r10k deploy environment -p`:

I think this occurs because two threads kick off to deploy the entries in the Puppetfile and both try to cache the new repo simultaneously, so the second clone fails because its path isn't clear: the directory was populated between when it decided it needed to clone and when the clone actually completed. Subsequent deploys work fine, because the cache is already populated.
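For reference, a hypothetical Puppetfile with the shape that triggers the race (the module names and repository URL are placeholders, not the ones from the original report; the report involved non-module objects, but two ordinary `mod` entries sharing a git source exercise the same cache path):

```ruby
# Puppetfile (illustrative only; names and URL are placeholders)
mod 'first_copy',
  :git => 'https://github.com/example/shared-repo.git',
  :ref => 'main'

mod 'second_copy',
  :git => 'https://github.com/example/shared-repo.git',
  :ref => 'production'
```

With `pool_size` greater than 1, both entries can try to create the same cache directory at the same time on a first deploy.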
This reproduces very reliably. For the time being, we're updating our tests to no longer have two resources with the same backing repo, since we think that isn't a terribly valid setup. Seeing as no one has hit this yet, it seems uncommon at least.