
Git cache breaks with pool_size > 1 if managing the same git source twice #1058

Closed · Magisus opened this issue Apr 30, 2020 · 7 comments

@Magisus (Collaborator) commented Apr 30, 2020

A bug recently surfaced in our Beaker acceptance tests. It seems valid, though it may not be a common workflow. It appeared after the merge of #1044, which updated the default pool size to 4.

If you are managing two objects (in this case, non-module objects) backed by the same git repo and are using a pool_size greater than 1, the git caching mechanism misbehaves on the initial deploy.

Puppetfile

mod 'non_module_object_1',
  :install_path => './',
  :git => 'git://github.com/puppetlabs/control-repo.git',
  :branch => 'production'
mod 'non_module_object_2',
  :install_path => '',
  :git => 'git://github.com/puppetlabs/control-repo.git',
  :branch => 'production'

Error output at --verbose=debug2 from r10k deploy environment -p:

[2020-04-30 22:28:15 - INFO] Deploying Puppetfile content /etc/puppetlabs/code/environments/production/non_module_object_1
[2020-04-30 22:28:15 - DEBUG1] Creating new git cache for "git://github.com/puppetlabs/control-repo.git"
[2020-04-30 22:28:15 - DEBUG] Module thread 23963080 exiting: queue empty
[2020-04-30 22:28:15 - DEBUG2] Starting process: ["git", "clone", "--mirror", "git://github.com/puppetlabs/control-repo.git", "/var/cache/r10k/git---github.com-puppetlabs-control-repo.git"]
[2020-04-30 22:28:15 - INFO] Deploying Puppetfile content /etc/puppetlabs/code/environments/production/non_module_object_2
[2020-04-30 22:28:15 - DEBUG1] Creating new git cache for "git://github.com/puppetlabs/control-repo.git"
[2020-04-30 22:28:15 - DEBUG2] Starting process: ["git", "clone", "--mirror", "git://github.com/puppetlabs/control-repo.git", "/var/cache/r10k/git---github.com-puppetlabs-control-repo.git"]
[2020-04-30 22:28:15 - DEBUG] Module thread 23579460 exiting: queue empty
[2020-04-30 22:28:15 - DEBUG2] Finished process:
Command: git clone --mirror git://github.com/puppetlabs/control-repo.git /var/cache/r10k/git---github.com-puppetlabs-control-repo.git
Stderr:
fatal: destination path '/var/cache/r10k/git---github.com-puppetlabs-control-repo.git' already exists and is not an empty directory.
Exit code: 128
[2020-04-30 22:28:15 - ERROR] Command exited with non-zero exit code:
Command: git clone --mirror git://github.com/puppetlabs/control-repo.git /var/cache/r10k/git---github.com-puppetlabs-control-repo.git
Stderr:
fatal: destination path '/var/cache/r10k/git---github.com-puppetlabs-control-repo.git' already exists and is not an empty directory.
Exit code: 128

I think this occurs because two threads kick off to deploy the entries in the Puppetfile and both try to cache the new repo simultaneously. The second clone fails because its destination path isn't clear -- the directory was populated between the moment the thread decided it needed to clone and when the clone actually ran. Subsequent deploys work fine, because the cache is already populated.
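
To illustrate the race described above, here is a minimal standalone Ruby sketch. This is illustrative only, not r10k's actual code; the cache-path naming, the helper methods, and the mutex-based mitigation at the end are assumptions.

# Two threads deploying Puppetfile entries that share a remote both see the
# cache as missing, and both try to create it with `git clone --mirror`; the
# slower clone fails because the destination is no longer empty.

CACHE_ROOT = '/var/cache/r10k'   # assumed cache root, matching the log above

def cache_path(remote)
  File.join(CACHE_ROOT, remote.gsub(%r{[/:]+}, '-'))
end

def ensure_cache(remote)
  path = cache_path(remote)
  if File.directory?(path)
    system('git', '-C', path, 'fetch', '--prune')   # cache exists: update it
  else
    # Race window: another thread can create `path` between the check above
    # and the clone below.
    system('git', 'clone', '--mirror', remote, path)
  end
end

remote = 'git://github.com/puppetlabs/control-repo.git'
2.times.map { Thread.new { ensure_cache(remote) } }.each(&:join)

# One way to close the window is to serialize cache creation per remote,
# for example with a mutex keyed by cache path:
LOCKS = Hash.new { |h, k| h[k] = Mutex.new }
LOCKS_GUARD = Mutex.new

def ensure_cache_safely(remote)
  lock = LOCKS_GUARD.synchronize { LOCKS[cache_path(remote)] }
  lock.synchronize { ensure_cache(remote) }
end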

This reproduces very reliably. For now we're updating our tests to no longer have two resources with the same backing repo, since we think that setup isn't terribly valid. Seeing as no one has hit this yet, it seems uncommon at least.

@justinstoller (Member)

@adrienthebo, @dhollinger, @reidmv, @bastelfreak, @binford2k, @npwalker: I'm super curious to see if any of you have concerns about how often the above situation might be encountered.

@bastelfreak (Contributor)

Hi, this sounds like an interesting use case. I haven't seen it in the wild, but my assumption is that it's a valid one and that it should work.

@npwalker (Contributor) commented May 1, 2020

While it might currently work, I'm not sure it makes a lot of sense. I'd be fine with failing if we can detect that two modules use the same git source.

If it's easy enough, we could downgrade to a pool_size of 1 instead of failing.

Or, depending on the work compared to the above, we could try to make the caching code safe against two threads trying to cache the same git source at the same time.

Or, if we can detect the same git source twice, we could just dedupe so we never try to download the same git source twice?

Lots of options, but given the low likelihood of someone trying to do this, I'd be fine failing and saying the user needs to set pool_size to 1.
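
The "detect duplicates and downgrade" option above could look roughly like the following. This is a hedged sketch only, not actual r10k behaviour; the entry structure and the effective_pool_size method name are made up for illustration.

require 'logger'

# If two Puppetfile entries share a git source, warn and fall back to a
# single-threaded deploy; otherwise keep the requested pool size.
def effective_pool_size(entries, requested_pool_size, logger = Logger.new($stderr))
  sources = entries.map { |e| e[:git] }.compact
  if sources.uniq.length < sources.length
    logger.warn('Duplicate git sources found in Puppetfile; falling back to pool_size 1')
    1
  else
    requested_pool_size
  end
end

entries = [
  { name: 'non_module_object_1', git: 'git://github.com/puppetlabs/control-repo.git' },
  { name: 'non_module_object_2', git: 'git://github.com/puppetlabs/control-repo.git' },
]
puts effective_pool_size(entries, 4)   # prints 1 and logs a warning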

@binford2k (Contributor)

I don't see the benefit of duplicate git sources. But it also seems relatively easy to detect. At that point you could either dedupe or just downgrade pool_size to 1 (with a warning).

@Magisus (Collaborator, Author) commented May 1, 2020

We're going to look into a real fix this afternoon, because if it's relatively straightforward, the best option is obviously just to fix it.

justinstoller added a commit to justinstoller/r10k that referenced this issue May 1, 2020
When we raised the default thread pool size for deploying modules above 1, we began to see issues when caching a remote locally if that remote/cache was shared by multiple modules within the Puppetfile.

To resolve this, when we create a module we group it with any other modules that share its cachedir. We then pull modules off the queue by cachedir and process each group serially.
justinstoller added a commit to justinstoller/r10k that referenced this issue May 2, 2020, with the same message
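
A rough sketch of the grouping approach described in the commit message above might look like this. It is illustrative only, not the actual r10k implementation; the entry structure and cachedir naming are assumptions. Entries are grouped by the cache directory their remote maps to, one group is queued per remote, and each worker processes a whole group serially, so a given cache is only ever created or updated by a single thread.

entries = [
  { name: 'non_module_object_1', git: 'git://github.com/puppetlabs/control-repo.git' },
  { name: 'non_module_object_2', git: 'git://github.com/puppetlabs/control-repo.git' },
  { name: 'another_module',      git: 'https://github.com/puppetlabs/puppetlabs-stdlib.git' },
]

cachedir_for = ->(remote) { remote.gsub(%r{[/:]+}, '-') }   # assumed naming scheme
groups = entries.group_by { |entry| cachedir_for.call(entry[:git]) }

# One queue item per cachedir, so entries sharing a remote stay together.
queue = Queue.new
groups.each_value { |group| queue << group }

pool_size = 4
workers = pool_size.times.map do
  Thread.new do
    loop do
      group = begin
        queue.pop(true)   # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break
      end
      # Every entry that shares this cachedir is deployed serially here.
      group.each { |entry| puts "deploying #{entry[:name]} (cache: #{cachedir_for.call(entry[:git])})" }
    end
  end
end
workers.each(&:join)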
@justinstoller (Member)

I've opened #1059 with an approach that fixes this instead of warning. For those of you I pinged who review code, I'd welcome any comments regarding the approach and unit tests. I still need to update an acceptance test and the changelog, but I think it's functionally correct.

justinstoller added a commit to justinstoller/r10k that referenced this issue May 6, 2020
After updating the default pool_size, a race condition was
found when initializing our local git cache. See
puppetlabs#1058 for the issue in detail.

When attempting a fix we realized that there were multiple copies of the
code to iterate over modules to manage. See
puppetlabs#1059 for a first attempt at this
problem and discussion.

In order to meet our current timelines, the default will need to be reverted until we can fix this issue properly. Hence this commit and a55536c.

This commit was necessary because the CHANGELOG has been refactored since
the commit referencing the default was added.
Magisus pushed a commit to Magisus/r10k that referenced this issue Jul 27, 2020, with the same message as justinstoller's commit above
Magisus added a commit to Magisus/r10k that referenced this issue Jul 27, 2020
Magisus pushed a commit to Magisus/r10k that referenced this issue Jul 27, 2020, with the same message
Magisus added a commit to Magisus/r10k that referenced this issue Jul 27, 2020
Magisus added a commit to Magisus/r10k that referenced this issue Jul 28, 2020
This commit adds a beaker acceptance test verifying that it is possible
to use the same remote for several non-module objects. This case
previously broke when the `pool_size` feature was introduced, because
there was a race condition when creating the git cache in shellgit mode,
when the same remote was used in two objects being set up concurrently.
Magisus added a commit to Magisus/r10k that referenced this issue Jul 28, 2020
This commit adds a beaker acceptance test verifying that it is possible
to use the same remote for several non-module objects. This case
previously broke when the `pool_size` feature was introduced, because
there was a race condition when creating the git cache in shellgit mode,
when the same remote was used in two objects being set up concurrently.
Magisus added a commit to Magisus/r10k that referenced this issue Jul 28, 2020
This commit adds a beaker acceptance test verifying that it is possible
to use the same remote for several non-module objects. This case
previously broke when the `pool_size` feature was introduced, because
there was a race condition when creating the git cache in shellgit mode,
when the same remote was used in two objects being set up concurrently.
Magisus added a commit to Magisus/r10k that referenced this issue Jul 28, 2020
This commit adds a beaker acceptance test verifying that it is possible
to use the same remote for several non-module objects. This case
previously broke when the `pool_size` feature was introduced, because
there was a race condition when creating the git cache in shellgit mode,
when the same remote was used in two objects being set up concurrently.
mwaggett added a commit that referenced this issue Jul 30, 2020
[#1058] Avoid concurrent cache updates when downloading modules on multiple threads
@github-actions
This issue has been marked stale because it has had no activity for 60 days. The Puppet Team is actively prioritizing existing bugs and new features; if this issue is still important to you, please comment and we will add it to our backlog. Otherwise, it will be closed in 7 days.
