[#1058] Avoid concurrent cache updates when downloading modules on multiple threads #1081

Magisus · 2020-07-27T20:35:16Z

Supersedes #1080

When we raised the default thread pool size for deploying modules above 1, we began to see issues when caching a remote locally if that remote (and therefore its cache) was shared by multiple modules in the Puppetfile. To resolve this, when we create each module we group it with any other modules that may share its cachedir. We then pull these per-cachedir groups off the queue and process the modules in each group serially.
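A minimal plain-Ruby sketch of the grouping approach described above, loosely modeled on the `modules_by_vcs_cachedir` grouping that appears in the diff later in this thread. It is not r10k's implementation. `FakeModule`, `build_queue`, `deploy_concurrently` and the `sync_module` callback are hypothetical names, and it assumes a module with no shared VCS cache reports a `nil` cachedir.

```ruby
require 'thread'

# Hypothetical stand-in for a Puppetfile module entry. `cachedir` is nil
# when the module has no shared VCS cache (e.g. a Forge module).
FakeModule = Struct.new(:name, :cachedir)

# Build a work queue in which all modules sharing a cachedir form one group,
# so that a single thread handles that whole group serially.
def build_queue(modules)
  groups = modules.group_by(&:cachedir)
  uncached = groups.delete(nil) || []

  Queue.new.tap do |queue|
    uncached.each { |mod| queue << [mod] }      # no shared cache: one module per group
    groups.each_value { |mods| queue << mods }  # shared cache: whole group together
  end
end

# Drain the queue with `pool_size` threads. Each thread pops a whole group
# and syncs its modules one after another, so no two threads ever work on
# the same cachedir at the same time.
def deploy_concurrently(modules, pool_size, &sync_module)
  queue = build_queue(modules)
  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        group = begin
          queue.pop(true) # non-blocking; raises ThreadError once the queue is empty
        rescue ThreadError
          break
        end
        group.each { |mod| sync_module.call(mod) }
      end
    end
  end
  workers.each(&:join)
end
```

The trade-off is that modules sharing a remote no longer download in parallel with each other. Only groups with different cachedirs run concurrently, but no two threads can touch the same cache directory at once.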

Magisus · 2020-07-27T20:44:07Z

Still interested in thoughts on #1059 (comment).

CHANGELOG.mkd

Magisus · 2020-07-28T16:38:20Z

This does increase the default pool_size again, I think that's what we wanted?

Magisus · 2020-07-28T16:39:03Z

There was a request way back when for an acceptance test that demonstrates this bug (and its fix) so I'm working on that now.

Magisus · 2020-07-28T19:53:22Z

I verified that this test failed with a pool size greater than one, before this fix. I had forgotten, but I'll note here (as well as in the commit message) for posterity that the error only occurs in shellgit.

This commit changes the default number of threads used when downloading modules from 1 to 4.

integration/tests/git_source/git_source_repeated_remote.rb

mwaggett · 2020-07-28T21:19:03Z

integration/tests/git_source/git_source_repeated_remote.rb

+CONF
+
+puppetfile = <<-EOS
+mod 'non_module_object_1',


what does it mean for this to be a 'non module object'?

They probably don't need to be for this test to work; I might change that.

But some useful links on that topic: https://tickets.puppetlabs.com/browse/CODEMGMT-649, https://puppet.com/docs/pe/2018.1/puppetfile.html#declare_git_repositories_in_the_puppetfile

Magisus · 2020-07-28T21:42:09Z

I'll update the test to use a slightly less weird setup. There's no reason most of this can't be a more usual workflow and still demonstrate this bug.

Magisus · 2020-07-28T22:45:59Z

@mwaggett see if that's a little clearer. I left the source repo alone for consistency, like we discussed, but updated what's in the Puppetfile to be less unusual and added a couple of assertions to better illustrate the result of the deploy.

mwaggett · 2020-07-28T22:52:57Z

integration/tests/git_source/git_source_repeated_remote.rb

+git_repo_path = '/git_repos'
+git_repo_name = 'environments'
+git_control_remote = File.join(git_repo_path, "#{git_repo_name}.git")
+code_dir = '/etc/puppetlabs/code/environments/production'


Should the env_path be incorporated into this somehow? Is it "#{env_path}/production" or "/etc/puppetlabs/code/#{env_path}/production"? Looks like the former, based on https://puppet.com/docs/puppet/5.5/environments_creating.html#environmentpath .

The env_path is the absolute path, so /etc/puppetlabs/code/environments. But yeah it can be incorporated.

cool, yeah, I think if we include env_path, it's clearer that the git_repo_name has no relationship to that path.
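Following the suggestion above, `code_dir` could be derived from the environment path instead of being hard-coded. This is a minimal sketch. It assumes `env_path` holds the absolute environmentpath (`/etc/puppetlabs/code/environments`, as stated in the reply). In the real test, that value would come from the test's own setup rather than a literal.

```ruby
git_repo_path = '/git_repos'
git_repo_name = 'environments'
git_control_remote = File.join(git_repo_path, "#{git_repo_name}.git")

# Assumed value: the default Puppet environmentpath on the test host.
env_path = '/etc/puppetlabs/code/environments'
# The deployed production environment lives directly under env_path;
# its location has no relationship to git_repo_name.
code_dir = File.join(env_path, 'production')
```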

mwaggett · 2020-07-28T22:55:16Z

integration/tests/git_source/git_source_repeated_remote.rb

+
+mod 'test_apache',
+  :git => 'git://github.com/puppetlabs/puppetlabs-apache.git',
+  :branch => 'master'


Would it be more realistic for this to be the 'test' branch?

Is it worth testing both ^the more realistic case and a case where the sources are literally identical? Or is that kind of redundant/unnecessary?

That module only has a master and a release branch in its repo, and I'm not sure if release is permanent. I need to make sure it's something that this test is unlikely to fail on.

Users in the r10k community were unable to come up with a real use case that would trigger this bug. So no matter what we do here, it's not likely to match up to any known real workflow.

gotcha, okay 👍

I could find a different module that has more branches, or create a fixture, but we use apache elsewhere in these tests so I went with it.

if we don't think the two cases are different, I'm fine leaving it

This commit adds a beaker acceptance test verifying that the same remote can be used for several non-module objects. This case broke when the `pool_size` feature was introduced: in shellgit mode, there was a race condition when creating the git cache if the same remote was used by two objects being set up concurrently.

Magisus · 2020-07-29T16:58:59Z

@adrienthebo I'd appreciate a look at this if you have a minute, since the original concern with WithModules was yours, and Justin and I don't see an issue, but maybe we're missing something? #1059 (comment)

dhollinger

Just had one question, otherwise looks good to me

dhollinger · 2020-07-29T19:02:54Z

lib/r10k/puppetfile.rb

@@ -212,15 +221,22 @@ def concurrent_accept(visitor, pool_size)
  def modules_queue(visitor)
    Queue.new.tap do |queue|
      visitor.visit(:puppetfile, self) do
-        modules.each { |mod| queue << mod }
+        modules_by_cachedir = modules_by_vcs_cachedir.clone


Am I missing something here? Should modules_by_vcs_cachedir be an instance variable, or is this appropriate since it's available as an attr_reader?

Correct, this calls the attr_reader method, whose default implementation simply returns the instance variable of the same name. We added the reader so we could retrieve that value in the tests. This could just as easily be @modules_by_vcs_cachedir.clone, since we're in the class where the variable is defined, but as I understand it, it's generally good practice to use the reader method if one is provided. That makes it easier to override with a custom getter later if you end up needing to do something special when retrieving the value.
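The reply can be illustrated with a small sketch. `attr_reader` defines a method that returns the instance variable of the same name, so `clone` through the reader and `clone` on the ivar produce equal copies. A subclass that overrides the reader is only seen by code that goes through the reader. `ExamplePuppetfile`, `LoggingPuppetfile`, `via_reader` and `via_ivar` are hypothetical names for illustration, not r10k classes.

```ruby
class ExamplePuppetfile
  # Equivalent to: def modules_by_vcs_cachedir; @modules_by_vcs_cachedir; end
  attr_reader :modules_by_vcs_cachedir

  def initialize
    @modules_by_vcs_cachedir = { '/cache/apache' => %w[apache_a apache_b] }
  end

  # Goes through the reader, so overrides in subclasses apply.
  def via_reader
    modules_by_vcs_cachedir.clone
  end

  # Reads the instance variable directly, bypassing any override.
  def via_ivar
    @modules_by_vcs_cachedir.clone
  end
end

# A later custom getter: counts reads, then defers to the generated reader.
class LoggingPuppetfile < ExamplePuppetfile
  attr_reader :reads

  def modules_by_vcs_cachedir
    @reads = (@reads || 0) + 1
    super
  end
end
```

Note that `clone` is shallow. The returned hash is a new object, but the arrays stored as its values are still shared with the original.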

justinstoller and others added 2 commits July 27, 2020 13:34

[puppetlabs#1058] Add accessor for a module's cachedir

9c3c5e7

Magisus requested review from adrienthebo and dhollinger as code owners July 27, 2020 20:35

Magisus requested a review from a team July 27, 2020 20:35

Magisus mentioned this pull request Jul 27, 2020

[#1058] Avoid concurrent cache updates when downloading modules on multiple threads #1080

Closed

mwaggett reviewed Jul 27, 2020

CHANGELOG.mkd (outdated, resolved)

mwaggett reviewed Jul 27, 2020

CHANGELOG.mkd (outdated, resolved)

mwaggett reviewed Jul 27, 2020

CHANGELOG.mkd (resolved)

mwaggett previously approved these changes Jul 27, 2020

Magisus dismissed mwaggett’s stale review via aada897 July 27, 2020 20:51

Magisus force-pushed the concurrent-cache branch 2 times, most recently from aada897 to cc987ac July 27, 2020 21:02

mwaggett reviewed Jul 27, 2020

CHANGELOG.mkd (outdated, resolved)

Magisus force-pushed the concurrent-cache branch from cc987ac to 0272397 July 27, 2020 21:18

mwaggett previously approved these changes Jul 27, 2020

Magisus dismissed mwaggett’s stale review via f4235f8 July 28, 2020 19:52

Magisus added 2 commits July 28, 2020 13:28

[puppetlabs#1038] Increase default pool_size to 4

e798295

This commit changes the default number of threads used when downloading modules from 1 to 4.

(maint) Update changelog

15f9a6f

Magisus force-pushed the concurrent-cache branch from f4235f8 to c5ccccc July 28, 2020 20:28

mwaggett reviewed Jul 28, 2020

integration/tests/git_source/git_source_repeated_remote.rb (resolved)

mwaggett reviewed Jul 28, 2020

integration/tests/git_source/git_source_repeated_remote.rb (outdated, resolved)

mwaggett reviewed Jul 28, 2020

Magisus force-pushed the concurrent-cache branch from c5ccccc to eb9e026 July 28, 2020 22:44

mwaggett reviewed Jul 28, 2020

Magisus force-pushed the concurrent-cache branch from eb9e026 to fbc8461 July 28, 2020 23:04

mwaggett approved these changes Jul 28, 2020

Magisus mentioned this pull request Jul 29, 2020

[#1058] Serially deploy modules that share a cachedir #1059

Closed

dhollinger reviewed Jul 29, 2020

dhollinger approved these changes Jul 29, 2020

mwaggett merged commit 48ac619 into puppetlabs:master Jul 30, 2020
