-
-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lockfile to mitigate r10k race condition. #268
Lockfile to mitigate r10k race condition. #268
Conversation
r10k has a small race condition which can cause failed deploys if two happen more or less simultaneously. To mitigate, we just lock on a file and wait for the other one to complete. Note that this doesn't catch the potential race that we get due to the fork-exec, but it at least makes the window smaller. This dropped failure rate in a busy classroom from ~30% to zero. The proper fix is to lock in r10k, of course.
+1, we have this issue with a Stash server, where you can choose to delete the branch at the time of merge or not at all (unlike github which leaves a delete button to click on later). Those events fire pretty much at the exact some moment and who knows which one wins, but everyone loses. |
@rnelson0 can you test this patch in your environment and report back to this PR? |
I'll try on Monday. Been running solo for a week and want to enjoy my weekend at least a little bit :) |
@acidprime @binford2k Yes and no. Here's the result in
It does not crap out and blow up git/r10k, and the second run's error is present anytime I delete a branch, even if it's not at the same time. That seems like an existing issue, but wanted to report it to make sure it's not a regression. |
@acidprime If you can get a new forge release out with this fix, we can bump and start using it in production next week, for more rapid testing and feedback with it. We may be going to PE code manager at work soon, which will limit our ability to test (my home setup is much more static). |
@acidprime ping |
@acidprime Ping. Would really love to see a new release that includes this. If there's anything I can do to help with maintenance of this module, let me know. |
@rnelson0 since code manager has shipped in PE I have let this module fall out of my day to day job responsibilities , I have been meaning to look into migrating it to vox populi. In the interim I have added you as a collaborator to this repo. If our interested in testing/running beaker tests on the feature PRs I am okay with you merging them and updating the Change log . I can cut a forge release for you once we are sure there is no breaking changes on a given feature tag. |
@binford2k could you rebase this once more to make sure it still fits and passes? I know it probably will but we did revamp the tests a good bit. Thanks! |
@binford2k the modulesync is complete, ready to rebase. Hit me up on slack/irc if you need to, the modulesync changed a LOT. |
Hrm, maybe it's not needed? Is there any relevant test for this? |
I'm not sure what a relevant test would be. Do you think that @adrienthebo what do you think would be a appropriate test here? |
# the other one to complete. | ||
file.flock(File::LOCK_EX) | ||
|
||
if Open3.respond_to?('capture3') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh god, can we please, please let Ruby 1.8.7 die? Ruby 1.9.3 has this (https://ruby-doc.org/stdlib-1.9.3/libdoc/open3/rdoc/Open3.html#method-c-capture3), and even that version is EOL'd.
Please, Ruby 1.8.7 is old and tired. It just wants to go out to pasture, graze quietly, and enjoy the twilight years of its life. Stop making Ruby 1.8.7 suffer on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@binford2k something like this
Also I'm going to take this time to say PLEASE LET RUBY 1.8.7 DIE AND NO I WILL NOT STOP HARPING ON THAT POINT BECAUSE AT THIS POINT I'M ENTERTAINED BY MY OWN COMPLAINING |
Vox Pupuli does not support ruby 1.8.7 and only has to support 1.9.3 for another 60ish days!Rob Nelson |
@rnelson0 OH MY GOD I LOVE YOU SO MUCH AND YOU HAVE BROUGHT SUCH JOY TO MY COLD, DEAD HEART |
This triggers 4 deploys in rapid fire. They should all be queued up and eventually succeed.
@@ -58,5 +58,17 @@ class {'r10k::webhook': | |||
expect(r.exit_code).to eq(0) | |||
end | |||
end | |||
|
|||
it 'should successfully lock when hammered with multiple requests' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the it
tests inside the shell loop to remedy the rubocop warnings on too many expectations. The other rubocop warnings should be pretty explanatory except the array one, but I've not seen that before, sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't really good validations. Having four separate it
blocks implies that it's testing four different things, and it's not. It's testing one.
Not using the word "should" is fine, but all the other tests do, so that becomes inconsistent.
Not allowing a newline to delineate different blocks is just asinine, especially when theres a cascade of end
s, like below. Which one ends which block?
end
end
end
end
end
end
Its style guide even praises using newlines. https://github.com/bbatsov/ruby-style-guide#empty-lines-between-methods but has nothing for or against newlines around blocks.
The regex rule linked to doesn't actually say to always use %r
, it says "Use %r only for regular expressions matching at least one '/' character." This regex does not, it simply matches the word "success". I'd change it anyways to make the rule happy, but then this test would be inconsistent with the rest.
I fixed the times.map
(wasn't really needed anyway), took out the readability whitespace, and added a disable comment for the multiple expectations. If you want the other rules to pass, you ought to make a single commit to change all of them at once for consistency.
In discussions with Ben on Slack, this will me merged and then all the relevant rubocop tests will be fixed. See https://travis-ci.org/voxpupuli/puppet-r10k/jobs/181471454 for the actual failures. Ready to merge, pending ability to quickly PR the rubocop failures afterward. |
Fix rubocop failures from #268
Is the race condition still there in R10K? If not, should the lock mechanism in webhook binary and MCollective R10K plugin be removed? plus is there a link to the R10K racing condition issue? I didn't find one sorry. |
r10k has a small race condition which can cause failed deploys if two happen
more or less simultaneously. To mitigate, we just lock on a file and wait for
the other one to complete. Note that this doesn't catch the potential
race that we get due to the fork-exec, but it at least makes the window
smaller. This dropped failure rate in a busy classroom from ~30% to zero.
The proper fix is to lock in r10k, of course.