Packer 1.3.1 - Random Ansible Provisioner Failures on Windows Hosts #6731
Comments
Yeah, random failures are definitely unfun. Any chance you'd be able to run a bisect for me?
Other questions: does this happen on v1.2.5? Does this cause a build failure nearly every time, or intermittently? If intermittently, roughly what percentage of builds would you guess fail?
I can test out 1.2.5 over the next few days when I have a free minute!
I tested about 10 times and got a failure at one of the registry edit steps every time (a different step each time). No build ever got past that playbook, with either Ansible version. Which specific step failed was intermittent, but it always failed at one of them. The first playbook, which does a couple of .NET installs, seems to work reliably.
I'm unsure what a bisect is (I've heard of a git bisect, but never done one). I'll Google "packer bisect" tomorrow and see if I can track down what to do!
You nailed it: I was talking about a git bisect. You'd have to clone the repo and build from source, then do a git bisect. Git bisects themselves are pretty straightforward, and Google can probably give you more help than I can. I know it's asking a lot of you, but doing so would really help me narrow down whatever introduced this issue. The Ansible provisioner is community-supported, meaning HashiCorp engineers normally don't do more than review PRs for it, so anything you can do to give the community more information will help get this resolved.
@LangJV, any updates on whether you were successful with an older version of Packer?
Hello,
I was never able to do a git bisect, but I believe the Ansible failure was actually a red herring.
It seems that between 1.2.4 and 1.3.1 the behavior of windows-restart changed slightly (or I'm using it wrong).
My Packer template at the point in question does three things in order:
1. Install .NET 4.7 via an Ansible playbook.
2. Restart via the windows-restart provisioner (Ansible can't do the restarts itself, because it is proxied through Packer and the connection is lost).
3. Kick off an Ansible playbook of registry edits for WSUS-related items.
I have included the partial snippet below for review. In 1.2.4 this restart seems to wait until .NET is fully installed, meaning it correctly waits through multiple reboots. When I use 1.3.1 in debug mode, the restart reports "complete" while the console still shows "Installing Updates XX%". The next playbook (registry edits) then kicks off and fails somewhere in the middle when the server reboots again to continue installing .NET, which fails the entire workflow. I had a coworker reproduce this, and he worked around it by adding lots of pauses after the restart to give the install actions enough time to complete. It slows things down, but it seems to work. He was able to reproduce the same behavior with a restart after applying patches, as well as after installing certain Windows roles or features, since all of these can lead to multiple reboots.
I looked over the windows-restart documentation in Packer and didn't see anything that indicates a change in behavior, so I think it was unintentional, or the product of something else?
Does this sound plausible, or am I putting the wrong things together in my hypothesis, based on what I'm seeing?
~Jason Lang
```json
{
  "type": "ansible",
  "playbook_file": "ansible/dotnet_install.yaml",
  "extra_arguments": [
    "--connection", "packer",
    "--extra-vars", "ansible_shell_type=powershell ansible_shell_executable=None"
  ]
},
{
  "type": "windows-restart",
  "restart_check_command": "powershell -command \"& {Write-Output 'restarted.'}\"",
  "restart_timeout": "30m",
  "restart_command": "shutdown /r /t 0 /d p:4:1 /c \"Packer DotNet Install Restart\""
},
{
  "type": "ansible",
  "playbook_file": "ansible/wsus_config.yaml",
  "extra_arguments": [
    "--connection", "packer",
    "--extra-vars", "ansible_shell_type=powershell ansible_shell_executable=None"
  ]
},
```
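A minimal sketch of the coworker's pause workaround, assuming the provisioner-level `pause_before` option that Packer provisioners accept. Instead of a separate sleep step, the WSUS playbook itself waits a fixed time before it starts. The `10m` value is a hypothetical placeholder, not a measured figure; it has to be long enough to cover the extra .NET reboots, which is why this approach slows builds down.

```json
{
  "type": "ansible",
  "playbook_file": "ansible/wsus_config.yaml",
  "pause_before": "10m",
  "extra_arguments": [
    "--connection", "packer",
    "--extra-vars", "ansible_shell_type=powershell ansible_shell_executable=None"
  ]
}
```

A fixed delay is only a guess at how long the extra reboots take, so it can still be too short on a slow host, or waste time on a fast one.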
This sounds totally plausible. We did have a windows-restart regression in v1.3.1, and it should be fixed by #6792, which was released with v1.3.2. Can you give that version a try and let me know if this is still a problem?
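Because the fix shipped in v1.3.2, a template that depends on windows-restart waiting through multiple reboots could refuse to run on the affected release. A sketch assuming the template-level `min_packer_version` key; the empty `builders` and `provisioners` arrays are placeholders standing in for the template's existing contents, not a working build.

```json
{
  "min_packer_version": "1.3.2",
  "builders": [],
  "provisioners": []
}
```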
Closing since I never got a response; if this is still an issue, let me know and we can reopen.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Packer 1.3.1
QEMU/KVM build of Server 2012R2 and/or Server 2016
Ansible 2.6.3 and 2.5.5
When building with Packer 1.3.1 I get "random" Ansible failures throughout my playbook: a different step fails each time, but something always fails. More detailed logs are here: https://groups.google.com/forum/#!topic/packer-tool/8zCL1owdL7I
I downgraded Packer from 1.3.1 to 1.2.4 and was then able to build without issue using both Ansible 2.6.3 and Ansible 2.5.5. Both of these Ansible versions showed the "random error" with Packer 1.3.1.
I've attached the three playbooks I ran up to the failure point, to show there's nothing crazy here. I can consistently re-run this and it will fail at one of the many registry edit steps (but not always the same one) with the errors shown in the Google Groups logs above.
I know "random" failures are super icky. My (unverified) hunch is that this might be timing-related. My two restart provisioners used to have a 15-second delay; when I removed it, the build consistently got "further" before failing (still at a random step).
stuff.zip