Cobbler Provider #5969
This is the largest amount of work I've done with both Go and Terraform. Please don't hold back on critiques and reviews -- I would really appreciate the education 😄
I've only skimmed but it looks awesome so far. :) Are you using the latest godep? I see some files in vendor like
@cbednarski I'm using v60 which I think is the latest version. Perhaps a bug?
Latest for me also. No worries!
This looks awesome
return fmt.Errorf("Cobbler Distro: Error Deleting (%s): %s", d.Id(), err)
}

d.SetId("")
This is actually done for you if you return successfully from delete. 👍
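To make that point concrete, here is a minimal, self-contained mock of the contract. It does not import Terraform: `resourceData`, `applyDelete`, and `resourceDistroDelete` are hypothetical stand-ins for `schema.ResourceData`, the framework's apply step, and the provider's Delete function. The assumption it models is that a nil return from Delete is enough for the framework to drop the resource from state, so the provider only has to report errors.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceData is a hypothetical stand-in for schema.ResourceData,
// reduced to the ID handling that matters here.
type resourceData struct{ id string }

func (d *resourceData) Id() string     { return d.id }
func (d *resourceData) SetId(v string) { d.id = v }

// applyDelete is a hypothetical stand-in for the framework side: if the
// provider's delete function returns nil, the resource is dropped from
// state (modelled here by clearing the ID), so the provider itself does
// not need to call d.SetId("").
func applyDelete(d *resourceData, del func(*resourceData) error) error {
	if err := del(d); err != nil {
		return err // on error the resource stays in state, ID intact
	}
	d.SetId("")
	return nil
}

// resourceDistroDelete only reports errors; it never touches the ID.
// The "broken" ID simulates a failing API call.
func resourceDistroDelete(d *resourceData) error {
	if d.Id() == "broken" {
		err := errors.New("simulated API failure")
		return fmt.Errorf("Cobbler Distro: Error Deleting (%s): %s", d.Id(), err)
	}
	return nil
}

func main() {
	ok := &resourceData{id: "centos7"}
	err := applyDelete(ok, resourceDistroDelete)
	fmt.Printf("err=%v id=%q\n", err, ok.Id())

	bad := &resourceData{id: "broken"}
	err = applyDelete(bad, resourceDistroDelete)
	fmt.Printf("err=%v id=%q\n", err, bad.Id())
}
```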
Great work on this @jtopjian! Just the
Updated from 2a29943 to 7370054.
okie dokie,
log.Printf("[DEBUG] Cobbler System: Created System: %#v", newSystem)
d.SetId(newSystem.Name)

log.Printf("[DEBUG] Cobbler System: syncing system")
Calling Sync after every create will cause Cobbler to fail if you try to create enough systems at once. You should call Sync only once, after all resources have been created, but doing that appears to be problematic in Terraform. We tried to solve this using a channel in our version.
Ah nice catch @thijsschnitger - thanks!
The upstream issue is here:
And the helper is here:
I'm wondering if we can solve this with more of a straight mutex rather than a time based goroutine. I'll ask the folks in the upstream issue about that idea. 👍
I was able to get around this by using a sync lock. See lines 14, 422, and 423. Each system's sync will be called in serial rather than in parallel. There's an acceptance test that builds 50 systems to confirm that none of them fail to build. I've increased the number to 100 and had the same results.
Removing the sync lock would show failures due to the issue described.
@phinze There was some discussion related to this in Slack a few weeks back.
Ha, nice, so the thing I was picturing is already done! That works nicely then. Thanks @jtopjian 👍
🚀
All praise should go to @thijsschnitger and @mongrelion for first discovering the issue and making sure it didn't sneak its way in 😄
IIRC we used a sync.Mutex in the beginning, when we first faced this issue, but somehow with 100+ systems it would still fail. That's why we decided to go for the channels hack.
I don't have time to test this out, but could you try with 200 systems or so? I hope that the Mutex is enough, because I was not happy with the channel hack, but it was the best we could do at that time.
Ideally Terraform, just as it has a setup method for your provider, should also have a mechanism for tearing it down.
There was another use case where we needed this: we had to log in to an API, create a bunch of resources, and then log out. Because there is no teardown method, we ended up with zombie sessions on the server side (the API didn't use any sort of token auth, just a user/password combo), so we had to add more workarounds for that.
@mongrelion I completely agree that a final Terraform step to run the sync would work best.
I just ran the acceptance test suite with 200 systems and it passed. Other activity on the Cobbler server could no doubt still interfere with the series of syncs, but in general I think the sync.Mutex is pretty stable.
Thoughts?
When I use Cobbler to manage DHCP and DNS, even with as few as 7 systems, systemd fails to restart these services because the restart requests repeat too quickly, and as a result it does not keep them running:
systemd: start request repeated too quickly for dhcpd.service
systemd: Failed to start DHCPv4 Server Daemon.
systemd: Unit dhcpd.service entered failed state.
systemd: dhcpd.service failed.
(...)
systemd: start request repeated too quickly for named-setup-rndc.service
systemd: Failed to start Generate rndc key for BIND (DNS).
systemd: Unit named-setup-rndc.service entered failed state.
systemd: named-setup-rndc.service failed.
systemd: start request repeated too quickly for named.service
systemd: Failed to start Berkeley Internet Name Domain (DNS).
systemd: Unit named.service entered failed state.
systemd: named.service failed.
Terraform apply fails with the error:
Cobbler System: Error syncing system: error: "<class 'cobbler.cexceptions.CX'>:'cobbler trigger failed: cobbler.modules.sync_post_restart_services'" code: 1
Maybe systemd can be configured to handle this gently?
Thanks for the report. Since this has been merged into the main code base, can you open an issue? Might be easier to handle 😄
Done! GH-6419
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
This introduces a provider for Cobbler. Cobbler manages bare-metal deployments and, to some extent, virtual machines. This initial commit supports the following resources: distros, profiles, systems, kickstart files, and snippets.
This supersedes #4271.
cc: @mongrelion @thijsschnitger