
Cobbler Provider #5969

Closed · wants to merge 2 commits

Conversation

@jtopjian (Contributor) commented Apr 1, 2016

This introduces a provider for Cobbler. Cobbler manages bare-metal deployments and, to some extent, virtual machines. This initial commit supports the following resources: distros, profiles, systems, kickstart files, and snippets.

This supersedes #4271.

cc: @mongrelion @thijsschnitger

@jtopjian (Contributor, Author) commented Apr 1, 2016

This is the largest amount of work I've done with both Go and Terraform. Please don't hold back on critiques and reviews -- I would really appreciate the education 😄

@cbednarski (Contributor) commented

I've only skimmed but it looks awesome so far. :) Are you using the latest godep? I see some files in vendor, like .travis.yml, that are extraneous; I think later versions of godep are better about stripping these out. Run go get -u github.com/tools/godep to update.

@jtopjian (Contributor, Author) commented Apr 1, 2016

@cbednarski I'm using v60 which I think is the latest version. Perhaps a bug?

@cbednarski (Contributor) commented

> I'm using v60 which I think is the latest version. Perhaps a bug?

Latest for me also. No worries!

@rossedman commented

This looks awesome

return fmt.Errorf("Cobbler Distro: Error Deleting (%s): %s", d.Id(), err)
}

d.SetId("")
Review comment (Contributor):

This is actually done for you if you return successfully from delete. 👍
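
For reference, a minimal sketch of what a Delete function then looks like. Resource and client names here are illustrative (assuming the provider's Config type and a DeleteDistro client call), not the exact code in this PR:

package cobbler

import (
	"fmt"

	"github.com/hashicorp/terraform/helper/schema"
)

func resourceCobblerDistroDelete(d *schema.ResourceData, meta interface{}) error {
	config := meta.(*Config)

	if err := config.cobblerClient.DeleteDistro(d.Id()); err != nil {
		return fmt.Errorf("Cobbler Distro: Error Deleting (%s): %s", d.Id(), err)
	}

	// No d.SetId("") needed: when Delete returns nil, helper/schema
	// clears the ID and marks the resource as destroyed.
	return nil
}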

@phinze (Contributor) commented Apr 5, 2016

Great work on this @jtopjian! Just the SetId("") thing in the Delete funcs and that one go fmt question and then this looks good to go!

@jtopjian (Contributor, Author) commented Apr 6, 2016

okie dokie, SetId("") has been removed.

@phinze self-assigned this Apr 11, 2016
phinze added a commit that referenced this pull request Apr 16, 2016
@phinze (Contributor) commented Apr 16, 2016

Alright tests are passing - merged in 0db3a57 w/ an extra commit tweaking the acc test helper script and adding a TF config for spinning up an AWS instance to run the tests. 👍 Thanks for all your work on this @jtopjian!

log.Printf("[DEBUG] Cobbler System: Created System: %#v", newSystem)
d.SetId(newSystem.Name)

log.Printf("[DEBUG] Cobbler System: syncing system")

@thijsschnitger commented:

Calling Sync after every create will cause Cobbler to fail if you try to create enough systems at once. You should call Sync only once, after all resources have been created. This appears to be problematic in Terraform. We tried to solve this using a channel in our version.
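
(For context, a rough sketch of the channel-based workaround just described, with hypothetical names; the actual helper is linked in the reply below. Creates send on a channel, and a single goroutine coalesces requests into one Sync after a quiet period.)

package cobbler

import (
	"log"
	"time"
)

// syncRequests is buffered so resource creates never block on it.
var syncRequests = make(chan struct{}, 100)

// runSyncer runs in a single goroutine: it waits for the first request,
// keeps resetting a quiet-period timer while more requests arrive, then
// performs exactly one Sync for the whole batch.
func runSyncer(quiet time.Duration, sync func() error) {
	for range syncRequests {
		timer := time.NewTimer(quiet)
		for waiting := true; waiting; {
			select {
			case <-syncRequests:
				if !timer.Stop() {
					<-timer.C
				}
				timer.Reset(quiet)
			case <-timer.C:
				waiting = false
			}
		}
		if err := sync(); err != nil {
			log.Printf("[WARN] Cobbler: batched sync failed: %v", err)
		}
	}
}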

Reply (Contributor):

Ah nice catch @thijsschnitger - thanks!

The upstream issue is here:

cobbler/cobbler#1570

And the helper is here:

https://github.com/hashicorp/terraform/pull/4271/files#diff-d5ac4199d4d95a59d258eb7e953f3d27R167

I'm wondering if we can solve this with more of a straight mutex rather than a time-based goroutine. I'll ask the folks in the upstream issue about that idea. 👍

@jtopjian (Contributor, Author) commented Apr 17, 2016

I was able to get around this by using a sync lock. See lines 14, 422, and 423. Each system's sync will be called in serial rather than in parallel. There's an acceptance test that builds 50 systems to confirm there are no failures in building. I've increased the number to 100 and had the same results.

Removing the sync lock would show failures due to the issue described.

@phinze There is some discussion related to this in Slack a few weeks back.
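
(For anyone following along, the lock is roughly this shape. This is a sketch assuming the provider's Config type and the cobblerclient Sync call, not a verbatim excerpt of the diff:)

package cobbler

import (
	"log"
	"sync"
)

// cobblerSyncLock serializes Sync calls so concurrent resource creates
// never trigger overlapping Cobbler syncs.
var cobblerSyncLock sync.Mutex

func syncCobbler(config *Config) error {
	cobblerSyncLock.Lock()
	defer cobblerSyncLock.Unlock()

	// Only one goroutine reaches this point at a time.
	log.Printf("[DEBUG] Cobbler: running serialized sync")
	return config.cobblerClient.Sync()
}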

Reply (Contributor):

Ha nice so the thing I was picturing is already done! That works nicely then. Thanks @jtopjian 👍

@jtopjian (Contributor, Author) replied:

🚀

All praise should go to @thijsschnitger and @mongrelion for first discovering the issue and making sure it didn't sneak its way in 😄

@mongrelion commented:

IIRC we used a sync.Mutex in the beginning when we first faced this issue, but somehow with 100+ systems it would still fail. That's why we decided to go for the channels hack.
I don't have time to test this out, but could you try with 200 systems or so? I hope the Mutex is enough, because I was not happy with the channel hack; it was the best we could do at the time.

Ideally Terraform, just as it has a setup method for your provider, should also have a mechanism for tearing it down.
There was another use case we had for this, where we had to log in to some API, create a bunch of resources, and then log out. Because of the lack of a teardown method we were running into zombie sessions on the server side (the API didn't use any sort of token auth, just a user/pass combo), so we had to add more workarounds for that.

@jtopjian (Contributor, Author) replied:

@mongrelion I completely agree that a final Terraform step to run the sync would work best.

I just ran the acceptance test suite with 200 systems and it passed. No doubt that if there is some other activity going on, it might interfere with the series of syncs, but in general, I think the sync.Mutex is pretty stable.

Thoughts?

Another commenter reported:

When I use Cobbler to manage DHCP and DNS, even with as few as 7 systems, systemd repeatedly fails to restart these services ("start request repeated too quickly") and, as a result, does not keep them running:

systemd: start request repeated too quickly for dhcpd.service
systemd: Failed to start DHCPv4 Server Daemon.
systemd: Unit dhcpd.service entered failed state.
systemd: dhcpd.service failed.
(...)
systemd: start request repeated too quickly for named-setup-rndc.service
systemd: Failed to start Generate rndc key for BIND (DNS).
systemd: Unit named-setup-rndc.service entered failed state.
systemd: named-setup-rndc.service failed.
systemd: start request repeated too quickly for named.service
systemd: Failed to start Berkeley Internet Name Domain (DNS).
systemd: Unit named.service entered failed state.
systemd: named.service failed.

Terraform apply fails with the error:

Cobbler System: Error syncing system: error: "<class 'cobbler.cexceptions.CX'>:'cobbler trigger failed: cobbler.modules.sync_post_restart_services'" code: 1

Maybe systemd can be configured to handle this gently?
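
One possible knob, offered as an untested sketch (the path and values here are assumptions, not verified against this setup): relax systemd's start-rate limiting with a drop-in such as /etc/systemd/system/dhcpd.service.d/override.conf, then run systemctl daemon-reload.

[Service]
# Allow more start attempts inside a wider rate-limit window.
# (On systemd of this era these settings live under [Service];
# newer versions use StartLimitIntervalSec= under [Unit].)
StartLimitInterval=60
StartLimitBurst=20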

@jtopjian (Contributor, Author) replied:

Thanks for the report. Since this has been merged into the main code base, can you open an issue? Might be easier to handle 😄

Reply:

Done! GH-6419

@ghost commented Apr 15, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators Apr 15, 2020