-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for expected number of drivers starting API #233
Conversation
/hold |
@@ -4,4 +4,14 @@ export IRONIC_DEPLOYMENT="API" | |||
|
|||
. /bin/configure-ironic.sh | |||
|
|||
# Wait for ironic to load all expected drivers | |||
# the DB query returns the number of unique hardware_type in the conductor_hardware_interfaces table | |||
CONF_DRIVERS=$(crudini --get /etc/ironic/ironic.conf DEFAULT enabled_hardware_types | tr ',' '\n' | wc -l) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about how the plugins work in Ironic, but looking solely at the count makes me a little nervous.
Are there any default or builtin drivers that are always loaded, even if they are not listed in the config?
Would it make sense to compare the list of names, instead of just the count?
We've put some logic in the baremetal-operator to do a similar check, but it just waits for a non-zero count. Should that be made more sophisticated, too? That won't help the OpenShift installer, which the approach here does take care of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about how the plugins work in Ironic, but looking solely at the count makes me a little nervous.
Are there any default or builtin drivers that are always loaded, even if they are not listed in the config?
I don't think so, there is a default value for enabled_hardware_types if not mentioned in the config
Would it make sense to compare the list of names, instead of just the count?
I guess it would be more definite, although I'm not sure it would be worth the extra complexity
We've put some logic in the baremetal-operator to do a similar check, but it just waits for a non-zero count. Should that be made more sophisticated, too? That won't help the OpenShift installer, which the approach here does take care of.
Checking for a non-zero count is not reliable as the drivers are loaded individually, resulting in
https://bugzilla.redhat.com/show_bug.cgi?id=1902653
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that be made more sophisticated, too?
To do this the BMO would need to know what the full list of drivers is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the BMO we could look for the driver we're going to need for the host we're reconciling, so we don't need the full list. @andfasano did that work, I think, so maybe he has ideas about how to extend it to handle this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there's a better place, but we could extract the driver name from the BareMetalHost.Spec.BMC.Address
of the provisioned host and then check for it in the dependencies check here https://github.com/metal3-io/baremetal-operator/blob/d6eaf6774de775d83bdd47e04ea3d171ba7d4346/pkg/provisioner/ironic/dependencies.go#L66
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix ironic so that we can rely on the API rather than checking ironic internals.
What would that fix look like? Doesn't the API already tell us which drivers are loaded, but the problem is we can get a different answer depending on when we ask during startup? |
Currently the API will start even if the conductor is still starting up, which can lead to failures if you try to deploy nodes as soon as the API is accessible. Co-Author: Steven Hardy <shardy@redhat.com>
a169642
to
3c8acf4
Compare
Will look at this again, in the meantime I've pushed up a newer PR that will use ssl for mysql if available. |
I think @derekhiggins has discovered that there is small window where the drivers are registered but not usable yet. We need to close this window in ironic, I don't think adding a workaround downstream is a good idea. |
Correct, we get a different answer depending on what iteration we are in on this loop
No (although I did think this at one stage), the problem is that drivers are registered one at a time, once the BMO sees the first one it starts creating nodes but the one it needs might not be there yet. Dealing with this in the BMO would mean also doing it in terraform-provider-ironic, and either keeping a list of expected drivers in both or mapping the install-config BMC URL to the expected driver name in ironic. |
Well, at least in BMO we can use the host details in the reconcile loop to look for just the desired driver, as also @dhellmann suggested |
I think it's an error to try registering a node if its driver is not loaded yet. If we do want to fix it here, let's use ironic API, not direct mysql access. |
Also, I've uploaded a WIP to ironic which deals with this by registering all interfaces at once |
I'm biased since this is based on a workaround I previously proposed, but unless we're confident we can fix this at the ironic API level IMO it's a reasonable temporary workaround. As mentioned the issue is the checks in both BMO and terraform are insufficient as they only check for non-zero drivers and they're not loaded atomically. Instead we either need a way to check that all drivers specified in the config are loaded before attempting deployment (which this does, albeit in a crude way), or some way to check the needed driver is available before attempting to register/deploy a host - I can see that's fairly simple in the BMO case, but I suspect it will be more difficult at the terraform level because of the way the provider checks and host provisioning are decoupled /cc @stbenjam |
This seems like a reasonable workaround to me, everyone on slower hardware or using nested virt hits this. |
Yeah, I think we definitely need a fix here until we get a fix into a version of Ironic that we can use. My only concern was with implementing it with a count vs. names, but I can go along with a count if we think that's sufficient. |
At least it shouldn't be worse than the previous implementation (in BMO, where the provisioner could have been considered ready even thought the required driver wasn't loaded) |
I see we now have a BMO patch to deal with this metal3-io/baremetal-operator#755 I'll take the ironic patch out of WIP and we can decide in either using it or the BMO patch, in the mean time we can merge this if we want something thats ready now. |
That's my expectation, but I've not yet checked it |
Right, that is what @hardys was getting at. We need a change to the ironic terraform provider to do this in the installer. I am not sure we can know when we are waiting for the API what drivers the user will need, it probably requires digging around in internal data structures. I do prefer the solution in this PR. |
/test-integration |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: derekhiggins, dhellmann The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test-integration |
/test-integration ironic seems to have done its thing correctly |
/test-integration |
/lgtm |
/unhold |
Bug 2023765: Compare IPs using the short form of IPv6 address
Currently the API will start even if the conductor is still
starting up, which can lead to failures if you try to deploy
nodes as soon as the API is accessible.
Co-Author: Steven Hardy shardy@redhat.com
This is a rework of a old PR to deal with this #122
but instead of looking at the drivers in the conductors table we look at hardware_types in conductor_hardware_interfaces
A change in the way ironic registers drivers is proposed here (this would also deal with the issue)
https://review.opendev.org/c/openstack/ironic/+/764911